07 Lesson 3A | LinuxChix Courses

Lesson-3: Part-A

by A. Mani

Part-A: Univariate, Bivariate and Multivariate Data.

Advanced students, and those short of time, can go directly to the analysis of the dataset referred to in Part-B.

For this lesson, we will follow the corresponding sections of the book simpleR by John Verzani (http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf). You should be able to:

Attach multiple data frames.

Make frequency tables from categorical data and vice versa.

Make histograms, bar charts, box plots and stem-and-leaf plots for representing bivariate data.

Make correlograms from multivariate data.

Use the function "density" with different methods of smoothing.
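Two of the objectives above can be sketched in a few lines of R. This is only a minimal illustration, not the tutorial's own code: the vector "answers" is made-up data, the built-in "faithful" dataset is chosen here purely for convenience, and the particular bandwidth selector and kernel are just two of the options that density() accepts.

```r
# Hypothetical categorical data: answers to a yes/no/maybe question.
answers <- c("yes", "no", "yes", "maybe", "yes", "no")

# Categorical data -> frequency table.
freq <- table(answers)
print(freq)

# Frequency table -> categorical data.
# The original order of the individual answers cannot be recovered.
rebuilt <- rep(names(freq), times = as.vector(freq))
print(rebuilt)

# Kernel density estimates of a built-in numeric variable,
# with different choices of smoothing.
x <- faithful$eruptions
d_default <- density(x)                          # default bandwidth "nrd0", Gaussian kernel
d_sj      <- density(x, bw = "SJ")               # Sheather-Jones bandwidth selector
d_epan    <- density(x, kernel = "epanechnikov") # different kernel, default bandwidth

plot(d_default, main = "Density estimates of eruption durations")
lines(d_sj, lty = 2)
lines(d_epan, lty = 3)
legend("topright", legend = c("nrd0", "SJ", "epanechnikov"), lty = 1:3)
```

Comparing the three curves shows how strongly the choice of bandwidth affects the apparent shape of the data; the choice of kernel usually matters much less.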

Read up to page 24 of the suggested tutorial.

Suggested Exercises: 3.6, 3.7, 3.8, 4.1, 4.2 and 4.3, and the exercises in Part-B.

1. In which cases are you dealing with population data?

Notes:

Functions intended to be interpreted over sample data are known as "statistics". Those intended to be interpreted over population data are called "parameters". If the samples are truly random, then statistical methods can be used to estimate the parameters from the statistics. Such methods often require assumptions about the distribution of the population data. Graphs can be very useful in making reasonable assumptions about possible distributions (and densities).
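The distinction between a parameter and a statistic can be illustrated with a small simulation. This is a sketch under stated assumptions: the "population" is artificial (10,000 values drawn from a normal distribution), and the sample size of 100 is arbitrary.

```r
set.seed(42)  # make the random draws reproducible

# Hypothetical finite population of 10,000 measurements.
population <- rnorm(10000, mean = 50, sd = 10)

# Parameter: computed over the whole population.
mu <- mean(population)

# Statistic: computed over a simple random sample of 100 units.
smp   <- sample(population, size = 100)
x_bar <- mean(smp)

cat("population mean (parameter):", mu, "\n")
cat("sample mean (statistic):    ", x_bar, "\n")

# A histogram of the sample helps to judge whether a normal
# distribution is a reasonable assumption for the population.
hist(smp, main = "Sample of 100 units", xlab = "value")
```

In practice the population is rarely available in full, so only the statistic can be computed and the parameter has to be estimated from it.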

In Part-C, chi-squared tests are also introduced by Anne Laure Buisson. You will need to see
http://www.graphpad.com/articles/pvalue.htm
for an elementary exposition of the use of p-values in statistical testing. As explained therein, p-values are used for drawing conclusions in many statistical tests. The choice of significance level is very context dependent, and proper use should be decided on the basis of additional empirical evidence. The required background for a proper understanding of statistical hypothesis testing will be touched upon in later parts of this course.
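As a preview of how a p-value is obtained and read in R, the following sketch runs a chi-squared test of independence on a small table. The counts are invented for illustration, and the threshold of 0.05 is only a common convention assumed here, not a recommendation of this course.

```r
# Hypothetical counts: choice between two options, by group.
counts <- matrix(c(30, 20,
                   15, 35),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group  = c("A", "B"),
                                 choice = c("option1", "option2")))

# Pearson's chi-squared test of independence.
# For 2 x 2 tables, chisq.test() applies Yates' continuity correction by default.
result <- chisq.test(counts)
print(result$p.value)

# Assumed significance level; the appropriate value depends on the context.
alpha <- 0.05
if (result$p.value < alpha) {
  cat("Independence is rejected at the", alpha, "level\n")
} else {
  cat("Independence is not rejected at the", alpha, "level\n")
}
```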