Lesson-2: Data Frames and Basic Statistics

by A. Mani

1. Use the read.table() function to read the file "kernel.csv" (in the lesson-2 folder) into the object ''abc''. The entries in the first column are actually truncated forms of 2.6.1, 2.6.2 ...
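
As a rough sketch of the call, the snippet below writes a small stand-in file and reads it back. The column names and values are invented for illustration (the real contents of "kernel.csv" are not shown in this lesson), and the object is named ''demo'' rather than ''abc'' to keep the two apart.

```r
# Hypothetical stand-in for "kernel.csv": names and values are invented.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("version,files,lines",
             "2.6,17090,6624076",
             "2.6,17360,6777860",
             "2.6,18090,6970124"), csv_path)

# read.table() splits on whitespace by default, so a comma-separated
# file needs sep = ","; header = TRUE takes the column names from line 1.
demo <- read.table(csv_path, header = TRUE, sep = ",")

str(demo)   # compact overview of columns and their types
```

For a real file, replace csv_path with the path to "kernel.csv"; whether header = TRUE is appropriate depends on whether the file has a header line.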

2. Is ''abc'' a data.frame?

A data frame is essentially a list of equal-length columns that also has many properties of a matrix (rows, columns, dimensions). Most R packages are designed to work with data frames.

The function data.frame() can be used to create data frames. ''as.data.frame()'' can be used for coercing other objects into data frames, while ''is.data.frame()'' is for testing.
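
The three functions can be tried on small throwaway objects. This is only a sketch; all object names here are made up for the example.

```r
# Build a data frame from scratch (all columns must have equal length).
df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# Coerce a matrix and a list into data frames.
m <- matrix(1:6, nrow = 3)
from_matrix <- as.data.frame(m)   # unnamed columns become V1, V2
from_list <- as.data.frame(list(x = 1:2, y = c(TRUE, FALSE)))

# Test what each object is.
is.data.frame(df)           # TRUE
is.data.frame(m)            # FALSE: still a matrix
is.data.frame(from_matrix)  # TRUE
is.list(df)                 # TRUE: a data frame is also a list
```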

3. dim(abc) gives you the dimensions of ''abc'' (number of rows, number of columns). In R, we can also use arrays with an arbitrary number of indices.
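
A small sketch of dim() on a data frame and on a three-index array; the objects are invented for the example.

```r
# A data frame has two dimensions: rows and columns.
df <- data.frame(a = 1:4, b = letters[1:4])
dim(df)        # 4 2

# array() accepts any number of dimensions; here 2 x 3 x 4.
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)       # 2 3 4

# One index per dimension picks out a single element.
arr[2, 3, 4]   # 24, the last element
```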

4. Explain each line of code (and run it):
efg <- abc[,2]
devel <- c(389,566,545,553,612,709,726,815,801,673,767,870,912,1057,1123,1027,1021,1075,1180,1150)
dim(devel)
devel <- as.data.frame(devel)
dim(devel)
lt <- abc[2,]
lt
kf <- abc[[6,3]]
ef <- abc[,3]

kf==ef
ef <- abc[6,]
kf==ef
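
The indexing forms used above can be tried first on a tiny data frame where the results are easy to check by eye. This is a sketch on invented data, not on ''abc''.

```r
# Hypothetical 3 x 3 data frame.
df <- data.frame(x = c(10, 20, 30),
                 y = c("a", "b", "c"),
                 z = c(1.5, 2.5, 3.5))

col3 <- df[, 3]     # one column: comes back as a plain vector
row2 <- df[2, ]     # one row: still a data frame (1 row, 3 columns)
cell <- df[[2, 3]]  # one cell: row 2, column 3

# Comparing a single value with a vector works element by element.
cell == col3        # FALSE TRUE FALSE
```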

5. Create a new data frame ''zwi'' consisting of ''abc'' plus a column "number_of_developers" (''devel'' above).
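
One possible way to append a column is cbind(). The sketch below uses a made-up four-row data frame in place of ''abc'' and only the first four values of ''devel''; the new column must have as many entries as the data frame has rows.

```r
# Hypothetical stand-in for abc (invented names and values).
base <- data.frame(release = 1:4, files = c(100, 120, 150, 155))

# First four values of devel from the lesson.
developers <- c(389, 566, 545, 553)

# cbind() on a data frame returns a data frame with the extra column.
combined <- cbind(base, number_of_developers = developers)

# An equivalent form: data.frame(base, number_of_developers = developers)
names(combined)
```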

6. Insert NA at the positions (2,3) and (3,6) of the new data frame and store the result as ''bli''.
Hint: use is.na(x) <- value
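
A minimal sketch of the replacement form of is.na(), applied column by column to an invented six-column data frame. The right-hand side gives the row positions that should become NA in that column.

```r
# Hypothetical data frame with at least six columns.
df <- data.frame(a = 1:4, b = 5:8, c = c(1.1, 2.2, 3.3, 4.4),
                 d = letters[1:4], e = 9:12, f = c(0.5, 1.5, 2.5, 3.5))

with_na <- df                 # work on a copy, keep the original intact
is.na(with_na[[3]]) <- 2      # row 2 of column 3 becomes NA
is.na(with_na[[6]]) <- 3      # row 3 of column 6 becomes NA

# List the (row, col) positions that are now missing.
which(is.na(with_na), arr.ind = TRUE)
```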

7. Explain each line of code (and run it):
sequence <- seq (1, 500, by=0.75)
vec1 <- as.vector(sequence)
vec2 <- as.vector(2:42)
Make a matrix and a data.frame from these two vectors (avoiding problems, if any).
ref<- c(rep(4,5),rep(5,6), rep(2,3), rep(3,5))
table(ref)
rnorm(150)
The last line generates 150 random numbers drawn from the standard normal distribution (mean 0, standard deviation 1).
asref <- table(as.integer(ref*rnorm(150)))
kfr <- data.frame(asref)
Draw nice histograms.
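
Two of the open tasks in this item can be sketched as follows. The two vectors have different lengths (666 and 41), so data.frame() refuses to combine them directly; padding the shorter one with NA is one possible workaround among several, chosen here as an assumption. The histogram is written to a temporary PDF so the snippet also runs non-interactively; drop the pdf() and dev.off() lines to draw on screen. The seed, colours and labels are arbitrary choices.

```r
vec1 <- as.vector(seq(1, 500, by = 0.75))   # 666 values
vec2 <- as.vector(2:42)                     # 41 values

# Pad the shorter vector with NA so both have the same length.
n <- max(length(vec1), length(vec2))
vec2_padded <- c(vec2, rep(NA, n - length(vec2)))

mat <- cbind(vec1, vec2 = vec2_padded)             # 666 x 2 matrix
df  <- data.frame(vec1 = vec1, vec2 = vec2_padded) # 666 x 2 data frame

# A histogram of 150 standard normal draws.
set.seed(1)                                  # reproducible random numbers
x <- rnorm(150)
pdf(file = tempfile(fileext = ".pdf"))       # assumption: draw to a temp file
h <- hist(x, breaks = 12, col = "steelblue", border = "white",
          main = "150 standard normal draws", xlab = "value")
invisible(dev.off())
```

hist() also returns the bin boundaries and counts (here in h), which can be inspected without looking at the picture.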

8. R has plenty of optimised built-in statistical functions. In general, one task can be done in many different ways and with many different libraries. Compute all measures of central tendency, deviations from such measures (sd, etc.), quantiles at 0.3, 0.62 and 0.72, quartiles, correlation coefficients and rank correlation coefficients for all of the above data frames.

Hint: summary(), quantile()
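
As a starting point, here is a sketch using the ''devel'' vector from item 4. The second variable, release, is just a running index invented so that there is something to correlate with.

```r
devel <- c(389, 566, 545, 553, 612, 709, 726, 815, 801, 673,
           767, 870, 912, 1057, 1123, 1027, 1021, 1075, 1180, 1150)
release <- seq_along(devel)   # hypothetical second variable: 1, 2, ..., 20

# Central tendency
m   <- mean(devel)
med <- median(devel)

# Spread
s   <- sd(devel)      # standard deviation
v   <- var(devel)     # variance
iqr <- IQR(devel)     # interquartile range

# Quantiles at chosen probabilities, and the quartiles
q    <- quantile(devel, probs = c(0.3, 0.62, 0.72))
quar <- quantile(devel)   # default probs: 0, 0.25, 0.5, 0.75, 1

# Correlation: Pearson (default) and two rank-based coefficients
r_pearson  <- cor(release, devel)
r_spearman <- cor(release, devel, method = "spearman")
r_kendall  <- cor(release, devel, method = "kendall")

summary(devel)        # minimum, quartiles, mean and maximum in one call
```

On a whole data frame, summary(df) gives the same overview per column, and cor(df) returns a correlation matrix when all columns are numeric.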

9. Investigate the plot function in the context of the above data frames. Do at least ten colourful plots with many variations.
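
A few variations to start from, again using ''devel'' from item 4 with an invented running index as the x-axis. Colours, symbols and titles are arbitrary choices. The plots go to a temporary PDF (one page per plot) so the snippet runs non-interactively; drop the pdf() and dev.off() lines to draw on screen.

```r
devel <- c(389, 566, 545, 553, 612, 709, 726, 815, 801, 673,
           767, 870, 912, 1057, 1123, 1027, 1021, 1075, 1180, 1150)
release <- seq_along(devel)            # hypothetical x-axis: 1, 2, ..., 20

plot_file <- tempfile(fileext = ".pdf")
pdf(file = plot_file)                  # assumption: draw to a temp file

# Points joined by lines (type = "b"), filled circles (pch = 19).
plot(release, devel, type = "b", pch = 19, col = "darkorange",
     main = "Developers per release", xlab = "release", ylab = "developers")

# Vertical bars (type = "h"), one colour per bar.
plot(release, devel, type = "h", lwd = 4, col = rainbow(length(devel)),
     main = "Same data as bars", xlab = "release", ylab = "developers")

# Passing a two-column data frame gives a scatter plot of the columns.
plot(data.frame(release, devel), col = "purple", pch = 17,
     main = "plot() on a data frame")

invisible(dev.off())                   # close the device, finish the file
```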

Note: A supplement on Basic Statistics is in preparation.