09 Lesson 3C

by A. Mani

Part-C: Loops, Control Structures.

If you find yourself needing explicit loops in R, then either you are developing a package, have discovered a bug, or do not yet understand the various "*apply" functions in R.

The function ''tapply'' applies a function over a ragged array, that is, to each non-empty group of values given by a unique combination of the levels of certain factors.

Usage:

tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)

Read help(tapply) for details.
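As a minimal sketch of the usage above, the following applies sum over groups defined first by one factor and then by two. The vectors values, sex and site are made-up illustration data and are not part of the lesson's dataset.

```r
# Hypothetical toy data: six measurements with two grouping factors.
values <- c(10, 20, 30, 40, 50, 60)
sex    <- factor(c("f", "m", "f", "m", "f", "f"))
site   <- factor(c("a", "a", "b", "a", "b", "a"))

# One factor: the groups have unequal sizes (a "ragged" array).
by_sex <- tapply(values, sex, sum)

# Two factors: one cell per combination of levels.
# The combination (m, b) has no values, so its cell is filled with NA.
by_sex_site <- tapply(values, list(sex, site), sum)

print(by_sex)
print(by_sex_site)
```

Passing a list of factors as INDEX is what gives one result per combination of levels.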

Extended Example:

Dataset: ETPlanet.csv

You can find it in the lesson 3 folder. Assume that the heights are in centimetres and the ages are in years. The first column contains categorical data.

1. For each ''x'', find the mean height of persons aged ''x'' years.

This can be done as follows, provided the columns are visible by name (for example after attach(), or inside with()):

>tapply(Height, Age, mean)

This is the same as

>tapply(Height, Age, function(x) {mean(c(x))})

The second form illustrates how functions are defined in general.
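To check that the two calls agree without needing the CSV file, here is a small sketch on an invented data frame. Only the column names Height and Age come from the lesson; the name people and all the values are assumptions made for illustration.

```r
# Hypothetical stand-in for ETPlanet.csv with invented values.
people <- data.frame(
  Age    = c(20, 20, 21, 21, 21, 22),
  Height = c(160, 170, 150, 165, 180, 175)
)

# with() makes the columns visible by name without calling attach().
m1 <- with(people, tapply(Height, Age, mean))
m2 <- with(people, tapply(Height, Age, function(x) {mean(c(x))}))

print(m1)
print(all.equal(m1, m2))
```

The call c(x) leaves a numeric vector unchanged, which is why the anonymous function behaves just like mean here.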

The general form is

function(arguments) {expression}
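A short sketch of this general form, once as a named function and once as an anonymous one. The name height_range and the sample values are hypothetical.

```r
# A named function: arguments in parentheses, the expression in braces.
# The value of the last expression evaluated is returned.
height_range <- function(x, na.rm = FALSE) {
  max(x, na.rm = na.rm) - min(x, na.rm = na.rm)
}

heights <- c(160, 170, 150)   # invented values
r1 <- height_range(heights)

# The same computation as an anonymous function, called immediately.
r2 <- (function(x) {max(x) - min(x)})(heights)

print(r1)
print(r2)
```

An anonymous function like the second one is what gets passed as FUN in a call such as tapply(Height, Age, function(x) {...}).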

Other functions such as lapply, sapply, apply and mapply can be used for similar purposes. ''tapply'' is the most specialised of these loop replacements, since it combines grouping by factors with applying a function to each group.
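As a rough orientation, the sketch below calls each of the other four functions once on invented data. It only shows the shape of input each one expects and the shape of result it gives, and is not a full comparison.

```r
# Invented data: heights already grouped by age in a list.
heights_by_age <- list("20" = c(160, 170), "21" = c(150, 165, 180))

# lapply: always returns a list, one element per input element.
l <- lapply(heights_by_age, mean)

# sapply: like lapply, but simplifies to a vector when it can.
s <- sapply(heights_by_age, mean)

# apply: works over the rows (1) or columns (2) of a matrix.
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)

# mapply: walks over several vectors in parallel.
p <- mapply(function(a, b) a + b, 1:3, 4:6)

print(l); print(s); print(row_sums); print(p)
```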

2. What are the differences between these functions?

3. Use them over the dataset to illustrate the differences.

4. Analyse the dataset in more detail.

5. Split the dataset into 5 other datasets and use ''mapply'' to study the datasets by ''Health_Category''.
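As a starting point for the last exercise, here is one possible way to combine split and ''mapply'', sketched on invented data. Only the column names Health_Category, Age and Height are taken from the lesson; the category labels, the values, and the use of three categories rather than five are assumptions.

```r
# Hypothetical stand-in for ETPlanet.csv with invented values.
people <- data.frame(
  Health_Category = c("A", "B", "A", "C", "B", "C"),
  Age    = c(20, 31, 25, 40, 35, 28),
  Height = c(160, 170, 150, 165, 180, 175)
)

# split() returns a list with one data frame per category.
parts <- split(people, people$Health_Category)

# mapply() walks over the pieces and their labels in parallel.
summary_lines <- mapply(
  function(d, label) paste0(label, ": ", mean(d$Height)),
  parts, names(parts)
)

print(summary_lines)
```

With the real dataset, the same pattern should apply after reading the file with read.csv, with the function body replaced by whatever summary you want to study.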