15A More about the preprocessor

The following is from K. N. King's _C Programming: A Modern Approach_,
Chapter 14, "The Preprocessor" (W. W. Norton, 1996).

The preprocessor is a piece of software that edits C programs just
prior to compilation. Its reliance on a preprocessor makes C (along
with C++) unique among major programming languages.

The preprocessor is a powerful tool, but it also can be a source of
hard-to-find bugs. Moreover, the preprocessor can easily be misused
to create programs that are almost impossible to understand. Modern C
programming style calls for decreased reliance on the preprocessor.

How the preprocessor works
--------------------------
The behaviour of the preprocessor is controlled by DIRECTIVES: commands
that begin with a hash character (#). We've encountered two of these
already, the #include and the #define.

The #define directive defines a MACRO -- a name that represents
something else, typically a constant of some kind. The preprocessor
responds to a #define directive by storing the name of the macro
together with its definition. When the macro is used later in the
program, the preprocessor "expands" the macro, replacing it by its
defined value.

The #include directive tells the preprocessor to open a particular
file and "include" its contents as part of the file being compiled.
For example, the line
#include <stdio.h>
instructs the preprocessor to open the file named stdio.h and bring
its contents into the program. The preprocessor's place in the
compilation process looks like this:

C Program -> Preprocessor -> Modified C Program -> Compiler -> Program

The input to the preprocessor is a C program, possibly containing
directives. The preprocessor executes these directives, removing them
in the process. The output of the preprocessor is another C program:
an edited version of the original program, containing no directives.
The preprocessor's output goes directly into the compiler, which checks
the program for errors and translates it to object code (machine
instructions).

To see what the preprocessor does, let's apply it to a sample program:

/* Converts a Fahrenheit temperature to Celsius */
#include <stdio.h>
#define FREEZING_PT 32.0
#define SCALE_FACTOR (5.0 / 9.0)
int main(void)
{
    float fahrenheit, celsius;
    printf("Enter Fahrenheit temperature: ");
    scanf("%f", &fahrenheit);
    celsius = (fahrenheit - FREEZING_PT) * SCALE_FACTOR;
    printf("Celsius equivalent is: %.1f\n", celsius);
    return 0;
}

After preprocessing, the program may have the following appearance:

Blank line
Blank line
Lines brought in from stdio.h
Blank line
Blank line
Blank line
Blank line
int main(void)
{
    float fahrenheit, celsius;
    printf("Enter Fahrenheit temperature: ");
    scanf("%f", &fahrenheit);
    celsius = (fahrenheit - 32.0) * (5.0 / 9.0);
    printf("Celsius equivalent is: %.1f\n", celsius);
    return 0;
}

The preprocessor responded to the #include directive by bringing in the
contents of stdio.h, which is not shown here because of its length. The
preprocessor also removed the #define directives and replaced FREEZING_PT
and SCALE_FACTOR wherever they appeared later in the file. Notice that
the preprocessor doesn't remove lines containing directives; instead, it
simply makes them empty.

As this example shows, the preprocessor does a bit more than just execute
directives. In particular, it replaces each comment with a single space
character. Some preprocessors go further and remove unnecessary white-
space characters, including spaces and tabs at the beginning of indented
lines.

On MY Debian system, the preprocessor is called `cpp' and it can be run
on a source code file (or any other file) by simply supplying it with
an input file and an output file:

$ cpp input_file output_file

(The actual source code with the changes was at the very bottom of the
output_file, which was over 3500 lines long! Using vi, I `dd'd all
the extra lines, so just the source code was left.)

You can use this to experiment with what the preprocessor does, as well
as to look at the preprocessor output before compiling a program.

Caution: The C preprocessor is quite capable of creating illegal programs
as it executes directives. Often the original program looks fine, making
errors harder to find. In complicated programs, examining the output of
the preprocessor may prove useful for locating this kind of error.

Most preprocessor directives fall into one of three categories:

1] Macro definition. The #define directive defines a macro; the #undef
directive removes a macro definition.

2] File inclusion. The #include directive causes the contents of a
specified file to be included in a program.

3] Conditional compilation. The #if, #ifdef, #ifndef, #elif, #else, and
#endif directives allow blocks of text to be either included in or
excluded from a program, depending on conditions that can be tested
by the preprocessor.

The remaining directives -- #error, #line, and #pragma -- are more
specialized and therefore used less often.

Let's look at a few rules that apply to ALL directives:

* Directives always begin with the hash (#) symbol. The # symbol does
not need to be at the beginning of a line, as long as only white
space precedes it. After the # comes the name of the directive, followed
by any other information the directive requires.

* Any number of spaces and horizontal tab characters may separate the
tokens in a directive. For example, the following directive is legal:

#    define    N    100

* Directives always end at the first new-line character, unless explicitly
continued. To continue a directive to the next line, we must end the
current line with a backslash (\) character. For example, the following
directive defines a macro that represents the capacity of a hard disk,
measured in bytes:

#define DISK_CAPACITY (SIDES *             \
                       TRACKS_PER_SIDE *   \
                       SECTORS_PER_TRACK * \
                       BYTES_PER_SECTOR)

* Directives can appear anywhere in a program. Although we usually put
#define and #include directives at the beginning of a file, other
directives are more likely to show up later, even in the middle of
function definitions.

* Comments may appear on the same line as a directive. In fact, it's good
practice to put a comment at the end of a macro definition to explain
the macro's significance:

#define FREEZING_PT 32.0 /* Freezing point of water */

The definition of a simple macro has the form:

#define identifier replacement-list

replacement-list is any sequence of C tokens; it may include identifiers,
keywords, numbers, character constants, string literals, operators, and
punctuation. When it encounters a macro definition, the preprocessor
makes a note that `identifier' represents replacement-list; wherever
identifier appears later in the file, the preprocessor substitutes
replacement-list.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Don't put any extra symbols in a macro definition -- they'll become
part of the replacement list. Putting the = symbol in a macro definition
is a common error:

#define N = 100 /*** WRONG ***/

int a[N]; /* becomes int a[= 100]; */

Ending a macro definition with a semicolon is another popular mistake:

#define N 100; /*** WRONG ***/

int a[N]; /* becomes int a[100;]; */
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Using #define to create names for constants has several significant
advantages:

* It makes programs easier to read. The name of the macro--if well
chosen--helps the reader understand the meaning of the constant.
The alternative is a program full of "magic numbers" that can easily
mystify the reader.

* It makes programs easier to modify. We can change the value of a
constant throughout a program by modifying a single macro definition.
"Hard-coded" constants are much harder to change, especially since
they sometimes appear in a slightly altered form.

* It helps avoid inconsistencies and typographical errors. If a numerical
constant like 3.14159 appears many times in a program, chances are it
will occasionally be written 3.1416 or 3.14195 by accident.

Simple macros have other uses besides naming constants. In particular,
they play an important role in controlling conditional compilation, as
we'll see later. For example, the following line in a program might
indicate that it's to be compiled in "debugging mode", with extra
statements included to produce debugging output:

#define DEBUG

As this example shows, it is legal for a macro's replacement list to be
empty.

The definition of a parameterized macro has the form:

#define identifier(x1, x2, ..., xn) replacement-list

where x1, x2, ..., xn are identifiers (the macro's parameters). The
parameters may appear as many times as desired in the replacement list.

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
There must be NO SPACE between the macro name and the left parenthesis.
If space is left, the preprocessor will assume that we're defining a
simple macro, with (x1, x2, ..., xn) part of the replacement-list.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Examples:

#define MAX(x,y) ((x)>(y)?(x):(y))
#define IS_EVEN(n) ((n)%2==0)

Now suppose that the following statements appear later in the program:

i = MAX(j+k, m-n);
if (IS_EVEN(i)) i++;

The preprocessor will replace these lines with:

i = ((j+k)>(m-n)?(j+k):(m-n));
if (((i)%2==0)) i++;

As this example shows, parameterized macros often serve as simple functions.
MAX behaves like a function that computes the larger of two values.
IS_EVEN behaves like a function that returns 1 if its argument is an
even number and 0 otherwise. Here's a more complicated macro that
behaves like a function:

#define TOUPPER(c) ('a'<=(c)&&(c)<='z'?(c)-'a'+'A':(c))

This macro tests whether the character c is between 'a' and 'z'. If so,
it produces the upper-case version of c by subtracting 'a' and adding
'A'. If not, it leaves c unchanged.

Here are some more rules for macros:

* A macro's replacement list may contain invocations of other macros.
For example, we could define the macro TWO_PI in terms of the macro
PI:
#define PI 3.14159
#define TWO_PI (2*PI)

When the preprocessor encounters TWO_PI later in the program, it
replaces it with (2*PI). The preprocessor then RESCANS the replacement
list to see if it contains invocations of other macros (PI in this case).
The preprocessor will scan the replacement list as many times as needed
to eliminate all macro names.

* The preprocessor replaces only entire tokens, not portions of tokens.
As a result, the preprocessor ignores macro names that are embedded
in identifiers, character constants, and string literals. Example:

#define SIZE 256

int BUFFER_SIZE;

if (BUFFER_SIZE > SIZE)
    puts("Error: SIZE exceeded");

After preprocessing, these lines look like this:

int BUFFER_SIZE;

if (BUFFER_SIZE > 256)
    puts("Error: SIZE exceeded");

* A macro definition normally remains in effect until the end of the
file in which it appears. The preprocessor doesn't obey normal
scope rules. A macro defined inside a function definition isn't local
to that function; it remains defined until the end of the file.

* Macros may be "undefined" by the #undef directive. The #undef directive
has the form:

#undef identifier

where identifier is a macro name. For example, the directive

#undef N

removes the current definition of the macro N. (If N hasn't been
defined as a macro, the #undef directive has no effect.) One use
of #undef is to remove the existing definition of a macro so
that it can be given a new definition.

-----------------------
CONDITIONAL COMPILATION
-----------------------
The C preprocessor recognizes a number of directives that support
conditional compilation--the inclusion or exclusion of a section
of program text depending on the outcome of a test performed by the
preprocessor.

Suppose we're in the process of debugging a program. We'd like the
program to print the values of certain variables, so we put calls
of printf() in critical parts of the program. Once we've located the
bugs, it's often a good idea to let the printf() calls remain, just
in case we need them later. Conditional compilation allows us to
leave the calls in place, but have the compiler ignore them when we
make the production version.

Here's how we'll proceed. We'll first define a macro and give it
a nonzero value:

#define DEBUG 1

The name of the macro doesn't matter. Next, we'll surround each group
of printf() calls by an #if-#endif pair:

#if DEBUG
printf("Value of i: %d\n", i);
printf("Value of j: %d\n", j);
#endif

During preprocessing, the #if directive will test the value of DEBUG.
Since its value isn't zero, the preprocessor will leave the two calls
of printf() in the program (the #if-#endif lines will disappear, though).
If we change the value of DEBUG to zero and recompile the program, the
preprocessor will remove all four lines from the program. The compiler
won't see the calls of printf(), so they won't occupy any space in the
object code and won't cost any time when the program is run. We can
leave the #if-#endif blocks in the final program, allowing diagnostic
information to be produced later (by recompiling with DEBUG set to 1).

The #ifdef directive tests whether an identifier is currently defined
as a macro:

#ifdef identifier

Using #ifdef is similar to using #if:

#ifdef identifier
lines to be included if identifier is defined as a macro
#endif

The #ifndef directive is similar to #ifdef, but tests whether an
identifier is NOT defined as a macro:

#ifndef identifier

#if, #ifdef, and #ifndef blocks can be nested just like ordinary `if'
statements. When nesting occurs, it's a good idea to use an increasing
amount of indentation as the level of nesting grows. Some programmers
put a comment on each closing #endif to indicate what condition the
matching #if tests:

#if DEBUG
. . . .
. . . .
#endif /* DEBUG */

#elif and #else can be used in conjunction with #if, #ifdef, or #ifndef
to test a series of conditions:

#if expr-1
lines to be included if expr-1 is nonzero
#elif expr-2
lines to be included if expr-1 is zero but expr-2 is nonzero
#else
lines to be included otherwise
#endif

Although the #if directive is shown above, the #ifdef or #ifndef
directive can be used instead. Any number of #elif directives -- but
at most one #else -- may appear between #if and #endif.

I hope this helps!

Happy Programming!
--
K