03 Floating Point Errors: Addendum

I'm going through the chapter, and I'll try to clarify the start of it.
I was going to do the whole chapter, but I'm not feeling well right now.
Maybe one of the other experienced programmers will go on from where I
left off.

I'm assuming you know how to do basic math with scientific notation (or
whatever your math teacher called numbers like 3.24*10^3), with pen and
paper. If you don't, either say something or find one of your old math
textbooks. It's not something most people do every day.

If my commentary doesn't help, ask for more detail.

In this section:
Floating-point Addition/Subtraction
-----------------------------------
To ADD two numbers like 2.0 and 0.3, you must perform the following
steps:

... this section describes what the computer does, not what you must
do. It lets you work through in decimal what the computer works through
in binary or octal or hex. The multiplication section works the same
way, of course.

1. Start with the following numbers.

+2.000E+0 The number 2.0
+3.000E-1 The number 0.3

2. Add guard digits to both numbers.

+2.0000E+0 The number 2.0
+3.0000E-1 The number 0.3

The guard digit improves the accuracy of the operation: it catches a
significant digit that the next step would otherwise push off the end.
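
If it helps to see why the guard digit matters, here is a rough sketch
in Python. Everything in it is my own illustration, not the book's: the
function name aligned_sub, the choice of 1.000E+0 minus 9.999E-1 as the
example, and the idea of storing a four-digit fraction as an integer
(so 1.000 is 1000). It assumes both numbers are positive and the first
has the larger exponent.

```python
from fractions import Fraction

def aligned_sub(ma, ea, mb, eb, guard_digits):
    """Compute a - b for non-negative four-digit mantissas, with ea >= eb.

    Mantissas are integers: 1.000E+0 is (1000, 0), 9.999E-1 is (9999, -1).
    """
    scale = 10 ** guard_digits
    ma, mb = ma * scale, mb * scale   # append guard digits (zeros)
    mb //= 10 ** (ea - eb)            # shift b right; low digits fall off
    diff = ma - mb
    # Return the value as an exact fraction so binary rounding stays out of it.
    return Fraction(diff, scale) * Fraction(10) ** (ea - 3)

# True answer: 1.000 - 0.9999 = 0.0001
print(aligned_sub(1000, 0, 9999, -1, 0))  # 1/1000  (no guard digit: 10x too big)
print(aligned_sub(1000, 0, 9999, -1, 1))  # 1/10000 (one guard digit: exact)
```

Without the guard digit, the last 9 of 0.9999 is lost during the shift
and the answer comes out ten times too large.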

3. SHIFT the number with the smallest exponent to the right one digit
and increase its exponent. Continue until the exponents of the two
numbers match.

+2.0000E+0 The number 2.0
+0.3000E+0 The number 0.3

This is another way of saying 'make the exponents match by multiplying
or dividing by the base number until they do'.
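
As a minimal sketch of that shifting step (the function name align and
the integer representation are my assumptions, not the book's), with
each five-digit fraction stored as an integer so that 2.0000E+0 is
(20000, 0) and 3.0000E-1 is (30000, -1):

```python
def align(m_a, e_a, m_b, e_b):
    """Shift the operand with the smaller exponent right until exponents match.

    Assumes non-negative integer mantissas of five digits (four plus a guard).
    """
    while e_a < e_b:
        m_a //= 10   # shift right one digit; the lowest digit is dropped
        e_a += 1
    while e_b < e_a:
        m_b //= 10
        e_b += 1
    return m_a, e_a, m_b, e_b

# 3.0000E-1 becomes 0.3000E+0, stored as 3000 with exponent 0.
print(align(20000, 0, 30000, -1))  # (20000, 0, 3000, 0)
```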

4. ADD the two fractions. The result will have the same exponent as
the two numbers.

+2.0000E+0 The number 2.0
+0.3000E+0 The number 0.3
----------------------------
+2.3000E+0 Result 2.3

5. Normalize the number by shifting it left or right until there is
just one nonzero digit to the left of the decimal point. Adjust
the exponent accordingly. In this example, the result is already
normalized. The result +0.1234E+0 would be normalized to +1.234E-1.

This just makes the result match the usual way we write numbers in
scientific notation.
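
Normalizing can be sketched like this. Again, normalize and the
integer-mantissa layout are hypothetical names of mine; a five-digit
mantissa of 1234 stands for 0.1234, and 23000 stands for 2.3000.

```python
def normalize(mantissa, exponent, digits=5):
    """Shift a non-negative integer mantissa until it has exactly `digits` digits."""
    if mantissa == 0:
        return 0, 0              # zero has no nonzero digit to line up
    low = 10 ** (digits - 1)     # smallest normalized mantissa, e.g. 10000
    high = 10 ** digits          # one past the largest, e.g. 100000
    while mantissa >= high:      # too many digits: shift right
        mantissa //= 10
        exponent += 1
    while mantissa < low:        # leading zero: shift left
        mantissa *= 10
        exponent -= 1
    return mantissa, exponent

print(normalize(1234, 0))    # (12340, -1): +0.1234E+0 becomes +1.2340E-1
print(normalize(23000, 0))   # (23000, 0): already normalized
```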

6. Finally, if the guard digit is greater than or equal to 5, round the
next digit up; otherwise truncate the number.

+2.3000E+0 Round last digit
+2.300E+0 Result 2.3

This returns the result to its usual four-digit size.
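
A small sketch of the rounding step, assuming the usual rule that a
guard digit of 5 or more rounds the next digit up. The name round_guard
is mine, and the mantissa is a non-negative five-digit integer whose
last digit is the guard digit.

```python
def round_guard(mantissa, exponent):
    """Drop the guard digit, rounding up when it is 5 or more."""
    guard = mantissa % 10
    mantissa //= 10
    if guard >= 5:
        mantissa += 1
        if mantissa == 10000:    # e.g. 9.9995 rounds to 10.000: renormalize
            mantissa = 1000
            exponent += 1
    return mantissa, exponent

print(round_guard(23000, 0))   # (2300, 0): +2.3000E+0 becomes +2.300E+0
print(round_guard(23456, 0))   # (2346, 0): guard digit 6 rounds up
print(round_guard(99995, 0))   # (1000, 1): the carry bumps the exponent
```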

7. For floating-point subtraction, change the sign of the second
operand and add.

Standard math, again.
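
Putting the seven steps together, here is one possible sketch of the
whole thing in Python. It is only an illustration under my own
assumptions: fp_add, fp_sub and shift_right are made-up names, and a
number is a (mantissa, exponent) pair with a signed four-digit integer
mantissa, so +2.000E+0 is (2000, 0) and +3.000E-1 is (3000, -1).

```python
def shift_right(m):
    """Drop the lowest digit of a signed integer, truncating toward zero."""
    return -(-m // 10) if m < 0 else m // 10

def fp_add(a, b, digits=4):
    (ma, ea), (mb, eb) = a, b
    ma, mb = ma * 10, mb * 10              # step 2: add a guard digit
    while ea < eb:                         # step 3: align the exponents
        ma, ea = shift_right(ma), ea + 1
    while eb < ea:
        mb, eb = shift_right(mb), eb + 1
    m, e = ma + mb, ea                     # step 4: add the fractions
    if m == 0:
        return 0, 0
    sign, m = (-1, -m) if m < 0 else (1, m)
    while m >= 10 ** (digits + 1):         # step 5: normalize
        m, e = m // 10, e + 1
    while m < 10 ** digits:
        m, e = m * 10, e - 1
    guard, m = m % 10, m // 10             # step 6: round on the guard digit
    if guard >= 5:
        m += 1
        if m == 10 ** digits:              # rounding carried into a new digit
            m, e = m // 10, e + 1
    return sign * m, e

def fp_sub(a, b, digits=4):
    mb, eb = b
    return fp_add(a, (-mb, eb), digits)    # step 7: change the sign and add

print(fp_add((2000, 0), (3000, -1)))   # (2300, 0), i.e. +2.300E+0
print(fp_sub((2000, 0), (3000, -1)))   # (1700, 0), i.e. +1.700E+0
```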

I'm not commenting on the multiplication section, because the comments
would be much the same.

--
"Do you ever wonder if there's a whole section of geek culture
you miss out on by being a geek?" - Dancer.
My book 'Essential CVS' will be published by O'Reilly in 2003.
jenn@anthill.echidna.id.au http://anthill.echidna.id.au/~jenn/