Computers represent and process information in chunks of a fixed size. Because computers were first used for numerical calculations, early computer engineers developed a representation for numbers in terms of fixed-size chunks. Programming languages must mediate the gap between these fixed-size representations and the true mathematics. Because using the hardware representation for numbers makes a program's calculations as efficient as possible, most designers and implementors of programming languages adopted the hardware-based choice.
This intermezzo explains the fixed-size representation for numbers and its consequences in some detail. The first subsection introduces a concrete fixed-size representation for numbers, discusses what it implies for the representation of numbers, and shows how to calculate with such numbers. The second and third section illustrate the two most fundamental problems of fixed-size number arithmetic: overflow and underflow, respectively.
Suppose we can use four digits to represent numbers. If we represent natural numbers, the representable range is 0 ...9999. Alternatively we could represent 10,000 fractions between 0 and 1 with that many digits. In either case, this is a rather small range of numbers and not useful for most scientific or business computations.
We can represent a larger range of numbers if we use a different notation for numbers instead. In science, for example, we encounter so-called scientific notation, which represents numbers as two parts:
For pure scientific notation, the base is between 0 and 9. We relax this constraint and just write numbers as
where m is the mantissa and e the exponent. For example, one representation of 1200 with this scheme is
another one is
In general, a number has several equivalents in mantissa-exponent representation.
We can also use negative exponents, which add fractions at the cost of one extra piece of data: the sign of the exponent. For example,
stands for
As before, the fraction has several representations in the new notation.
To use a form of mantissa-exponent notation for our problem, we must decide how many digits we wish to use for the representation of the mantissa and how many for the exponent. Here we use two for each and a sign for the exponent; other choices are possible. Given this decision, we can still represent 0 as
The maximal number we can represent is
which is 99 followed by 99 0's. If we use negative exponents in addition to positive ones, we can also represent
which is a small number close to 0. Thus we can now represent a vastly larger range of numbers with four digits and a sign than before. But this improvement comes with its own problems.
To understand the problems, it is best to agree on a fixed representation schema and to experiment with the number representations. Let's represent a fixed-size number with a structure that has three fields:
(define-struct inex (mantissa sign exponent))
The first and last field contain the mantissa and exponent of the
number, the sign
field is +1
or -1
and
represents the sign of the exponent. This sign field enables us to
represent numbers between 0
and 1
.
Here is the data definition:
An inex
is a structure:
(make-inex m s e)
m
and e
are natural numbers in
[0
,99
] and s
is +1
or
-1
.
Because the conditions on the fields of an inex
structure are so stringent, we use the function create-inex
to
create these structures. Figure 94 contains the
function definition for create-inex
, which is a generalized
constructor, that is, a checked constructor (see
section 7.5). The figure also defines the
function inex->number
, which turns inex
s into numbers
according to the principles of our new notation.
Let's translate the above example, 1200
, into our Scheme representation:
(create-inex 12 +1 2)
The alternative representation, 120 · 101, is illegal in our Scheme world, however. If we evaluate
(create-inex 120 +1 1)
we get an error message because the arguments don't satisfy the stated data
contract. For other numbers, though, we can find two inex
equivalents. One example is 0.0000000000000000005
, which we can
express as
(create-inex 50 -1 20) and (create-inex 5 -1 19)
Confirm the equivalence of these two representations with inex->number
.
The range of inex
numbers is vast:
(define MAX-POSITIVE (make-inex 99 +1 99)) (define MIN-POSITIVE (make-inex 1 -1 99))
That is, we can represent large numbers that consist of up to 101 digits in
the standard decimal notation; we can also represent small positive
fractions smaller than 1
down to the fraction 1 over 10...0
with 99 zeros. The appearances, however, are deceiving. Not all real
numbers in the range between 0 and MAX-POSITIVE
can be translated
into an inex
structure. In particular, any positive number less
than
has no equivalent inex
structure. Similarly, the inex
representation has gaps in the middle. For example, the successor of
(create-inex 12 +1 2)
is
(create-inex 13 +1 2)
The first inex
structure corresponds to 1200
, the second
one to 1300
. Numbers in the middle, such as 1240
or
1260
, can only be represented as one or the other. The standard
choice is to round the number to the closest representable equivalent.
In short, we must approximate such mathematical numbers as we translate
into a chosen representation.
Finally, we must also consider arithmetic operations on inex
structures. Adding two inex
representations with the same exponent
means adding the two mantissas:
(inex+ (create-inex 1 +1 0) (create-inex 2 +1 0)) = (create-inex 3 +1 0)
Translated into mathematical notation, we have
When the addition of two mantissas yields too many digits, we may have to find a suitable representation. Consider the example of adding
to itself. Mathematically we get
but we can't just translate this number naively into our chosen representation because 110 > 99. The proper corrective action is to represent the result as
Or, translated into Scheme, we must ensure that inex+
computes as
follows:
(inex+ (create-inex 55 +1 0) (create-inex 55 +1 0)) = (create-inex 11 +1 1)
More generally, if the mantissa of the result is too large, we must divide
it by 10
and increase the exponent by one.
Sometimes the result contains more mantissa digits than we can
represent. In those cases, inex+
must round to the closest
equivalent in the inex
world. For example:
(inex+ (create-inex 56 +1 0) (create-inex 56 +1 0)) = (create-inex 11 +1 1)
This corresponds to the precise calculation:
Because the result has too many mantissa digits, the integer division of
the result mantissa by 10
produces an approximate result:
This is an example of the many approximations that make INEXACT ARITHMETIC inexact.
We can also multiply numbers represented as inex
structures. Recall that
Thus we get:
or, in Scheme notation:
(inex* (create-inex 2 +1 4) (create-inex 8 +1 10)) = (make-inex 16 +1 14)
As with addition, things are not always straightforward. When the result
has too many significant digits in the mantissa, inex*
has to
increase the exponent:
(inex* (create-inex 20 -1 1) (create-inex 5 +1 4)) = (create-inex 10 +1 4)
In the process, inex*
will introduce an approximation if the true
mantissa doesn't have an exact equivalent in the class of inex
structures:
(inex* (create-inex 27 -1 1) (create-inex 7 +1 4)) = (create-inex 19 +1 4)
Exercise 33.1.1.
Develop the function inex+
, which adds inex
representations that have the same exponent. The function must be able to
deal with examples that increase the exponent. Furthermore, it must signal
its own error if the result is out of range for inex
representations.
Challenge: Extend inex+
so that it can deal with inputs
whose exponents differ by 1
:
(equal? (inex+ (create-inex 1 +1 0) (create-inex 1 -1 1)) (create-inex 11 -1 1))
Do not attempt to deal with larger classes of inputs than that without reading the following subsection. Solution
Exercise 33.1.2.
Develop the function inex*
, which multiplies inex
representations. The function must be able to deal with examples that
increase the exponent. Furthermore, it must signal its own error if the
result is out of range for inex
representations.
Solution
Exercise 33.1.3.
The section illustrated how an inexact representation system for real
numbers has gaps. For example, 1240 was represented as (create-inex
12 +1 2)
by rounding off the last significant digit of the mantissa.
The problem is, round-off errors can accumulate.
Develop the function add
, which adds up n
copies of
#i1/185
. What is the result for (add 185)
? What should it
be? What happens if we multiply the result of the second expression with a
large number?
Develop the function sub
, which counts how often 1/185
can be subtracted from the argument until the argument is 0
. How
often should the evaluation recur before (sub 1)
and (sub
#i1.)
is evaluated? What happens in the second case?
Why?
Solution
While the use of scientific notation expands the range of numbers we can represent with fixed-size chunks of data, it still doesn't cover arbitrarily large numbers. Some numbers are just too big to fit into a fixed-size number representation. For example,
can't be represented, because the exponent 500 won't fit into two digits, and the mantissa is as large as it can be.
Numbers that are too large for our representation schema can arise during a computation. For example, two numbers that we can represent can add up to a number that we cannot represent:
(inex+ (create-inex 50 +1 99) (create-inex 50 +1 99)) = (create-inex 100 +1 99)
which violates the data contract, or
(inex+ (create-inex 50 +1 99) (create-inex 50 +1 99)) = (create-inex 10 +1 100)
which also breaks the contract for inex
structures. When the
result of inex
arithmetic produces numbers that are too large to
be represented, we say (arithmetic) OVERFLOW
occurred.
When overflow occurs, some language implementations signal an error and stop the computation. Others designate some symbol, called infinity, for all numbers that are too large. Arithmetic operations are aware of infinity and propagate it.
Negative Numbers: If our inex
structures had a sign field
for the mantissa, then two negative numbers can add up to one that is so
negative that it can't be represented either. This is also called overflow,
though to emphasize the distinction people sometimes say overflow in the
negative direction.
Exercise 33.2.1.
DrScheme's inexact number system uses an infinity value to deal with
overflow. Determine the integer n
such that (expt
#i10. n)
is still an inexact Scheme number and (expt #i10. (+ n
1))
is approximated with infinity. Hint: Use a function to compute
n
.
Solution
At the opposite end of the spectrum, we have already seen small numbers
that cannot be represented with inex
structures. For example,
10-500 is not 0, but it's smaller than the smallest non-zero number we
can represent. An arithemtic UNDERFLOW
arises when we multiply
two small numbers and the result is too small to fit into our class of
inex
structures:
(inex* (create-inex 1 -1 10) (create-inex 1 -1 99)) = (create-inex 1 -1 109)
which causes an error.
When underflow occurs, some language implementations signal an error;
others use 0 to approximate the result. An approximation with 0 for
underflow is qualitatively different from our ealier kinds of
approximations. In approximating 1250 with (create-inex 12 +1 2)
,
we approximated by dropping significant digits from the mantissa, but we
were left with a non-zero mantissa. The result is within 10% of the number
we wanted to represent. Appromixating on underflow, however, means dropping
the entire mantissa. The result is not within a predictable precentage
range of the true result.
Exercise 33.3.1.
DrScheme's inexact number system uses #i0
to approximate underflow.
Determine the smallest integer n
such that (expt #i10. n)
is still an inexact Scheme number and (expt #i10. (- n 1))
is
approximated with 0
. Hint: Use a function to compute
n
.
Solution
Most programming languages support only inexact number representations (and arithmetic) for both integers and reals. Scheme, in contrast, supports both exact and inexact numbers and arithmetic. Of course, the base of the representation is 2, not 10, because Scheme uses the underlying computer's on-off machinery.
As the note on page 5 explained, DrScheme's teaching
levels interpret all numbers in our programs as exact rationals, unless
they are prefixed with #i
. Some numeric operations, though,
produce inexact numbers. Plain Scheme, which is called Full Scheme in
DrScheme, interprets all numbers with a dot as inexact numbers;66 it also prints inexact reals with just a dot, implying
that all such numbers are inexact and possibly distant from the actual
result.
Scheme programmers can thus choose to use exact arithmetic or inexact arithmetic as necessary. For example, numbers in financial statements should always be interpreted as exact numbers; arithmetical operations on such numbers should be as precise as possible. For some problems, however, we may not wish to spend the extra time to produce exact results. Scientific computations are a primary example. In such cases, we may wish switch to inexact numbers and arithmetic.
Numerical Analysis: When we use inexact numbers and arithmetic, it is natural to ask how much the program's results differs from the true results. Over the past few decades, the study of this complex question has evolved into an advanced topic, called numerical analysis. The discipline has become a subject of its own right in applied mathematics or in computer science departments.
(expt 1.001 1e-12)
in Full Scheme (any variant) and in Intermediate Student Scheme. Explain the observations. Solution
Exercise 33.4.2.
Develop the function my-expt
, which raises one number to the power
of some integer. Using this function, conduct the following experiment. Add
(define inex (+ 1 #i1e-12)) (define exac (+ 1 1e-12))
to the Definitions
window. What is (my-expt inex 30)
?
How about (my-expt exac 30)
? Which answer is more
useful?
Solution
Exercise 33.4.3. When we add two inexact numbers of vastly different orders of magnitude, we may get the larger one back as the result. For example, if we are using only 15 significant digits, then we run into problems when adding numbers which vary by more than a factor of 1016:
but if the number system supports only 15 digits, the closest answer is 1016. At first glance, this doesn't look too bad. After all, being wrong by one part in 1016 (ten million billion) is close enough to the accurate result. Unfortunately, this kind of problem can add up to huge problems.
Consider the following list of inexact numbers:
(define JANUS (list #i31 #i2e+34 #i-1.2345678901235e+80 #i2749 #i-2939234 #i-2e+33 #i3.2e+270 #i17 #i-2.4e+270 #i4.2344294738446e+170 #i1 #i-8e+269 #i0 #i99))
Determine the values (sum JANUS)
and (sum (reverse JANUS))
.
Explain the difference. Can we trust computers?
Solution
66 We can force Full Scheme to interpret numbers with a dot as exact by prefixing the numbers with #e.