Many of our data definitions and function definitions look alike. For example, the definition for a list of symbols differs from that of a list of numbers in only two regards: the name of the class of data and the words ``symbol'' and ``number.'' Similarly, a function that looks for a specific symbol in a list of symbols is nearly indistinguishable from one that looks for a specific number in a list of numbers.
Repetitions are the source of many programming mistakes. Therefore good programmers try to avoid repetitions as much as possible. As we develop a set of functions, especially functions derived from the same template, we soon learn to spot similarities. It is then time to revise the functions so as to eliminate the repetitions as much as possible. Put differently, a set of functions is just like an essay or a memo or a novel or some other piece of writing: the first draft is just a draft. Unless we edit the essay several times, it does not express our ideas clearly and concisely. It is a pain for others to read it. Because functions are read by many other people and because real functions are modified after reading, we must learn to ``edit'' functions.
The elimination of repetitions is the most important step in the (program) editing process. In this section, we discuss similarities in function definitions and in data definitions and how to avoid them. Our means of avoiding similarities are specific to Scheme and functional programming languages; still, other languages, in particular object-oriented ones, support similar mechanisms for factoring out similarities -- or (code) patterns as they are sometimes called.
The use of our design recipes entirely determines a function's template -- or basic organization -- from the data definition for the input. Indeed, the template is an alternative method of expressing what we know about the input data. Not surprisingly, functions that consume the same kind of data look alike.
Take a look at the two functions in figure 52, which consume lists of symbols (names of toys) and look for specific toys. The function on the left looks for 'doll, the one on the right for 'car in a list of symbols (los). The two functions are nearly indistinguishable. Each consumes lists of symbols; each function body consists of a cond-expression with two clauses. Each produces false if the input is empty; each uses a second, nested cond-expression to determine whether the first item is the desired item. The only difference is the symbol that is used in the comparison of the nested cond-expression: contains-doll? uses 'doll and contains-car? uses 'car, of course. To highlight the differences, the two symbols are boxed.
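The figure itself is not reproduced in this text. As a rough sketch reconstructed from the description above, the two functions might look as follows. The contracts, purpose statements, and layout are assumptions; only the function names, the nested cond structure, and the two symbols come from the text.

```scheme
;; contains-doll? : los -> boolean
;; to determine whether alos contains the symbol 'doll
(define (contains-doll? alos)
  (cond
    [(empty? alos) false]
    [else (cond
            [(symbol=? (first alos) 'doll) true]   ; the only difference
            [else (contains-doll? (rest alos))])]))

;; contains-car? : los -> boolean
;; to determine whether alos contains the symbol 'car
(define (contains-car? alos)
  (cond
    [(empty? alos) false]
    [else (cond
            [(symbol=? (first alos) 'car) true]    ; the only difference
            [else (contains-car? (rest alos))])]))
```

Apart from the function names, the two definitions differ only in the symbol used in the nested comparison.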
Good programmers are too lazy to define several closely related functions. Instead they define a single function that can look for both a 'doll and a 'car in a list of toys. This more general function consumes an additional piece of data, the symbol that we are looking for, but is otherwise like the two original functions:
;; contains? : symbol los -> boolean
;; to determine whether alos contains the symbol s
(define (contains? s alos)
  (cond
    [(empty? alos) false]
    [else (cond
            [(symbol=? (first alos) s) true]
            [else (contains? s (rest alos))])]))
We can now look for 'doll by applying contains? to 'doll and a list of symbols. But contains? works for any other symbol, too. Defining the single version has solved many related problems at once.
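A small usage sketch may make this concrete. The definition of contains? is repeated so that the example stands alone; the sample lists of toys are made up for illustration.

```scheme
;; contains? : symbol los -> boolean
;; to determine whether alos contains the symbol s
(define (contains? s alos)
  (cond
    [(empty? alos) false]
    [else (cond
            [(symbol=? (first alos) s) true]
            [else (contains? s (rest alos))])]))

;; the same function answers both of the original questions ...
(contains? 'doll (list 'robot 'doll 'car))   ; expected value: true
(contains? 'car (list 'robot 'doll 'car))    ; expected value: true

;; ... and questions about symbols that neither original function covered
(contains? 'bike (list 'robot 'doll 'car))   ; expected value: false
```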
The process of combining two related functions into a single definition is called FUNCTIONAL ABSTRACTION. Defining abstract versions of functions is highly beneficial. The first benefit is that a single function can perform many different tasks. In our first example, contains? can search for many different symbols instead of just one concrete symbol.[45]
In the case of contains-doll? and contains-car?, abstraction is uninteresting. There are, however, more interesting cases: see figure 53. The function on the left consumes a list of numbers and a threshold and produces a list of all those numbers that are below the threshold; the one on the right produces all those that are above a threshold.
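Since the figure is not reproduced here, the following is a sketch of what the two functions plausibly look like, based on the description above and on the shape of the abstraction that follows. The contracts and purpose statements are assumptions.

```scheme
;; below : list-of-numbers number -> list-of-numbers
;; to construct a list of those numbers in alon that are below t
(define (below alon t)
  (cond
    [(empty? alon) empty]
    [else (cond
            [(< (first alon) t)
             (cons (first alon) (below (rest alon) t))]
            [else (below (rest alon) t)])]))

;; above : list-of-numbers number -> list-of-numbers
;; to construct a list of those numbers in alon that are above t
(define (above alon t)
  (cond
    [(empty? alon) empty]
    [else (cond
            [(> (first alon) t)
             (cons (first alon) (above (rest alon) t))]
            [else (above (rest alon) t)])]))

(below (list 1 6 4) 5)   ; expected value: (list 1 4)
(above (list 1 6 4) 5)   ; expected value: (list 6)
```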
The difference between the two functions is the comparison operator. The left uses <, the right one >. Following the first example, we abstract over the two functions with an additional parameter that stands for the concrete relational operator in below and above:
(define (filter1 rel-op alon t)
  (cond
    [(empty? alon) empty]
    [else (cond
            [(rel-op (first alon) t)
             (cons (first alon) (filter1 rel-op (rest alon) t))]
            [else (filter1 rel-op (rest alon) t)])]))
To apply this new function, we must supply three arguments: a relational operator R that compares two numbers, a list L of numbers, and a number N. The function then extracts all those items i in L for which (R i N) evaluates to true. Since we do not know how to write down contracts for functions like filter1, we omit the contract for now. We will discuss the problem of contracts in section 20.2 below.
Let us see how filter1 works with an example. Clearly, as long as the input list is empty, the result is empty, too, no matter what the other arguments are:
(filter1 < empty 5) = empty
So next we look at a slightly more complicated case:
(filter1 < (cons 4 empty) 5)
The result should be (cons 4 empty) because the only item of this list is 4 and (< 4 5) is true.
The first step of the evaluation is based on the rule of application:
  (filter1 < (cons 4 empty) 5)
= (cond
    [(empty? (cons 4 empty)) empty]
    [else (cond
            [(< (first (cons 4 empty)) 5)
             (cons (first (cons 4 empty))
                   (filter1 < (rest (cons 4 empty)) 5))]
            [else (filter1 < (rest (cons 4 empty)) 5)])])
That is, it is the body of filter1 with all occurrences of rel-op replaced by <, t replaced by 5, and alon replaced by (cons 4 empty).
The rest of the evaluation is straightforward:
  (cond
    [(empty? (cons 4 empty)) empty]
    [else (cond
            [(< (first (cons 4 empty)) 5)
             (cons (first (cons 4 empty))
                   (filter1 < (rest (cons 4 empty)) 5))]
            [else (filter1 < (rest (cons 4 empty)) 5)])])
= (cond
    [(< (first (cons 4 empty)) 5)
     (cons (first (cons 4 empty))
           (filter1 < (rest (cons 4 empty)) 5))]
    [else (filter1 < (rest (cons 4 empty)) 5)])
= (cond
    [(< 4 5)
     (cons (first (cons 4 empty))
           (filter1 < (rest (cons 4 empty)) 5))]
    [else (filter1 < (rest (cons 4 empty)) 5)])
= (cond
    [true
     (cons (first (cons 4 empty))
           (filter1 < (rest (cons 4 empty)) 5))]
    [else (filter1 < (rest (cons 4 empty)) 5)])
= (cons 4 (filter1 < (rest (cons 4 empty)) 5))
= (cons 4 (filter1 < empty 5))
= (cons 4 empty)
The last step is the equation we discussed as our first case.
Our final example is an application of filter1 to a list of two items:
  (filter1 < (cons 6 (cons 4 empty)) 5)
= (filter1 < (cons 4 empty) 5)
= (cons 4 (filter1 < empty 5))
= (cons 4 empty)
The only new step is the first one. It says that filter1
determines that the first item on the list is not less than the threshold,
and that it therefore is not added to the result of the natural recursion.
Exercise 19.1.1. Verify the equation
(filter1 < (cons 6 (cons 4 empty)) 5) = (filter1 < (cons 4 empty) 5)
with a hand-evaluation that shows every step.
Exercise 19.1.2. Evaluate the expression
(filter1 > (cons 8 (cons 6 (cons 4 empty))) 5)
by hand. Show only the essential steps.
The calculations show that (filter1 < alon t) computes the same result as (below alon t), which is what we expected. Similar reasoning shows that (filter1 > alon t) produces the same output as (above alon t). So suppose we define the following:
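;; below1 : list-of-numbers number -> list-of-numbers
(define (below1 alon t)
  (filter1 < alon t))

;; above1 : list-of-numbers number -> list-of-numbers
(define (above1 alon t)
  (filter1 > alon t))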
Clearly, below1 produces the same results as below when given the same inputs, and above1 is related to above in the same manner. In short, we have defined below and above as one-liners using filter1.
Better yet: once we have an abstract function like filter1, we can put it to other uses, too. Here are three of them:
- (filter1 = alon t): This expression extracts all those numbers in alon that are equal to t.
- (filter1 <= alon t): This one produces the list of numbers in alon that are less than or equal to t.
- (filter1 >= alon t): This last expression computes the list of numbers that are greater than or equal to the threshold.
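The three uses above can be tried out directly. The sketch below repeats the definition of filter1 so that it stands alone; the sample list and threshold are made up for illustration, and the code assumes a language level in which functions may be passed as arguments.

```scheme
(define (filter1 rel-op alon t)
  (cond
    [(empty? alon) empty]
    [else (cond
            [(rel-op (first alon) t)
             (cons (first alon) (filter1 rel-op (rest alon) t))]
            [else (filter1 rel-op (rest alon) t)])]))

;; numbers equal to the threshold
(filter1 = (list 1 5 7 5) 5)    ; expected value: (list 5 5)

;; numbers less than or equal to the threshold
(filter1 <= (list 1 5 7 5) 5)   ; expected value: (list 1 5 5)

;; numbers greater than or equal to the threshold
(filter1 >= (list 1 5 7 5) 5)   ; expected value: (list 5 7 5)
```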
In general, filter1's first argument need not even be one of Scheme's predefined operations; it can be any function that consumes two numbers and produces a boolean value. Consider the following example:
;; squared>? : number number -> boolean
(define (squared>? x c)
  (> (* x x) c))
The function produces true whenever the area of a square with side x is larger than some threshold c, that is, the function tests whether the claim x^2 > c holds. We now apply filter1 to this function and a list of numbers:
(filter1 squared>? (list 1 2 3 4 5) 10)
This particular application extracts those numbers in (list 1 2 3 4 5) whose square is larger than 10.
Here is the beginning of a simple hand-evaluation:
  (filter1 squared>? (list 1 2 3 4 5) 10)
= (cond
    [(empty? (list 1 2 3 4 5)) empty]
    [else (cond
            [(squared>? (first (list 1 2 3 4 5)) 10)
             (cons (first (list 1 2 3 4 5))
                   (filter1 squared>? (rest (list 1 2 3 4 5)) 10))]
            [else (filter1 squared>? (rest (list 1 2 3 4 5)) 10)])])
That is, we apply our standard law of application and calculate otherwise as usual:
= (cond
    [(squared>? 1 10)
     (cons (first (list 1 2 3 4 5))
           (filter1 squared>? (rest (list 1 2 3 4 5)) 10))]
    [else (filter1 squared>? (rest (list 1 2 3 4 5)) 10)])
= (cond
    [false
     (cons (first (list 1 2 3 4 5))
           (filter1 squared>? (rest (list 1 2 3 4 5)) 10))]
    [else (filter1 squared>? (rest (list 1 2 3 4 5)) 10)])
The last step consists of several steps concerning squared>?, which we can skip at this point:
= (filter1 squared>? (list 2 3 4 5) 10)
= (filter1 squared>? (list 3 4 5) 10)
= (filter1 squared>? (list 4 5) 10)
We leave the remainder of the evaluation to the exercises.
Exercise 19.1.3. Show that
(filter1 squared>? (list 4 5) 10) = (cons 4 (filter1 squared>? (list 5) 10))
with a hand-evaluation. Act as if squared>? were primitive.
Exercise 19.1.4. The use of squared>? also suggests that the following function will work, too:
;; squared10? : number number -> boolean
(define (squared10? x c)
  (> (sqr x) 10))
In other words, the relational function that filter1 uses may ignore its second argument. After all, we already know it and it stays the same throughout the evaluation of (filter1 squared>? alon t).
This, in turn, implies another simplification of the function:
(define (filter predicate alon)
  (cond
    [(empty? alon) empty]
    [else (cond
            [(predicate (first alon))
             (cons (first alon) (filter predicate (rest alon)))]
            [else (filter predicate (rest alon))])]))
The function filter consumes only a relational function, called predicate, and a list of numbers. Every item i on the list is checked with predicate. If (predicate i) holds, i is included in the output; if not, i does not appear in the result.
Show how to use filter to define functions that are equivalent to below and above. Test the definitions.
So far we have seen that abstracted function definitions are more flexible
and more widely usable than specialized definitions. A second, and in
practice equally important, advantage of abstracted definitions is that we
can change a single definition to fix and improve many different uses.
Consider the two variants of filter1 in figure 54. The first variant flattens the nested cond-expression, something that an experienced programmer may wish to do. The second variant uses a local-expression that makes the nested cond-expression more readable.
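The figure is not reproduced in this text, so the following is only a sketch of what the two variants might look like. The names filter1-flat and filter1-local, as well as the local names first-item and rest-filtered, are hypothetical; they are chosen here so that both variants can sit side by side in one program. In the text, both variants would simply be called filter1.

```scheme
;; variant 1: the nested cond-expression is flattened into three clauses
(define (filter1-flat rel-op alon t)
  (cond
    [(empty? alon) empty]
    [(rel-op (first alon) t)
     (cons (first alon) (filter1-flat rel-op (rest alon) t))]
    [else (filter1-flat rel-op (rest alon) t)]))

;; variant 2: a local-expression names the first item and the
;; result of the natural recursion before the nested cond inspects them
(define (filter1-local rel-op alon t)
  (cond
    [(empty? alon) empty]
    [else (local ((define first-item (first alon))
                  (define rest-filtered
                    (filter1-local rel-op (rest alon) t)))
            (cond
              [(rel-op first-item t) (cons first-item rest-filtered)]
              [else rest-filtered]))]))

(filter1-flat < (list 1 6 4) 5)    ; expected value: (list 1 4)
(filter1-local < (list 1 6 4) 5)   ; expected value: (list 1 4)
```

Either variant can replace the original definition without touching any function that is defined in terms of it.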
Although both of these changes are trivial, the key is that all uses of filter1, including those to define the functions below1 and above1, benefit from this change. Similarly, if the modification had fixed a logical mistake, all uses of the function would be improved. Finally, it is even possible to add new tasks to abstracted functions, for example, a mechanism for counting how many elements are filtered. In that case all uses of the function would benefit from the new functionality. We will encounter this form of improvement later.
Exercise 19.1.5. Abstract the following two functions into a single function:
[The two definitions, mini and maxi, are not reproduced here.]
Define mini1 and maxi1 in terms of the abstracted function. Test each of them with the following three lists:
(list 3 7 6 2 9 8)
(list 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1)
(list 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)
Why are they slow on the long lists?
Improve the abstracted function. First, introduce a local name for the result of the natural recursion. Then introduce a local, auxiliary function that picks the ``interesting'' one of two numbers. Test mini1 and maxi1 with the same inputs again.
Exercise 19.1.6. Recall the definition of sort, which consumes a list of numbers and produces a sorted version:
;; sort : list-of-numbers -> list-of-numbers
;; to construct a list with all items from alon in descending order
(define (sort alon)
  (local ((define (sort alon)
            (cond
              [(empty? alon) empty]
              [else (insert (first alon) (sort (rest alon)))]))
          (define (insert an alon)
            (cond
              [(empty? alon) (list an)]
              [else (cond
                      [(> an (first alon)) (cons an alon)]
                      [else (cons (first alon) (insert an (rest alon)))])])))
    (sort alon)))
Define an abstract version of sort that consumes the comparison operation in addition to the list of numbers. Use the abstract version to sort (list 2 3 1 5 4) in ascending and descending order.
Inspect the following two data definitions:
A list-of-numbers is either
- empty
- (cons n l)
where n is a number
and l is a list-of-numbers.
A list-of-IRs is either
- empty
- (cons n l)
where n is an IR
and l is a list-of-IRs.
Both define a class of lists. The first is the data definition for lists of numbers; the second describes lists of inventory records, which we represent with structures. The necessary structure and data definitions follow:
(define-struct ir (name price))
An IR is a structure:
(make-ir n p)
where n is a symbol and p is a number.
Given the similarity between the data definitions, functions that consume elements of these classes are similar, too. Take a look at the illustrative example in figure 55. The function on the left is the function below, which filters numbers from a list of numbers. The one on the right is below-ir, which extracts those inventory records from a list whose prices are below a certain threshold. Except for the name of the function, which is arbitrary, the two definitions differ in only one point: the relational operator.
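The figure is not reproduced here. As a sketch of the right-hand side, below-ir might be written as follows. The operator name <ir is the one the text uses shortly; its definition, the parameter name an-ir, and the sample inventory are assumptions made for illustration.

```scheme
(define-struct ir (name price))

;; <ir : IR number -> boolean
;; to determine whether the price of an-ir is below p
(define (<ir an-ir p)
  (< (ir-price an-ir) p))

;; below-ir : list-of-IRs number -> list-of-IRs
;; to construct a list of those records in aloir whose price is below t
(define (below-ir aloir t)
  (cond
    [(empty? aloir) empty]
    [else (cond
            [(<ir (first aloir) t)
             (cons (first aloir) (below-ir (rest aloir) t))]
            [else (below-ir (rest aloir) t)])]))

;; only the record for 'ball has a price below 20
(below-ir (list (make-ir 'ball 3) (make-ir 'bike 90)) 20)
```

Compared with below, the only change in the function body is that <ir takes the place of <.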
If we abstract the two functions, we obviously obtain filter1. Conversely, we can define below-ir in terms of filter1:
(define (below-ir1 aloir t)
  (filter1 <ir aloir t))
It should not surprise us to discover yet another use for filter1 -- after all, we already argued that abstraction promotes the reuse of functions for different purposes. Here we see that filter1 not only filters lists of numbers but lists of arbitrary things -- as long as we can define a function that compares these arbitrary things with numbers.
Indeed, all we need is a function that compares items on the list with the item we pass to filter1 as the last argument. Here is a function that uses filter1 to extract all items with the same label from a list of inventory records:
;; find : loIR symbol -> boolean
;; to determine whether aloir contains a record for t
(define (find aloir t)
  (cons? (filter1 eq-ir? aloir t)))

;; eq-ir? : IR symbol -> boolean
;; to compare ir's name and p
(define (eq-ir? ir p)
  (symbol=? (ir-name ir) p))
This new relational operator compares the name in an inventory record with some other symbol.
Exercise 19.2.1. Determine the values of
(below-ir1 (list (make-ir 'doll 8) (make-ir 'robot 12)) 10)
(find (list (make-ir 'doll 8) (make-ir 'robot 12) (make-ir 'doll 13)) 'doll)
by hand and with DrScheme. Show only those lines that introduce new applications of filter1 to values.
In short, filter1 uniformly works on many shapes of input data. The word ``uniformly'' means that if filter1 is applied to a list of X, its result is also a list of X -- no matter what kind of Scheme data X is. Such functions are called POLYMORPHIC[46] or GENERIC functions.
Of course, filter1 is not the only function that can process arbitrary lists. There are many other functions that process lists independently of what they contain. Here are two functions that determine the length of lists of numbers and IRs:
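[The two definitions, one for each class of lists, are not reproduced here.]
The two functions differ only in their names. Had we chosen the same name for both, say length, the two definitions would be identical.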
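To illustrate the point, here is a sketch of two such functions. The names length-lon and length-ir and the sample data are assumptions; the function bodies follow the usual list template.

```scheme
(define-struct ir (name price))

;; length-lon : list-of-numbers -> number
;; to compute the length of a list of numbers
(define (length-lon alon)
  (cond
    [(empty? alon) 0]
    [else (+ (length-lon (rest alon)) 1)]))

;; length-ir : list-of-IRs -> number
;; to compute the length of a list of inventory records
(define (length-ir aloir)
  (cond
    [(empty? aloir) 0]
    [else (+ (length-ir (rest aloir)) 1)]))

(length-lon (list 10 20 30))                           ; expected value: 3
(length-ir (list (make-ir 'doll 8) (make-ir 'robot 12))) ; expected value: 2
```

Neither function ever inspects an item on its list, which is why the bodies agree except for the names.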
To write precise contracts for functions such as length, we need data definitions with parameters. We call these PARAMETRIC DATA DEFINITIONS and agree that they do not specify everything about a class of data. Instead they use variables to say that any form of Scheme data can be used in a certain place. Roughly speaking, a parametric data definition abstracts from a reference to a particular collection of data in the same manner as a function abstracts from a particular value.
Here is a parametric definition of lists of ITEMs:
A list of ITEM is either
- empty or
- (cons s l)
where s is an ITEM
and l is a list of ITEM.
The token ITEM is a TYPE VARIABLE that stands for any arbitrary collection of Scheme data: symbols, numbers, booleans, IRs, etc. By replacing ITEM with one of these names, we get a concrete instance of this abstract data definition for lists of symbols, numbers, booleans, IRs, etc. To make the language of contracts more concise, we introduce an additional abbreviation:
(listof ITEM)
We use (listof ITEM) as the name of abstract data definitions such as the above. Then we can use (listof symbol) for the class of all lists of symbols, (listof number) for the class of all lists of numbers, (listof (listof number)) for the class of all lists of lists of numbers, etc.
In contracts we use (listof X) to say that a function works on all lists:
;; length : (listof X) -> number
;; to compute the length of a list
(define (length alon)
  (cond
    [(empty? alon) 0]
    [else (+ (length (rest alon)) 1)]))
The X is just a variable, a name that stands for some class of data. If we now apply length to an element of, say, (listof symbol) or (listof IR), we get a number.
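As a brief illustration, the sketch below applies one and the same function to elements of three different instances of (listof X). The function is named list-length here, a hypothetical name chosen to avoid a clash with a predefined length; the sample data is likewise made up.

```scheme
(define-struct ir (name price))

;; list-length : (listof X) -> number
;; to compute the length of a list, whatever its items are
(define (list-length alox)
  (cond
    [(empty? alox) 0]
    [else (+ (list-length (rest alox)) 1)]))

;; an element of (listof symbol)
(list-length (list 'doll 'robot 'car))                     ; expected value: 3

;; an element of (listof IR)
(list-length (list (make-ir 'doll 8) (make-ir 'robot 12)))  ; expected value: 2

;; an element of (listof (listof number))
(list-length (list (list 1 2) (list 3)))                   ; expected value: 2
```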
The function length is an example of simple polymorphism. It works on all classes of lists. While there are other useful examples of simple polymorphic functions, the more common cases require that we define functions like filter1, which consume a parametric form of data and functions that work on this data. This combination is extremely powerful and greatly facilitates the construction and maintenance of software systems. To understand it better, we will next discuss a revision of Scheme's grammar and new ways to write contracts.
Exercise 19.2.2. Show how to use the abstracted version of sort from exercise 19.1.6 to sort a list of IRs in ascending and descending order.
Exercise 19.2.3. Here is a structure definition for pairs
(define-struct pair (left right))
and its parametric data definition:
A (pair X Y) is a structure:
(make-pair l r)
where l is an X and r is a Y.
Using this abstract data definition, we can describe many different forms of pairs:
- (pair number number), which is the class of pairs that combine two numbers;
- (pair symbol number), which is the class of pairs that combine a symbol with a number; and
- (pair symbol symbol), which is the class of pairs that combine two symbols.
Still, in all of these examples, each pair contains two values that are accessible via pair-left and pair-right.
By combining the abstract data definition for lists and pairs we can describe lists of parametric pairs with a single line:
(listof (pair X Y)).
Some concrete examples of this abstract class of data are:
- (listof (pair number number)), the list of pairs of numbers;
- (listof (pair symbol number)), the list of pairs of symbols and numbers;
- (listof (pair symbol symbol)), the list of pairs of symbols.
Make an example for each of these classes.
Develop the function lefts, which consumes a list of (pair X Y) and produces a corresponding list of X's; that is, it extracts the left part of each item in its input.
Exercise 19.2.4. Here is a parametric data definition of non-empty lists:
A (non-empty-listof ITEM) is either
- (cons s empty), or
- (cons s l) where l is a (non-empty-listof ITEM)
and s is always an ITEM.
Develop the function last, which consumes a (non-empty-listof ITEM) and produces the last ITEM in that list.
Hint: Replace ITEM with a fixed class of data to develop an initial draft of last. When finished, replace the class with ITEM throughout the function development.
45 Computing borrows the term ``abstract'' from mathematics. A mathematician refers to ``6'' as an abstract number because it only represents all different ways of naming six things. In contrast, ``6 inches'' or ``6 eggs'' are concrete instances of ``6'' because they express a measurement and a count.
46 The prefix ``poly'' means ``many'' and ``morphic'' means ``having a shape.''