Section 19

Similarities in Definitions

Many of our data definitions and function definitions look alike. For example, the definition for a list of symbols differs from that of a list of numbers in only two regards: the name of the class of data and the words ``symbol'' and ``number.'' Similarly, a function that looks for a specific symbol in a list of symbols is nearly indistinguishable from one that looks for a specific number in a list of numbers.

Repetitions are the source of many programming mistakes. Therefore good programmers try to avoid repetitions as much as possible. As we develop a set of functions, especially functions derived from the same template, we soon learn to spot similarities. It is then time to revise the functions so as to eliminate the repetitions as much as possible. Put differently, a set of functions is just like an essay or a memo or a novel or some other piece of writing: the first draft is just a draft. Unless we edit the essay several times, it does not express our ideas clearly and concisely. It is a pain for others to read it. Because functions are read by many other people and because real functions are modified after reading, we must learn to ``edit'' functions.

The elimination of repetitions is the most important step in the (program) editing process. In this section, we discuss similarities in function definitions and in data definitions and how to avoid them. Our means of avoiding similarities are specific to Scheme and functional programming languages; still, other languages, in particular object-oriented ones, support similar mechanisms for factoring out similarities -- or (code) patterns as they are somtimes called.

19.1 Similarities in Functions

The use of our design recipes entirely determines a function's template -- or basic organization -- from the data definition for the input. Indeed, the template is an alternative method of expressing what we know about the input data. Not surprisingly, functions that consume the same kind of data look alike.

;; contains-doll? : los  ->  boolean
;; to determine whether alos contains 
;; the symbol 'doll
(define (contains-doll? alos)
  (cond
    [(empty? alos) false]
    [else
      (cond
	[(symbol=? (first alos) 'doll)
	 true]
	[else 
	 (contains-doll? (rest alos))])]))

;; contains-car? : los  ->  boolean
;; to determine whether alos contains 
;; the symbol 'car
(define (contains-car? alos)
  (cond
    [(empty? alos) false]
    [else
      (cond
	[(symbol=? (first alos) 'car)
	 true]
	[else 
	 (contains-car? (rest alos))])]))

Figure 52: Two similar functions

Take a look at the two functions in figure 52, which consume lists of symbols (names of toys) and look for specific toys. The function on the left looks for 'doll, the one on the right for 'car in a list of symbols (los). The two functions are nearly indistinguishable. Each consumes lists of symbols; each function body consists of a cond-expressions with two clauses. Each produces false if the input is empty; each uses a second, nested cond-expression to determine whether the first item is the desired item. The only difference is the symbol that is used in the comparison of the nested cond-expression: contains-doll? uses 'doll and contains-car? uses 'car, of course. To highlight the differences, the two symbols are boxed.

Good programmers are too lazy to define several closely related functions. Instead they define a single function that can look for both a 'doll and a 'car in a list of toys. This more general function consumes an additional piece of data, the symbol that we are looking for, but is otherwise like the two original functions:

;; contains? : symbol los  ->  boolean
;; to determine whether alos contains the symbol s
(define (contains? s alos)
  (cond
    [(empty? alos) false]
    [else (cond
	    [(symbol=? (first alos) s)
             true]
	    [else 
	     (contains? s (rest alos))])]))

We can now look for 'doll by applying contains? to 'doll and a list of symbols. But contains? works for any other symbol, too. Defining the single version has solved many related problems at once.

The process of combining two related functions into a single definition is called FUNCTIONAL ABSTRACTION. Defining abstract versions of functions is highly beneficial. The first benefit is that a single function can perform many different tasks. In our first example, contains? can search for many different symbols instead of just one concrete symbol.⁴⁵

;; below : lon number  ->  lon
;; to construct a list of those numbers 
;; on alon that are below t
(define (below alon t)
  (cond
    [(empty? alon) empty]
    [else
      (cond
	[(< (first alon) t)
	 (cons (first alon)
	   (below (rest alon) t))]
	[else
	 (below (rest alon) t)])]))

;; above : lon number  ->  lon
;; to construct a list of those numbers 
;; on alon that are above t
(define (above alon t)
  (cond
    [(empty? alon) empty]
    [else
      (cond
	[(> (first alon) t) 
	 (cons (first alon)
	   (above (rest alon) t))]
	[else
	 (above (rest alon) t)])]))

Figure 53: Two more similar functions

In the case of contains-doll? and contains-car?, abstraction is uninteresting. There are, however, more interesting cases: see figure 53. The function on the left consumes a list of numbers and a threshold and produces a list of all those numbers that are below the threshold; the one on the right produces all those that are above a threshold.

The difference between the two functions is the comparison operator. The left uses <, the right one >. Following the first example, we abstract over the two functions with an additional parameter that stands for the concrete relational operator in below and above:

(define (filter1 rel-op alon t)
  (cond
    [(empty? alon) empty]
    [else (cond
	    [(rel-op (first alon) t) 
	     (cons (first alon)
	           (filter1 rel-op (rest alon) t))]
	    [else
	     (filter1 rel-op (rest alon) t)])]))

To apply this new function, we must supply three arguments: a relational operator R that compares two numbers, a list L of numbers, and a number N. The function then extracts all those items i in L for which (R i N) evaluates to true. Since we do not know how to write down contracts for functions like filter1, we omit the contract for now. We will discuss the problem of contracts in section 20.2 below.

Let us see how filter1 works with an example. Clearly, as long as the input list is empty, the result is empty, too, no matter what the other arguments are:

  (filter1 < empty 5)
= empty

So next we look at a slightly more complicated case:

  (filter1 < (cons 4 empty) 5)

The result should be (cons 4 empty) because the only item of this list is 4 and (< 4 5) is true.

The first step of the evaluation is based on the rule of application:

  (filter1 < (cons 4 empty) 5)

= (cond
    [(empty? (cons 4 empty)) empty]
    [else (cond
	    [(< (first (cons 4 empty)) 5) 
	     (cons (first (cons 4 empty))
	           (filter1 < (rest (cons 4 empty)) 5))]
	    [else (filter1 < (rest (cons 4 empty)) 5)])])

That is, it is the body of filter1 with all occurrences of rel-op replaced by <, t replaced by 5, and alon replaced by (cons 4 empty).

The rest of the evaluation is straightforward:

  (cond
    [(empty? (cons 4 empty)) empty]
    [else (cond
	    [(< (first (cons 4 empty)) 5) 
	     (cons (first (cons 4 empty))
	           (filter1 < (rest (cons 4 empty)) 5))]
	    [else (filter1 < (rest (cons 4 empty)) 5)])])

= (cond
    [(< (first (cons 4 empty)) 5)
     (cons (first (cons 4 empty))
           (filter1 < (rest (cons 4 empty)) 5))]
    [else (filter1 < (rest (cons 4 empty)) 5)])

= (cond
    [(< 4 5) (cons (first (cons 4 empty))
                   (filter1 < (rest (cons 4 empty)) 5))]
    [else (filter1 < (rest (cons 4 empty)) 5)])

= (cond
    [true (cons (first (cons 4 empty))
              (filter1 < (rest (cons 4 empty)) 5))]
    [else (filter1 < (rest (cons 4 empty)) 5)])

= (cons 4 (filter1 < (rest (cons 4 empty)) 5))
= (cons 4 (filter1 < empty 5))
= (cons 4 empty)

The last step is the equation we discussed as our first case.

Our final example is an application of filter1 to a list of two items:

  (filter1 < (cons 6 (cons 4 empty)) 5)
= (filter1 < (cons 4 empty) 5)
= (cons 4 (filter1 < empty 5))
= (cons 4 empty)

The only new step is the first one. It says that filter1 determines that the first item on the list is not less than the threshold, and that it therefore is not added to the result of the natural recursion.

Exercise 19.1.1. Verify the equation

  (filter1 < (cons 6 (cons 4 empty)) 5)
= (filter1 < (cons 4 empty) 5)

with a hand-evaluation that shows every step. Solution

Exercise 19.1.2. Evaluate the expression

  (filter1 > (cons 8 (cons 6 (cons 4 empty))) 5)

by hand. Show only the essential steps. Solution

The calculations show that (filter1 < alon t) computes the same result as (below alon t), which is what we expected. Similar reasoning shows that (filter1 > alon t) produces the same output as (above alon t). So suppose we define the following:

;; below1 : lon number  ->  lon
(define (below1 alon t)
  (filter1 < alon t))

;; above1 : lon number  ->  lon
(define (above1 alon t) 
  (filter1 > alon t))

Clearly, below1 produces the same results as below when given the same inputs, and above1 is related to above in the same manner. In short, we have defined below and above as one-liners using filter1.

Better yet: once we have an abstract function like filter1, we can put it to other uses, too. Here are three of them:

(filter1 = alon t): This expression extracts all those numbers in alon that are equal to t.
(filter1 <= alon t): This one produces the list of numbers in alon that are less than or equal to t.
(filter1 >= alon t): This last expression computes the list of numbers that are greater than or equal to the threshold.

In general, filter1's first argument need not even be one of Scheme's predefined operations; it can be any function that consumes two numbers and produces a boolean value. Consider the following example:

;; squared>? : number number  ->  boolean
(define (squared>? x c)
  (> (* x x) c))

The function produces true whenever the area of a square with side x is larger than some threshold c, that is, the function tests whether the claim x² > c holds. We now apply filter1 to this function and a list of numbers:

(filter1 squared>? (list 1 2 3 4 5) 10)

This particular application extracts those numbers in (list 1 2 3 4 5) whose square is larger than 10.

Here is the beginning of a simple hand-evaluation:

  (filter1 squared>? (list 1 2 3 4 5) 10)
= (cond
    [(empty? (list 1 2 3 4 5)) empty]
    [else (cond
	    [(squared>? (first (list 1 2 3 4 5)) 10) 
	     (cons (first (list 1 2 3 4 5))
	       (filter1 squared>? (rest (list 1 2 3 4 5)) 10))]
	    [else
	      (filter1 squared>? (rest (list 1 2 3 4 5)) 10)])])

That is, we apply our standard law of application and calculate otherwise as usual:

= (cond
    [(squared>? 1 10) 
     (cons (first (list 1 2 3 4 5))
       (filter1 squared>? (rest (list 1 2 3 4 5)) 10))]
    [else
     (filter1 squared>? (rest (list 1 2 3 4 5)) 10)])

= (cond
    [false 
     (cons (first (list 1 2 3 4 5))
       (filter1 squared>? (rest (list 1 2 3 4 5)) 10))]
    [else
     (filter1 squared>? (rest (list 1 2 3 4 5)) 10)])

The last step consists of several steps concerning squared>?, which we can skip at this point:

= (filter1 squared>? (list 2 3 4 5) 10)
= (filter1 squared>? (list 3 4 5) 10)
= (filter1 squared>? (list 4 5) 10)

We leave the remainder of the evaluation to the exercises.

Exercise 19.1.3. Show that

  (filter1 squared>? (list 4 5) 10)
= (cons 4 (filter1 squared>? (list 5) 10))

with a hand-evaluation. Act as if squared>? were primitive. Solution

Exercise 19.1.4. The use of squared>? also suggests that the following function will work, too:

;; squared10? : number number  ->  boolean
(define (squared10? x c)
  (> (sqr x) 10))

In other words, the relational function that filter1 uses may ignore its second argument. After all, we already know it and it stays the same throughout the evaluation of (filter1 squared>? alon t).

This, in turn, implies another simplification of the function:

(define (filter predicate alon)
  (cond
    [(empty? alon) empty]
    [else (cond
	    [(predicate (first alon)) 
	     (cons (first alon)
	       (filter predicate (rest alon)))]
	    [else
	     (filter predicate (rest alon))])]))

The function filter consumes only a relational function, called predicate, and a list of numbers. Every item i on the list is checked with predicate. If (predicate i) holds, i is included in the output; if not, i does not appear in the result.

Show how to use filter to define functions that are equivalent to below and above. Test the definitions. Solution

So far we have seen that abstracted function definitions are more flexible and more widely usable than specialized definitions. A second, and in practice equally important, advantage of abstracted definitions is that we can change a single definition to fix and improve many different uses. Consider the two variants of filter1 in figure 54. The first variant flattens the nested cond-expression, something that an experienced programmer may wish to do. The second variant uses a local-expression that makes the nested cond-expression more readable.

(define (filter1 rel-op alon t)
  (cond
    [(empty? alon) empty]
    [(rel-op (first alon) t) 
     (cons (first alon)
       (filter1 rel-op (rest alon) t))]
    [else
      (filter1 rel-op (rest alon) t)]))

(define (filter1 rel-op alon t)
  (cond
    [(empty? alon) empty]
    [else
      (local ((define first-item (first alon))
	      (define rest-filtered
		(filter1 rel-op (rest alon) t)))
	(cond
	  [(rel-op first-item t) 
	   (cons first-item rest-filtered)]
	  [else
	   rest-filtered]))]))

Figure 54: Two modifications of filter1

Although both of these changes are trivial, the key is that all uses of filter1, including those to define the functions below1 and above1, benefit from this change. Similarly, if the modification had fixed a logical mistake, all uses of the function would be improved. Finally, it is even possible to add new tasks to abstracted functions, for example, a mechanism for counting how many elements are filtered. In that case all uses of the function would benefit from the new functionality. We will encounter this form of improvement later.

Exercise 19.1.5.

Abstract the following two functions into a single function:

;; mini : nelon  ->  number
;; to determine the smallest number
;; on alon
(define (mini alon)
  (cond
    [(empty? (rest alon)) (first alon)]
    [else (cond
	    [(< (first alon) 
		(mini (rest alon)))
	     (first alon)]
	    [else
	     (mini (rest alon))])]))

;; maxi : nelon  ->  number
;; to determine the largest number
;; on alon
(define (maxi alon)
  (cond
    [(empty? (rest alon)) (first alon)]
    [else (cond
	    [(> (first alon)
		(maxi (rest alon)))
	     (first alon)]
	    [else
	     (maxi (rest alon))])]))

Both consume non-empty lists of numbers and produce a single number. The left one produces the smallest number in the list, the right one the largest.

Define mini1 and maxi1 in terms of the abstracted function. Test each of them with the following three lists:

(list 3 7 6 2 9 8)
(list 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1)
(list 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)

Why are they slow on the long lists?

Improve the abstracted function. First, introduce a local name for the result of the natural recursion. Then introduce a local, auxiliary function that picks the ``interesting'' one of two numbers. Test mini1 and maxi1 with the same inputs again. Solution

Exercise 19.1.6. Recall the definition of sort, which consumes a list of numbers and produces a sorted version:

;; sort : list-of-numbers  ->  list-of-numbers
;; to construct a list with all items from alon in descending order
(define (sort alon)
  (local ((define (sort alon)
	    (cond
	      [(empty? alon) empty]
	      [else (insert (first alon) (sort (rest alon)))]))
	  (define (insert an alon)
	    (cond
	      [(empty? alon) (list an)]
	      [else (cond
		      [(> an (first alon)) (cons an alon)]
		      [else (cons (first alon) (insert an (rest alon)))])])))
    (sort alon)))

Define an abstract version of sort that consumes the comparison operation in addition to the list of numbers. Use the abstract version to sort (list 2 3 1 5 4) in ascending and descending order. Solution

19.2 Similarities in Data Definitions

Inspect the following two data definitions:

A list-of-numbers is either

empty
(cons n l)
where n is a number
and n is a list-of-numbers.
A list-of-IRs is either

empty
(cons n l)
where n is an IR
and n is a list-of-IRs.

Both define a class of lists. The one on the left is the data definition for lists of numbers; the one on the right describes lists of inventory records, which we represent with structures. The necessary structure and data definitions follow:

(define-struct ir (name price))

An IR is a structure:

(make-ir n p)

where n is a symbol and p is a number.

Given the similarity between the data definitions, functions that consume elements of these classes are similar, too. Take a look at the illustrative example in figure 55. The function on the left is the function below, which filters numbers from a list of numbers. The one on the right is below-ir, which extracts those inventory records from a list whose prices are below a certain threshold. Except for the name of the function, which is arbitrary, the two definitions differ in only one point: the relational operator.

;; below : number lon  ->  lon
;; to construct a list of those numbers
;; on alon that are below t
(define (below alon t)
  (cond
    [(empty? alon) empty]
    [else (cond
	    [(< (first alon) t)
	     (cons (first alon)
	       (below (rest alon) t))]
	    [else
	      (below (rest alon) t)])]))

;; below-ir : number loIR  ->  loIR
;; to construct a list of those records 
;; on aloir that contain a price below t
(define (below aloir t)
  (cond
    [(empty? aloir) empty]
    [else (cond
	    [(<_ir (first aloir) t) 
	     (cons (first aloir)
	      (below-ir (rest aloir) t))]
	    [else
	      (below-ir (rest aloir) t)])]))

;; <_ir : IR number  ->  boolean
(define (<_ir ir p)
    (< (ir-price ir) p))

Figure 55: Marking the differences in similar functions

If we abstract the two functions, we obviously obtain filter1. Conversely, we can define below-ir in terms of filter1:

(define (below-ir1 aloir t)
  (filter1 <_ir aloir t))

It should not surprise us to discover yet another use for filter1 -- after all, we already argued that abstraction promotes the reuse of functions for different purposes. Here we see that filter1 not only filters lists of numbers but lists of arbitrary things -- as long as we can define a function that compares these arbitrary things with numbers.

Indeed, all we need is a function that compares items on the list with the items we pass to filter1 as the second argument. Here is a function that extracts all items with the same label from a list of inventory records:

;; find : loIR symbol  ->  boolean
;; to determine whether aloir contains a record for t
(define (find aloir t)
  (cons? (filter1 eq-ir? aloir t)))

;; eq-ir? : IR symbol  ->  boolean
;; to compare ir's name and p
(define (eq-ir? ir p)
  (symbol=? (ir-name ir) p))

This new relational operator compares the name in an inventory record with some other symbol.

Exercise 19.2.1. Determine the values of

(below-ir1 10 (list (make-ir 'doll 8) (make-ir 'robot 12)))
(find 'doll (list (make-ir 'doll 8) (make-ir 'robot 12) (make-ir 'doll 13)))

by hand and with DrScheme. Show only those lines that introduce new applications of filter1 to values. Solution

In short, filter1 uniformly works on many shapes of input data. The word ``uniformly'' means that if filter1 is applied to a list of X, its result is also a list of X -- no matter what kind of Scheme data X is. Such functions are called POLYMORPHIC ⁴⁶ or GENERIC functions.

Of course, filter1 is not the only function that can process arbitrary lists. There are many other functions that process lists independently of what they contain. Here are two functions that determine the length of lists of numbers and IRs:

;; length-lon : lon  ->  number
(define (length-lon alon)
  (cond
    [(empty? alon) empty]
    [else 
      (+ (length-lon (rest alon)) 1)]))

;; length-ir : loIR  ->  number
(define (length-ir alon)
  (cond
    [(empty? alon) empty]
    [else
      (+ (length-ir (rest alon)) 1)]))

The two functions differ only in their names. If we had chosen the same name, say, length, the two definitions would be identical.

To write precise contracts for functions such as length, we need data definitions with parameters. We call these PARAMETRIC DATA DEFINITIONS and agree that they do not specify everything about a class of data. Instead they use variables to say that any form of Scheme data can be used in a certain place. Roughly speaking, a parametric data definition abstracts from a reference to a particular collection of data in the same manner as a function abstracts from a particular value.

Here is a parametric definition of lists of ITEMs:

A list of ITEM is either

empty or
(cons s l) where
1. s is an ITEM and
2. l is a list of ITEM.

The token ITEM is a TYPE VARIABLE that stands for any arbitrary collection of Scheme data: symbols, numbers, booleans, IRs, etc. By replacing ITEM with one of these names, we get a concrete instance of this abstract data definition for lists of symbols, numbers, booleans, IRs, etc. To make the language of contracts more concise, we introduce an additional abbreviation:

(listof ITEM)

We use (listof ITEM) as the name of abstract data definitions such as the above. Then we can use (listof symbol) for the class of all lists of symbols, (listof number) for the class of all lists of numbers, (listof (listof number)) for the class of all lists of lists of numbers, etc.

In contracts we use (listof X) to say that a function works on all lists:

;; length : (listof X)  ->  number
;; to compute the length of a list 
(define (length alon)
  (cond
    [(empty? alon) empty]
    [else (+ (length (rest alon)) 1)]))

The X is just a variable, a name that stands for some class of data. If we now apply length to an element of, say, (listof symbol) or (listof IR), we get a number.

The function length is an example of simple polymorphism. It works on all classes of lists. While there are other useful examples of simple polymorphic functions, the more common cases require that we define functions like filter1, which consume a parametric form of data and functions that work on this data. This combination is extremely powerful and greatly facilitates the construction and maintenance of software systems. To understand it better, we will next discuss a revision of Scheme's grammar and new ways to write contracts.

Exercise 19.2.2. Show how to use the abstracted version of sort from exercise 19.1.6 to sort a list of IRs in ascending and descending order. Solution

Exercise 19.2.3. Here is a structure definition for pairs

(define-struct pair (left right))

and its parametric data definition:

A (pair X Y) is a structure:

(make-pair l r)

where l is an X and r is a Y.

Using this abstract data definition, we can describe many different forms of pairs:

(pair number number), which is the class of pairs that combine two numbers;
(pair symbol number), which is the class of pairs that combine a number with a symbol; and
(pair symbol symbol), which is the class of pairs that combine two symbols.

Still, in all of these examples, each pair contains two values that are accessible via pair-left and pair-right.

By combining the abstract data definition for lists and pairs we can describe lists of parametric pairs with a single line:

(listof (pair X Y)) .

Some concrete examples of this abstract class of data are:

(listof (pair number number)), the list of pairs of numbers;
(listof (pair symbol number)), the list of pairs of symbols and numbers;
(listof (pair symbol symbol)), the list of pairs of symbols.

Make an example for each of these classes.

Develop the function lefts, which consumes a list of (pair X Y) and produces a corresponding list of X's; that is, it extracts the left part of each item in its input. Solution

Exercise 19.2.4. Here is a parametric data definition of non-empty lists:

A (non-empty-listof ITEM) is either

(cons s empty), or
(cons s l) where l is a (non-empty-listof ITEM)

and s is always an ITEM.

Develop the function last, which consumes a (non-empty-listof ITEM) and produces the last ITEM in that list.

Hint: Replace ITEM with a fixed class of data to develop an initial draft of last. When finished, replace the class with ITEM throughout the function development. Solution

⁴⁵ Computing borrows the term ``abstract'' from mathematics. A mathematician refers to ``6'' as an abstract number because it only represents all different ways of naming six things. In contrast, ``6 inches'' or ``6 eggs'' are concrete instances of ``6'' because they express a measurement and a count.

⁴⁶ The word ``poly'' means ``many'' and ``morphic'' means shape.

Section 19 Similarities in Definitions

19.1 Similarities in Functions

19.2 Similarities in Data Definitions

Section 19

Similarities in Definitions