Section 7

The Varieties of Data

The previous section significantly expands our world of data. We must now deal with a universe that contains booleans, symbols, and structures of many kinds. Let's bring some order to this world.

Up to this point, our functions have always processed subclasses of four different kinds of data:24

numbers:
representations of numeric information;
booleans:
truth and falsity;
symbols:
representations of symbolic information; and
structures:
representations of compounds of information.

On occasion, however, a function must process a class of data that includes both numbers and structures or structures of several different kinds. We learn to design such functions in this section. In addition, we learn how to protect functions from bad uses. Here a bad use means that some user can accidentally apply a function for drawing circles to a rectangle. Although we have agreed that such users violate our data definitions, we should nevertheless know how to protect our functions against such uses, when necessary.

7.1  Mixing and Distinguishing Data

In the preceding section, we used posn structures with exactly two components to represent pixels. If many of the pixels are on the x axis, we can simplify the representation by using plain numbers for those pixels and posn structures for the remaining ones.

Figure 16 contains a sample collection of such points. Three of the five points, namely, C, D, and E, are on the x axis. Only two points require two coordinates for an accurate description: A and B. Our new idea for representing points permits us to describe this class of points succinctly: (make-posn 6 6) for A; (make-posn 1 2) for B; and 1, 2, and 3 for C, D, and E, respectively.

If we now wish to define the function distance-to-0, which consumes such point representations and produces their distance to the origin, we are confronted with a problem. The function may be applied to a number or a posn. Depending on the class to which the input belongs, distance-to-0 must employ a different method to calculate the distance to the origin. Thus we need to use a cond-expression to distinguish the two cases. Unfortunately, we don't have any operations to formulate the appropriate conditions.

[curriculum1ab-Z-G-1.gif]

Figure 16:  A small collection of points

To accommodate this kind of function, Scheme provides PREDICATES, which are operations that recognize a particular form of data. The predicates for the classes of data we know are:

number?, which consumes an arbitrary value and produces true if the value is a number and false otherwise;
boolean?, which consumes an arbitrary value and produces true if the value is a boolean value and false otherwise;
symbol?, which consumes an arbitrary value and produces true if the value is a symbol and false otherwise;
struct?, which consumes an arbitrary value and produces true if the value is a structure and false otherwise.

For each structure definition, Scheme also introduces a separate predicate so that we can distinguish between distinct classes of structures. Suppose the Definitions window contains the following structure definitions:25

(define-struct posn (x y))

(define-struct star (last first dob ssn))

(define-struct airplane (kind max-speed max-load price))

Then, Scheme also knows the following three predicates:

posn?, which consumes an arbitrary value and produces true if the value is a posn structure and false otherwise;
star?, which consumes an arbitrary value and produces true if the value is a star structure and false otherwise;
airplane?, which consumes an arbitrary value and produces true if the value is a airplane structure and false otherwise.

Hence a function can distinguish a structure from a number as well as a posn structure from an airplane structure.

Exercise 7.1.1.   Evaluate the following expressions by hand:

  1. (number? (make-posn 2 3))

  2. (number? (+ 12 10))

  3. (posn? 23)

  4. (posn? (make-posn 23 3))

  5. (star? (make-posn 23 3))

Check the answers in DrScheme.    Solution

Now we can develop distance-to-0. Let's start with a data definition:

A pixel-2 is either

  1. a number, or

  2. a posn structure.

Stating the contract, purpose, and header is straightforward:

;; distance-to-0 : pixel-2  ->  number
;; to compute the distance of a-pixel to the origin
(define (distance-to-0 a-pixel) ...)

As mentioned before, the function must distinguish between its two kinds of inputs, which can be accomplished with a cond-expression:

(define (distance-to-0 a-pixel)
  (cond
    [(number? a-pixel) ...]
    [(posn? a-pixel) ...]))

The two conditions match the two possible inputs of the new distance-to-0 function. If the first one holds, the input is a pixel on the x axis. Otherwise the pixel is a posn structure. For the second cond-line, we also know that the input contains two items: the x and y coordinates. To remind ourselves, we annotate the template with two selector expressions:

(define (distance-to-0 a-pixel)
  (cond
    [(number? a-pixel) ...]
    [(posn? a-pixel) ... (posn-x a-pixel) ... (posn-y a-pixel) ... ]))

Completing the function is easy. If the input is a number, it is the distance to the origin. If it is a structure, we use the old formula for determining the distance to the origin:

(define (distance-to-0 a-pixel)
  (cond
    [(number? a-pixel) a-pixel]
    [(posn? a-pixel) (sqrt
		       (+ (sqr (posn-x a-pixel))
			  (sqr (posn-y a-pixel))))]))

Let us consider a second example. Suppose we are to write functions that deal with geometric shapes. One function might have to compute the area covered by a shape, another one the perimeter, and a third could draw the shape. For the sake of simplicity, let's assume that the class of shapes includes only squares and circles and that their description includes their location (a posn) and their size (a number).

Information about both shapes must be represented with structures, because both have several attributes. Here are the structure definitions:

(define-struct square (nw length))
(define-struct circle (center radius))

and the matching data definition:

A shape is either

  1. a circle structure:

    	   (make-circle p s)
     	  

    where p is a posn and s is a number; or
  2. a square structure:

    	   (make-square p s)
     	  

    where p is a posn and s is a number.

Together, the two classes make up the class of shapes:

The next step of our design recipe requires that we make up examples. Let's start with input examples:

  1. (make-square (make-posn 20 20) 3),

  2. (make-square (make-posn 2 20) 3), and

  3. (make-circle (make-posn 10 99) 1).

To make up examples of input-output relationships, we need to know the purpose of the function. So suppose we need the function perimeter, which computes the perimeter of a shape. From geometry, we know that the perimeter of a square is four times its side, and the perimeter of a circle is [curriculum-Z-G-D-1.gif] times the diameter, which is twice the radius.26 Thus, the perimeter of the above three examples are: 12, 12, and (roughly) 6.28, respectively.

Following the design recipe and the precedent of distance-to-0, we start with the following skeleton of the function:

;; perimeter : shape  ->  number
;; to compute the perimeter of a-shape
(define (perimeter a-shape)
  (cond
    [(square? a-shape) ... ]
    [(circle? a-shape) ... ]))

because the function must first determine to which class a-shape belongs.

Furthermore, each possible input is a structure, so we can also add two selector expressions to each cond-clause:

;; perimeter : shape  ->  number
;; to compute the perimeter of a-shape
(define (perimeter a-shape)
  (cond
    [(square? a-shape) 
     ... (square-nw a-shape) ... (square-length a-shape) ...]
    [(circle? a-shape)
     ... (circle-center a-shape) ... (circle-radius a-shape) ...]))

The selector expressions remind us of the available data.

Now we are ready to finish the definition. We fill the gaps in the two answers by translating the mathematical formulae into Scheme notation:

(define (perimeter a-shape)
  (cond
    [(square? a-shape) (* (square-length a-shape) 4)]
    [(circle? a-shape) (* (* 2 (circle-radius a-shape)) pi)]))

Since the position of a shape does not affect its perimeter, the template's selector expressions for nw and center disappear.

Exercise 7.1.2.   Test perimeter with the examples.    Solution

Exercise 7.1.3.   Develop the function area, which consumes either a circle or a square and computes the area. Is it possible to reuse the template for perimeter by changing the name to area?    Solution

7.2  Designing Functions for Mixed Data

The function development in the preceding section suggests some amendments to our design recipe. Specifically, the data analysis step, the template construction step, and the definition of the function's body require adjustments.

Data Analysis and Design:
When we analyze a problem statement, our first task is to determine whether it mentions distinct classes of data -- which we call MIXED DATA and which is also known as the UNION of data classes. In other words, the data analysis must take into account several aspects now. First, we must determine how many distinct classes of objects are mentioned in the problem and what their important attributes are. If there are several different classes of objects, we are mixing data. Second, we must understand whether the various objects have several properties. If an object has several attributes, we use compound data for its representation. As a result, the resulting data definition may have several clauses that enumerate several possibilities. Indeed, we will see that the data analysis may yield a hierarchy of data definitions.

The example of the preceding section deals with two distinct kinds of shapes, each of which has several properties. We captured this idea with the following data definition:

A shape is either

  1. a circle structure:

    	   (make-circle p s)
     	  

    where p is a posn and s is a number; or
  2. a square structure:

    	   (make-square p s)
     	  

    where p is a posn and s is a number.

It specifies that every shape belongs to one of two subclasses of data.

For a data definition to make sense, it must be possible to formulate conditions that distinguish the various subclasses in a definition. That is, if x stands for a piece of data in the defined class, we must be able to use built-in and user-defined predicates to distinguish the enumerated subclasses from each other. In our running example, the two conditions would be (square? x) and (circle? x).

Template:
Recall that the template is a translation of the input data definition into Scheme. Thus, imagine that we have a data definition that enumerates several distinct possibilities. The first step is to write down a cond-expression with as many clauses as there are enumerated possibilities in the data definition. The second step is to add a condition to each line. Each condition should hold if the input belongs to the corresponding subclass of data mentioned in the data definition.

Here is the template for our running example:

;; f : shape  ->  ???
(define (f a-shape)
  (cond
    [(square? a-shape) ...]
    [(circle? a-shape) ...]))

The output specification and the purpose statement are missing to emphasize that a template has no connection to the output or the purpose of a function.

Once we have formulated the template for the conditional, we refine the template further, cond-line by cond-line. If the purpose of a line is to process atomic information, we are done. If a line processes compound data, we enrich the template with appropriate selector expressions.

Let's illustrate the idea with our running example again:

(define (f a-shape)
  (cond
    [(square? a-shape) 
     ... (square-nw a-shape) ... (square-length a-shape) ...]
    [(circle? a-shape)
     ... (circle-center a-shape) ... (circle-radius a-shape) ...]))

Body:
Using the conditional template, we split the task into simpler tasks. Specifically, we can focus on each cond-line separately, simply considering the question what is the output if we are given this kind of input. All other cases are ignored as we work out one particular clause.

Suppose we want to define a function that computes the perimeter of a shape. Then we start from the template and fill in the gaps:

;; perimeter : shape  ->  number
;; to compute the perimeter of a-shape
(define (perimeter a-shape)
  (cond
    [(square? a-shape) (* (square-length a-shape) 4)]
    [(circle? a-shape) (* (* 2 (circle-radius a-shape)) pi)]))

Figure 17 summarizes the development of our running example.

The remaining steps of the recipes in figures 4, 6, and 12 should be followed on an as-is basis. Figure 18 summarizes the design recipe, with all steps included.

;; Data Definition:
(define-struct circle (center radius))
(define-struct square (nw length))
;; A shape is either
;; 1. a structure: (make-circle p s)
;;    where p is a posn, s a number;
;; 2. a structure: (make-square p s)
;;    where p is a posn, s a number.

;; Contract, Purpose, Header: 
;; perimeter : shape  ->  number
;; to compute the perimeter of a-shape

;; Examples: see tests

;; Template:
;; (define (f a-shape)
;;   (cond
;;     [(square? a-shape)
;;     ... (square-nw a-shape) ... (square-length a-shape) ...]
;;     [(circle? a-shape)
;;     ... (circle-center a-shape) ... (circle-radius a-shape) ...]))

;; Definition: 
(define (perimeter a-shape)
  (cond
    [(circle? a-shape)
     (* (* 2 (circle-radius a-shape)) pi)]
    [(square? a-shape)
     (* (square-length a-shape) 4)]))

;; Tests: (same as examples)
(= (perimeter (make-square ... 3)) 12)
(= (perimeter (make-circle ... 1)) (* 2 pi))

Figure 17:  The design recipe for mixed data: A complete example

Phase Goal                 Activity

Data
  Analysis
  and Design

to formulate a data definition

determine how many distinct classes of ``objects'' make up the classes of problem data [curriculum-Z-G-D-4.gif] enumerate the alternatives in a data definition [curriculum-Z-G-D-4.gif] formulate a data definition for each alternative, if it is a form of compound data

Contract
Purpose and
Header

to name the function;
to specify its classes of
  input data and its
  class of output data;
to describe its purpose;
to formulate a header

name the function, the classes of input data, the class of output data, and specify its purpose:
 ;; name : in1 in2 ...--> out
 ;; to compute ... from x1 ...
 (define (name x1 x2 ...) ...)

Examples         

to characterize the input-
output relationship via examples

create examples of the input-output relationship [curriculum-Z-G-D-4.gif] make sure there is at least one example per subclass

Template

to formulate an outline

introduce a cond-expression with one clause per subclass [curriculum-Z-G-D-4.gif] formulate a condition for each case, using built-in and predefined predicates

Body                 

to define the function

develop a Scheme expression for each cond-line (an answer), assuming that the condition holds

Test                

to discover mistakes
  (``typos'' and logic)

apply the function to the inputs of the examples [curriculum-Z-G-D-4.gif] check that the outputs are as predicted

Figure 18:  Designing a function for mixed data
 (Refines the recipes in figures 4 (pg. 5) and 12 (pg. 9)) 

Even a cursory comparative reading of the design recipes in sections 2.5,  4.4,  6.5, and the current one suggests that the data analysis and the template design steps are becoming more and more important. If we do not understand what kind of data a function consumes, we cannot design it and organize it properly. If, however, we do understand the structure of the data definition and organize our template properly, it is easy to modify or to extend a function. For example, if we add new information to the representation of a circle, then only those cond-clauses related to circles may require changes. Similarly, if we add a new kind of shape to our data definition, say, rectangles, we must add new cond-clauses to our functions.

Exercise 7.2.1.   Develop structure and data definitions for a collection of zoo animals. The collection includes

spiders,
whose relevant attributes are the number of remaining legs (we assume that spiders can lose legs in accidents) and the space they need in case of transport;

elephants,
whose only attributes are the space they need in case of transport;

monkeys,
whose attributes are intelligence and space needed for transportation.

Then develop a template for functions that consume zoo animals.

Develop the function fits?. The function consumes a zoo animal and the volume of a cage. It determines whether the cage is large enough for the animal.    Solution

Exercise 7.2.2.   The administrators of metropolitan transportation agencies manage fleets of vehicles. Develop structure and data definitions for a collection of such vehicles. The collection should include at least buses, limos, cars, and subways. Add at least two attributes per class of vehicle.

Then develop a template for functions that consume vehicles.     Solution

7.3  Composing Functions, Revisited

As we analyze a problem statement, we might wish to develop the data representation in stages. This is especially true when the problem statement mentions several different kinds of objects. It is easier to understand several smaller data definitions than one larger one.

Let's return to our shape problem again. Instead of the class of shapes in a single data definition, we could start with two data definitions, one for each basic shape:

A circle is a structure:

 (make-circle p s) 
where p is a posn and s is a number.

A square is a structure:

 (make-square p s) 
where p is a posn and s is a number.

Once we have developed and understood the basic data definitions, possibly by playing with examples and by writing simple functions, we can introduce data definitions that combine them. For example, we can introduce a data definition for a class of shapes that refers to the two above:

A shape is either

  1. a circle, or

  2. a square.

Now suppose we need a function that consumes shapes. First, we form a cond-expression with conditions for each part of the data definition:

;; f : shape  ->  ???
(define (f a-shape)
  (cond
    [(circle? a-shape) ...]
    [(square? a-shape) ...]))

Given our guideline concerning the composition of functions from section 3.1 and given that the data definition refers to two other data definitions, the natural second step is to pass the argument to auxiliary functions:

(define (f a-shape)
  (cond
    [(circle? a-shape) (f-for-circle a-shape)]
    [(square? a-shape) (f-for-square a-shape)]))

This, in turn, requires that we develop the two auxiliary functions, f-for-circle and f-for-square, including their templates.

;; Data Definition:
(define-struct circle (center radius))
(define-struct square (nw length))
;; A shape is either
;; 1. a structure: (make-circle p s)
;;    where p is a posn, s a number;
;; 2. a structure: (make-square p s)
;;    where p is a posn, s a number.






          
;; Data Definitions:
(define-struct circle (center radius))
;; A circle is a structure:
;;          (make-circle p s)
;;    where p is a posn, s a number;

(define-struct square (nw length))
;; A square is a structure:
;;          (make-square p s)
;;    where p is a posn, s a number.

;; A shape is either
;; 1. a circle, or
;; 2. a square. 


;; Final Definition: 
;; perimeter : shape  ->  number
;; to compute the perimeter of a-shape
(define (perimeter a-shape)
  (cond
    [(circle? a-shape)
     (* (* 2 (circle-radius a-shape)) pi)]
    [(square? a-shape)
     (* (square-length a-shape) 4)]))










          
;; Final Definitions: 
;; perimeter : shape  ->  number
;; to compute the perimeter of a-shape
(define (perimeter a-shape)
  (cond
    [(circle? a-shape)
     (perimeter-circle a-shape)]
    [(square? a-shape)
     (perimeter-square a-shape)]))

;; perimeter-circle : circle  ->  number
;; to compute the perimeter of a-circle
(define (perimeter-circle a-circle)
  (* (* 2 (circle-radius a-circle)) pi))

;; perimeter-square : square  ->  number
;; to compute the perimeter of a-square
(define (perimeter-square a-square)
  (* (square-length a-square) 4))

Figure 19:  Two ways to define perimeter

If we follow this suggestion, we arrive at a collection of three functions, one per data definition. The essential points of the program development are summarized in the right column of figure 19. For a comparison, the left column contains the corresponding pieces of the original program development. In each case, we have as many functions as there are data definitions. Furthermore, the references between the functions in the right column directly match the references among the corresponding data definitions. While this symmetry between data definitions and functions may seem trivial now, it becomes more and more important as we study more complex ways of defining data.

Exercise 7.3.1.   Modify the two versions of perimeter so that they also process rectangles. For our purposes, the description of a rectangle includes its upper-left corner, its width, and its height.     Solution

7.4  Extended Exercise: Moving Shapes

In section 6.6, we developed functions for drawing, translating, and clearing circles and rectangles. As we have just seen, we should think of the two classes of data as subclasses of a class of shapes so that we can just draw, translate, and clear shapes.

Exercise 7.4.1.   Provide a data definition for a general class of shapes. The class should at least subsume the classes of colored circles and rectangles from section 6.6.

Develop the template fun-for-shape, which outlines functions that consume shapes.    Solution

Exercise 7.4.2.   Use the template fun-for-shape to develop draw-shape. The function consumes a shape and draws it on the canvas.    Solution

Exercise 7.4.3.   Use the template fun-for-shape to develop translate-shape. The function consumes a shape and a number delta, and produces a shape whose key position is moved by delta pixels in the x direction.    Solution

Exercise 7.4.4.   Use the template fun-for-shape to develop clear-shape. The function consumes a shape, erases it from the canvas, and returns true.    Solution

Exercise 7.4.5.   Develop the function draw-and-clear-shape. The function consumes a shape, draws it, sleeps for a while, and clears it. If all the effects work out, it produces true.    Solution

Exercise 7.4.6.   Develop move-shape, which moves a shape across the canvas. The function consumes a number (delta) and a shape. The function should draw-and-clear the shape and return a new shape that has been translated by delta pixels. Use this function several times to move a shape across the canvas.    Solution

7.5  Input Errors

Recall our first function:

;; area-of-disk : number  ->  number
;; to compute the area of a disk with radius r
(define (area-of-disk r) 
  (* 3.14 (* r r)))

Clearly, our friends may wish to use this function, especially for some of their geometry homework. Unfortunately, when our friends use this function, they may accidentally apply it to a symbol rather than a number. When that happens, the function stops with a whimsical and uninformative error message:

> (area-of-disk 'my-disk)
*: expects type <number> as 1st argument, given: 'my-disk; ... 

Using predicates, we can do better.

To prevent this kind of accident, we should define checked versions of our functions, when we wish to hand them to our friends. In general, a CHECKED FUNCTION inputs an arbitrary Scheme value: a number, a boolean, a symbol, or a structure. For all those values that are in the class of values for which the original function is defined, the checked version applies the latter; for all others, it signals an error. Concretely, checked-area-of-disk consumes an arbitrary Scheme value, uses area-of-disk to compute the area of the a disk if the input is a number, and stops with an error message otherwise.

Based on the enumeration of Scheme's classes of values, the template for a checked function is as follows:

;; f : Scheme-value  ->  ???
(define (f v)
  (cond
    [(number? v) ...]
    [(boolean? v) ...]
    [(symbol? v) ...]
    [(struct? v) ...]))

Each line corresponds to one possible class of input. If we need to distinguish between the structures, we expand the last line appropriately.

The first clause is the only one where we can use area-of-disk. For the others, however, we must signal an error. In Scheme we use the operation error to do so. It consumes a symbol and a string. Here is an example:

(error 'checked-area-of-disk "number expected")

Hence the full definition of checked-area-of-disk is:

(define (checked-area-of-disk v)
  (cond
    [(number? v) (area-of-disk v)]
    [(boolean? v) (error 'checked-area-of-disk "number expected")]
    [(symbol? v) (error 'checked-area-of-disk "number expected")]
    [(struct? v) (error 'checked-area-of-disk "number expected")]))

Using else we can greatly simplify the function:

;; checked-area-of-disk : Scheme-value  ->  number
;; to compute the area of a disk with radius v, 
;; if v is a number
(define (checked-area-of-disk v)
  (cond
    [(number? v) (area-of-disk v)]
    [else (error 'checked-area-of-disk "number expected")]))

Of course, such a simplification may not always be possible and may require a reordering of the cond-clauses first.

Writing checked functions and simplifying them is important if we distribute the programs to others. Designing programs that work properly, however, is far more important. The book therefore focuses on the design process for the program proper and deemphasizes writing checked versions.

Exercise 7.5.1.   A checked version of area-of-disk can also enforce that the arguments to the function are positive numbers, not just arbitrary numbers. Modify checked-area-of-disk in this way.    Solution

Exercise 7.5.2.   Develop checked versions of the functions profit (figure 5), is-between-5-6? (section 4.2), reply (section 5), distance-to-0 (section 6.1), and perimeter (in the left column of figure 19).    Solution

Exercise 7.5.3.   Take a look at these structure and data definitions:

(define-struct vec (x y))

A speed-vector (vec) is a structure:

  (make-vec x y)  
where both x and y are positive numbers.

Develop the function checked-make-vec, which should be understood as a checked version of the primitive operation make-vec. It ensures that the arguments to make-vec are positive numbers, and not just arbitrary numbers. In other words, checked-make-vec enforces our informal data definition.    Solution


24 We have also discussed images and strings, but we ignore these for now.

25 The posn structure is automatically provided in DrScheme's teaching languages and should never be defined.

26 The perimeter of a circle is also known as circumference.