# Section 14More Self-referential Data Definitions

Lists and natural numbers are two classes of data whose description requires self-referential data definitions. Both data definitions consist of two clauses; both have a single self-reference. Many interesting classes of data, however, require more complex definitions than that. Indeed, there is no end to the variations. It is therefore necessary to learn how to formulate data definitions on our own, starting with informal descriptions of information. Once we have those, we can just follow a slightly modified design recipe for self-referential data definitions.

## 14.1  Structures in Structures

Medical researchers rely on family trees to do research on hereditary diseases. They may, for example, search a family tree for a certain eye color. Computers can help with these tasks, so it is natural to design representations of family trees and functions for processing them.

One way to maintain a family tree of a family is to add a node to the tree every time a child is born. From the node, we can draw connections to the node for the father and the one for the mother, which tells us how the people in the tree are related. For those people in the tree whose parents are unknown, we do not draw any connections. The result is a so-called ancestor family tree because, given any node in the tree, we can find the ancestors of that person if we follow the arrows but not the descendants.

As we record a family tree, we may also want to record certain pieces of information. The birth date, birth weight, the color of the eyes, and the color of the hair are the pieces of information that we care about. Others record different information. Figure 35:  A sample ancestor family tree

See figure 35 for a drawing of an ancestor family tree. Adam is the child of Bettina and Carl; he has yellow eyes and was born in 1950. Similarly, Gustav is the child of Eva and Fred, has brown eyes, and was born in 1988. To represent a child in a family tree is to combine several pieces of information: information about the father, the mother, the name, the birth date, and eye color. This suggests that we define a new structure:

```(define-struct child (father mother name date eyes))
```

The five fields of `child` structures record the required information, which suggests the following data definition:

A `child` is a structure:
`(make-child f m na da ec)`
where `f` and `m` are `child` structures; `na` and `ec` are symbols; and `da` is a number.
While this data definition is simple, it is unfortunately also useless. The definition refers to itself but, because it doesn't have any clauses, there is no way to create a `child` structure. If we tried to create a `child` structure, we would have to write
```(make-child
(make-child
(make-child
(make-child
...
)))
... ... ... ...)
```

without end. It is for this reason that we demand that all self-referential data definitions consist of several clauses (for now) and that at least one of them does not refer to the data definition.

Let's postpone the data definition for a moment and study instead how we can use `child` structures to represent family trees. Suppose we are about to add a child to an existing family tree, and furthermore suppose that we already have representations for the parents. Then we can just construct a new `child` structure. For example, for Adam we could create the following `child` structure:

```(make-child Carl Bettina 'Adam 1950 'yellow)
```

assuming `Carl` and `Bettina` stand for representations of Adam's parents.

The problem is that we don't always know a person's parents. In the family depicted in figure 35, we don't know Bettina's parents. Yet, even if we don't know a person's father or mother, we must still use some Scheme value for the two fields in a `child` structure. We could use all kinds of values to signal a lack of information (`5`, `false`, or `'none`); here, we use `empty`. For example, to construct a `child` structure for Bettina, we do the following:

```(make-child empty empty 'Bettina 1926 'green)
```

Of course, if only one of the two parents is missing, we fill just that field with `empty`.

Our analysis suggests that a `child` node has the following data definition:

A child node is `(make-child f m na da ec)` where
1. `f` and `m` are either

1. `empty` or

2. `child` nodes;

2. `na` and `ec` are symbols;

3. `da` is a number.

This definition is special in two regards. First, it is a self-referential data definition involving structures. Second, the data definition mentions two alternatives for the first and second component. This violates our conventions concerning the shape of data definitions.

We can avoid this problem by defining the collection of nodes in a family tree instead:

A family-tree-node (short: ftn) is either

1. `empty`; or

2. `(make-child f m na da ec)`
where `f` and `m` are ftns, `na`
and `ec` are symbols, and da is a number.

This new definition satisfies our conventions. It consists of two clauses. One of the clauses is self-referential, the other is not.

In contrast to previous data definitions involving structures, the definition of `ftn` is not a plain explanation of what kind of data can show up in which field. Instead, it is multi-clausal and self-referential. Considering that this is the first such data definition, let us carefully translate the example from figure 35 and thus reassure ourselves that the new class of data can represent the information of interest.

The information for Carl is easy to translate into a `ftn`:

```(make-child empty empty 'Carl 1926 'green)
```

Bettina and Fred are represented with similar nodes. Accordingly, the node for Adam is created with

```(make-child (make-child empty empty 'Carl 1926 'green)
(make-child empty empty 'Bettina 1926 'green)
1950
'yellow)
```

As the examples show, a simple-minded, node-by-node transliteration of figure 35 requires numerous repetitions of data. For example, if we constructed the `child` structure for Dave like the one for Adam, we would get

```(make-child (make-child empty empty 'Carl 1926 'green)
(make-child empty empty 'Bettina 1926 'green)
'Dave
1955
'black)
```

Hence it is a good idea to introduce a variable definition per node and to use the variable thereafter. To make things easy, we use `Carl` to stand for the `child` structure that describes Carl, and so on. The complete transliteration of the family tree into Scheme can be found in figure 36.

 ```;; Oldest Generation: (define Carl (make-child empty empty 'Carl 1926 'green)) (define Bettina (make-child empty empty 'Bettina 1926 'green)) ;; Middle Generation: (define Adam (make-child Carl Bettina 'Adam 1950 'yellow)) (define Dave (make-child Carl Bettina 'Dave 1955 'black)) (define Eva (make-child Carl Bettina 'Eva 1965 'blue)) (define Fred (make-child empty empty 'Fred 1966 'pink)) ;; Youngest Generation: (define Gustav (make-child Fred Eva 'Gustav 1988 'brown)) ``` Figure 36:  A Scheme representation of the sample family tree

The structure definitions in figure 36 naturally correspond to an image of deeply nested boxes. Each box has five compartments. The first two contain boxes again, which in turn contain boxes in their first two compartments, and so on. Thus, if we were to draw the structure definitions for the family tree using nested boxes, we would quickly be overwhelmed by the details of the picture. Furthermore, the picture would copy certain portions of the tree just like our attempt to use `make-child` without variable definitions. For these reasons, it is better to imagine the structures as boxes and arrows, as originally drawn in figure 35. In general, a programmer must flexibly switch back and forth between both of these graphical illustrations. For extracting values from structures, the boxes-in-boxes image works best; for finding our way around large collections of interconnected structures, the boxes-and-arrows image works better.

Equipped with a firm understanding of the family tree representation, we can turn to the design of functions that consume family trees. Let us first look at a generic function of this kind:

```;; `fun-for-ftn : ftn  ->  ???`
(define (fun-for-ftn a-ftree) ...)
```

After all, we should be able to construct the template without considering the purpose of a function.

Since the data definition for `ftn`s contains two clauses, the template must consist of a cond-expression with two clauses. The first deals with `empty`, the second with `child` structures:

```;; `fun-for-ftn : ftn  ->  ???`
(define (fun-for-ftn a-ftree)
(cond
[(empty? a-ftree) ...]
[else ; `(child? a-ftree)`
... ]))
```

Furthermore, for the first clause, the input is atomic so there is nothing further to be done. For the second clause, though, the input contains five pieces of information: two other family tree nodes, the person's name, birth date, and eye color:

```;; `fun-for-ftn : ftn  ->  ???`
(define (fun-for-ftn a-ftree)
(cond
[(empty? a-ftree) ...]
[else
... (fun-for-ftn (child-father a-ftree)) ...
... (fun-for-ftn (child-mother a-ftree)) ...
... (child-name a-ftree) ...
... (child-date a-ftree) ...
... (child-eyes a-ftree) ...]))
```

We also apply `fun-for-ftn` to the `father` and `mother` fields because of the self-references in the second clause of the data definition.

Let us now turn to a concrete example: `blue-eyed-ancestor?`, the function that determines whether anyone in some given family tree has blue eyes:

```;; `blue-eyed-ancestor? : ftn  ->  boolean`
;; to determine whether `a-ftree` contains a `child`
;; structure with `'blue` in the `eyes` field
(define (blue-eyed-ancestor? a-ftree) ...)
```

Following our recipe, we first develop some examples. Consider the family tree node for Carl. He does not have blue eyes, and because he doesn't have any (known) ancestors in our family tree, the family tree represented by this node does not contain a person with blue eyes. In short,

```(blue-eyed-ancestor? Carl)
```

evaluates to `false`. In contrast, the family tree represented by `Gustav` contains a node for Eva who does have blue eyes. Hence

```(blue-eyed-ancestor? Gustav)
```

evaluates to `true`.

The function template is like that of `fun-for-ftn`, except that we use the name `blue-eyed-ancestor?`. As always, we use the template to guide the function design. First we assume that `(empty? a-ftree)` holds. In that case, the family tree is empty, and nobody has blue eyes. Hence the answer must be `false`.

The second clause of the template contains several expressions, which we must interpret:

1. `(blue-eyed-ancestor? (child-father a-ftree))`, which determines whether someone in the father's `ftn` has blue eyes;

2. `(blue-eyed-ancestor? (child-mother a-ftree))`, which determines whether someone in the mother's `ftn` has blue eyes;

3. `(child-name a-ftree)`, which extracts the `child`'s name;

4. `(child-date a-ftree)`, which extracts the `child`'s date of birth; and

5. `(child-eyes a-ftree)`, which extracts the `child`'s eye color.

It is now up to us to use these values properly. Clearly, if the `child` structure contains `'blue` in the `eyes` field, the function's answer is `true`. Otherwise, the function produces `true` if there is a blue-eyed person in either the father's or the mother's family tree. The rest of the data is useless.

Our discussion suggests that we formulate a conditional expression and that the first condition is

```(symbol=? (child-eyes a-ftree) 'blue)
```

The two recursions are the other two conditions. If either one produces `true`, the function produces `true`. The `else`-clause produces `false`.

In summary, the answer in the second clause is the expression:

```(cond
[(symbol=? (child-eyes a-ftree) 'blue) true]
[(blue-eyed-ancestor? (child-father a-ftree)) true]
[(blue-eyed-ancestor? (child-mother a-ftree)) true]
[else false])
```

The first definition in figure 37 pulls everything together. The second definition shows how to formulate this cond-expression as an equivalent or-expression, testing one condition after the next, until one of them is `true` or all of them have evaluated to `false`.

 ```;; `blue-eyed-ancestor? : ftn  ->  boolean` ;; to determine whether `a-ftree` contains a `child` ;; structure with `'blue` in the `eyes` field ;; version 1: using a nested cond-expression (define (blue-eyed-ancestor? a-ftree) (cond [(empty? a-ftree) false] [else (cond [(symbol=? (child-eyes a-ftree) 'blue) true] [(blue-eyed-ancestor? (child-father a-ftree)) true] [(blue-eyed-ancestor? (child-mother a-ftree)) true] [else false])])) ``` ```;; `blue-eyed-ancestor? : ftn  ->  boolean` ;; to determine whether `a-ftree` contains a `child` ;; structure with `'blue` in the `eyes` field ;; version 2: using an or-expression (define (blue-eyed-ancestor? a-ftree) (cond [(empty? a-ftree) false] [else (or (symbol=? (child-eyes a-ftree) 'blue) (or (blue-eyed-ancestor? (child-father a-ftree)) (blue-eyed-ancestor? (child-mother a-ftree))))])) ``` Figure 37:  Two functions for finding a blue-eyed ancestor

The function `blue-eyed-ancestor?` is unusual in that it uses the recursions as conditions in a cond-expressions. To understand how this works, let us evaluate an application of `blue-eyed-ancestor?` to `Carl` by hand:

```  (blue-eyed-ancestor? Carl)
= (blue-eyed-ancestor? (make-child empty empty 'Carl 1926 'green))
= (cond
[(empty? (make-child empty empty 'Carl 1926 'green)) false]
[else
(cond
[(symbol=?
(child-eyes (make-child empty empty 'Carl 1926 'green))
'blue)
true]
[(blue-eyed-ancestor?
(child-father (make-child empty empty 'Carl 1926 'green)))
true]
[(blue-eyed-ancestor?
(child-mother (make-child empty empty 'Carl 1926 'green)))
true]
[else false])])
= (cond
[(symbol=? 'green 'blue) true]
[(blue-eyed-ancestor? empty) true]
[(blue-eyed-ancestor? empty) true]
[else false])
= (cond
[false true]
[false true]
[false true]
[else false])
= false
```

The evaluation confirms that `blue-eyed-ancestor?` works properly for `Carl`, and it also illustrates how the function works.

Exercise 14.1.1.   The second definition of `blue-eyed-ancestor?` in figure 37 uses an or-expression instead of a nested conditional. Use a hand-evaluation to show that this definition produces the same output for the inputs `empty` and `Carl`. Solution

Exercise 14.1.2.   Confirm that

```(blue-eyed-ancestor? empty)
```

evaluates to `false` with a hand-evaluation.

Evaluate `(blue-eyed-ancestor? Gustav)` by hand and with DrScheme. For the hand-evaluation, skip those steps in the evaluation that concern extractions, comparisons, and conditions involving `empty?`. Also reuse established equations where possible, especially the one above. Solution

Exercise 14.1.3.   Develop `count-persons`. The function consumes a family tree node and produces the number of people in the corresponding family tree. Solution

Exercise 14.1.4.   Develop the function `average-age`. It consumes a family tree node and the current year. It produces the average age of all people in the family tree. Solution

Exercise 14.1.5.   Develop the function `eye-colors`, which consumes a family tree node and produces a list of all eye colors in the tree. An eye color may occur more than once in the list.

Hint: Use the Scheme operation `append`, which consumes two lists and produces the concatenation of the two lists. For example:

```  (append (list 'a 'b 'c) (list 'd 'e))
= (list 'a 'b 'c 'd 'e)
```

We discuss the development of functions like `append` in section 17. Solution

Exercise 14.1.6.   Suppose we need the function `proper-blue-eyed-ancestor?`. It is like `blue-eyed-ancestor?` but responds with `true` only when some proper ancestor, not the given one, has blue eyes.

The contract for this new function is the same as for the old one:

```;; `proper-blue-eyed-ancestor? : ftn  ->  boolean`
;; to determine whether `a-ftree` has a blue-eyed ancestor
(define (proper-blue-eyed-ancestor? a-ftree) ...)
```

The results differ slightly.

To appreciate the difference, we need to look at Eva, who is blue-eyed, but does not have a blue-eyed ancestor. Hence

```(blue-eyed-ancestor? Eva)
```

is `true` but

```(proper-blue-eyed-ancestor? Eva)
```

is `false`. After all `Eva` is not a proper ancestor of herself.

Suppose a friend sees the purpose statement and comes up with this solution:

```(define (proper-blue-eyed-ancestor? a-ftree)
(cond
[(empty? a-ftree) false]
[else (or (proper-blue-eyed-ancestor? (child-father a-ftree))
(proper-blue-eyed-ancestor? (child-mother a-ftree)))]))
```

What would be the result of `(proper-blue-eyed-ancestor? A)` for any `ftn` `A`?

Fix the friend's solution. Solution

## 14.2  Extended Exercise: Binary Search Trees

Programmers often work with trees, though rarely with family trees. A particularly well-known form of tree is the binary search tree. Many applications employ binary search trees to store and to retrieve information.

To be concrete, we discuss binary trees that manage information about people. In this context, a binary tree is similar to a family tree but instead of `child` structures it contains `node`s:

```(define-struct node (ssn name left right))
```

Here we have decided to record the social security number, the name, and two other trees. The latter are like the parent fields of family trees, though the relationship between a `node` and its `left` and `right` trees is not based on family relationships.

The corresponding data definition is just like the one for family trees: A binary-tree (short: BT) is either

1. `false`; or

2. `(make-node soc pn lft rgt)`
where `soc` is a number, `pn` is a symbol, and `lft` and `rgt` are `BT`s.

The choice of `false` to indicate lack of information is arbitrary. We could have chosen `empty` again, but `false` is an equally good and equally frequent choice that we should become familiar with.

Here are two binary trees:

```   (make-node
15
'd
false
(make-node 24 'i false false))
```

```(make-node
15
'd
(make-node 87 'h false false)
false)
```

Figure 38 shows how we should think about such trees. The trees are drawn upside down, that is, with the root at the top and the crown of the tree at the bottom. Each circle corresponds to a node, labeled with the `ssn` field of a corresponding `node` structure. The trees omit `false`.

Exercise 14.2.1.   Draw the two trees above in the manner of figure 38. Then develop `contains-bt`. The function consumes a number and a `BT` and determines whether the number occurs in the tree. Solution

Exercise 14.2.2.   Develop `search-bt`. The function consumes a number `n` and a `BT`. If the tree contains a `node` structure whose `soc` field is `n`, the function produces the value of the `pn` field in that node. Otherwise, the function produces `false`.

Hint: Use `contains-bt`. Or, use `boolean?` to find out whether `search-bt` was successfully used on a subtree. We will discuss this second technique, called backtracking, in the intermezzo at the end of this part. Solution

 Tree A: Tree B:  Figure 38:  A binary search tree and a binary tree

Both trees in figure 38 are binary trees but they differ in a significant way. If we read the numbers in the two trees from left to right we obtain two sequences: The sequence for tree A is sorted in ascending order, the one for B is not.

A binary tree that has an ordered sequence of information is a BINARY SEARCH TREE. Every binary search tree is a binary tree, but not every binary tree is a binary search tree. We say that the class of binary search trees is a PROPER SUBCLASS of that of binary trees, that is, a class that does not contain all binary trees. More concretely, we formulate a condition -- or data invariant -- that distinguishes a binary search tree from a binary tree:

The BST Invariant

A binary-search-tree (short: BST) is a `BT`:

1. `false` is always a `BST`;

2. `(make-node soc pn lft rgt)` is a `BST` if

1. `lft` and `rgt` are `BST`s,

2. all `ssn` numbers in `lft` are smaller than `soc`, and

3. all `ssn` numbers in `rgt` are larger than `soc`.

The second and third conditions are different from what we have seen in previous data definitions. They place an additional and unusual burden on the construction `BST`s. We must inspect all numbers in these trees and ensure that they are smaller (or larger) than `soc`.

Exercise 14.2.3.   Develop the function `inorder`. It consumes a binary tree and produces a list of all the `ssn` numbers in the tree. The list contains the numbers in the left-to-right order we have used above.

Hint: Use the Scheme operation `append`, which concatenates lists:

```(append (list 1 2 3) (list 4) (list 5 6 7))
evaluates to
(list 1 2 3 4 5 6 7)
```

What does `inorder` produce for a binary search tree? Solution

Looking for a specific `node` in a `BST` takes fewer steps than looking for the same `node` in a `BT`. To find out whether a `BT` contains a node with a specific `ssn` field, a function may have to look at every `node` of the tree. In contrast, to inspect a binary search tree requires far fewer inspections than that. Suppose we are given the `BST`:

```(make-node 66 'a L R)
```

If we are looking for `66`, we have found it. Now suppose we are looking for `63`. Given the above `node`, we can focus the search on `L` because all `node`s with `ssn`s smaller than `66` are in `L`. Similarly, if we were to look for `99`, we would ignore `L` and focus on `R` because all `node`s with `ssn`s larger than `66` are in `R`.

Exercise 14.2.4.   Develop `search-bst`. The function consumes a number `n` and a `BST`. If the tree contains a `node` structure whose `soc` field is `n`, the function produces the value of the `pn` field in that node. Otherwise, the function produces `false`. The function organization must exploit the BST Invariant so that the function performs as few comparisons as necessary. Compare searching in binary search trees with searching in sorted lists (exercise 12.2.2). Solution

Building a binary tree is easy; building a binary search tree is a complicated, error-prone affair. To create a `BT` we combine two `BT`s, an `ssn` number and a `name` with `make-node`. The result is, by definition, a `BT`. To create a `BST`, this procedure fails because the result would typically not be a `BST`. For example, if one tree contains `3` and `5`, and the other one contains `2` and `6`, there is no way to join these two `BST`s into a single binary search tree.

We can overcome this problem in (at least) two ways. First, given a list of numbers and symbols, we can determine by hand what the corresponding `BST` should look like and then use `make-node` to build it. Second, we can write a function that builds a `BST` from the list, one `node` after another.

Exercise 14.2.5.   Develop the function `create-bst`. It consumes a `BST` `B`, a number `N`, and a symbol `S`. It produces a `BST` that is just like `B` and that in place of one `false` subtree contains the `node` structure

```(make-node N S false false)
```

Test the function with `(create-bst false 66 'a)`; this should create a single `node`. Then show that the following holds:

```  (create-bst (create-bst false 66 'a) 53 'b)
= (make-node 66
'a
(make-node 53 'b false false)
false)
```

Finally, create tree A from figure 38 using `create-bst`. Solution

Exercise 14.2.6.   Develop the function `create-bst-from-list`. It consumes a list of numbers and names; it produces a `BST` by repeatedly applying `create-bst`.

The data definition for a list of numbers and names is as follows:

A list-of-number/name is either

1. `empty` or

2. `(cons (list ssn nom) lonn)`
where `ssn` is a number, `nom` a symbol,
and `lonn` is a `list-of-number/name`.

Consider the following examples:

 ``` (define sample '((99 o) (77 l) (24 i) (10 h) (95 g) (15 d) (89 c) (29 b) (63 a))) ``` ``` (define sample (list (list 99 'o) (list 77 'l) (list 24 'i) (list 10 'h) (list 95 'g) (list 15 'd) (list 89 'c) (list 29 'b) (list 63 'a))) ```

They are equivalent, although the left one is defined with the quote abbreviation, the right one using `list`. The left tree in figure 38 is the result of using `create-bst-from-list` on this list. Solution

## 14.3  Lists in Lists

The World Wide Web, or just ``the Web,'' has become the most interesting part of the Internet, a global network of computers. Roughly speaking, the Web is a collection of Web pages. Each Web page is a sequence of words, pictures, movies, audio messages, and many more things. Most important, Web pages also contain links to other Web pages.

A Web browser enables people to view Web pages. It presents a Web page as a sequence of words, images, and so on. Some of the words on a page may be underlined. Clicking on underlined words leads to a new Web page. Most modern browsers also provide a Web page composer. These are tools that help people create collections of Web pages. A composer can, among other things, search for words or replace one word with another. In short, Web pages are things that we should be able to represent on computers, and there are many functions that process Web pages.

To simplify our problem, we consider only Web pages of words and nested Web pages. One way of understanding such a page is as a sequence of words and Web pages. This informal description suggests a natural representation of Web pages as lists of symbols, which represent words, and Web pages, which represent nested Web pages. After all, we have emphasized before that a list may contain different kinds of things. Still, when we spell out this idea as a data definition, we get something rather unusual:

A Web-page (short: WP) is either

1. `empty`;

2. `(cons s wp)`
where `s` is a symbol and `wp` is a Web page; or

3. `(cons ewp wp)`
where both `ewp` and `wp` are Web pages.

This data definition differs from that of a list of symbols in that it has three clauses instead of two and that it has three self-references instead of one. Of these self-references, the one at the beginning of a `cons`tructed list is the most unusual. We refer to such Web pages as immediately embedded Web pages.

Because the data definition is unusual, we construct some examples of Web pages before we continue. Here is a plain page:

```'(The TeachScheme! Project aims to improve the
problem-solving and organization skills of high
school students. It provides software and lecture
notes as well as exercises and solutions for teachers.)
```

It contains nothing but words. Here is a complex page:

```'(The TeachScheme Web Page
Here you can find:
(LectureNotes for Teachers)
(Guidance for (DrScheme: a Scheme programming environment))
(Exercise Sets)
(Solutions for Exercises)
For further information: write to [email protected])
```

The immediately embedded pages start with parentheses and the symbols `'LectureNotes`, `'Guidance`, `'Exercises`, and `'Solutions`. The second embedded Web page contains another embedded page, which starts with the word `'DrScheme`. We say this page is embedded with respect to the entire page.

Let's develop the function `size`, which consumes a Web page and produces the number of words that it and all of its embedded pages contain:

```;; `size : WP  ->  number`
;; to count the number of symbols that occur in `a-wp`
(define (size a-wp) ...)
```

The two Web pages above suggest two good examples, but they are too complex. Here are three examples, one per subclass of data:

```(= (size empty)
0)
```

```(= (size (cons 'One empty))
1)
```

```(= (size (cons (cons 'One empty) empty))
1)
```

The first two examples are obvious. The third one deserves a short explanation. It is a Web page that contains one immediately embedded Web page, and nothing else. The embedded Web page is the one of the second example, and it contains the one and only symbol of the third example.

To develop the template for `size`, let's carefully step through the design recipe. The shape of the data definition suggests that we need three `cond`-clauses: one for the `empty` page, one for a page that starts with a symbol, and one for a page that starts with an embedded Web page. While the first condition is the familiar test for `empty`, the second and third need closer inspection because both clauses in the data definition use `cons`, and a simple `cons?` won't distinguish between the two forms of data.

If the page is not `empty`, it is certainly `cons`tructed, and the distinguishing feature is the first item on the list. In other words, the second condition must use a predicate that tests the first item on `a-wp`:

```;; `size : WP  ->  number`
;; to count the number of symbols that occur in a-wp
(define (size a-wp)
(cond
[(empty? a-wp) ...]
[(symbol? (first a-wp)) ... (first a-wp) ... (size (rest a-wp)) ...]
[else ... (size (first a-wp)) ... (size (rest a-wp)) ...]))
```

The rest of the template is as usual. The second and third `cond` clauses contain selector expressions for the first item and the rest of the list. Because `(rest a-wp)` is always a Web page and because `(first a-wp)` is one in the third case, we also add a recursive call to size for these selector expressions.

Using the examples and the template, we are ready to design `size`: see figure 39. The differences between the definition and the template are minimal, which shows again how much of a function we can design by merely thinking systematically about the data definition for its inputs.

 ```;; `size : WP  ->  number` ;; to count the number of symbols that occur in a-wp (define (size a-wp) (cond [(empty? a-wp) 0] [(symbol? (first a-wp)) (+ 1 (size (rest a-wp)))] [else (+ (size (first a-wp)) (size (rest a-wp)))])) ``` Figure 39:  The definition of size for Web pages

Exercise 14.3.1.   Briefly explain how to define `size` using its template and the examples. Test `size` using the examples from above.

Exercise 14.3.2.   Develop the function `occurs1`. The function consumes a Web page and a symbol. It produces the number of times the symbol occurs in the Web page, ignoring the nested Web pages.

Develop the function `occurs2`. It is like `occurs1`, but counts all occurrences of the symbol, including in embedded Web pages. Solution

Exercise 14.3.3.   Develop the function `replace`. The function consumes two symbols, `new` and `old`, and a Web page, `a-wp`. It produces a page that is structurally identical to `a-wp` but with all occurrences of `old` replaced by `new`. Solution

Exercise 14.3.4.   People do not like deep Web trees because they require too many page switches to reach useful information. For that reason a Web page designer may also want to measure the depth of a page. A page containing only symbols has depth `0`. A page with an immediately embedded page has the depth of the embedded page plus `1`. If a page has several immediately embedded Web pages, its depth is the maximum of the depths of embedded Web pages plus `1`. Develop `depth`, which consumes a Web page and computes its depth. Solution

## 14.4  Extended Exercise: Evaluating Scheme

DrScheme is itself a program that consists of several parts. One function checks whether the definitions and expressions we wrote down are grammatical Scheme expressions. Another one evaluates Scheme expressions. With what we have learned in this section, we can now develop simple versions of these functions.

Our first task is to agree on a data representation for Scheme programs. In other words, we must figure out how to represent a Scheme expression as a piece of Scheme data. This sounds unusual, but it is not difficult. Suppose we just want to represent numbers, variables, additions, and multiplications for a start. Clearly, numbers can stand for numbers and symbols for variables. Additions and multiplications, however, call for a class of compound data because they consist of an operator and two subexpressions.

A straightforward way to represent additions and multiplications is to use two structures: one for additions and another one for multiplications. Here are the structure definitions:

```(define-struct add (left right))
(define-struct mul (left right))
```

Each structure has two components. One represents the left expression and the other one the right expression of the operation.

Scheme expression representation of Scheme expression
`3` `3`
`x` `'x`
`(* 3 10)` `(make-mul 3 10)`
`(+ (* 3 3) (* 4 4))``(make-add (make-mul 3 3) (make-mul 4 4))`
`(+ (* x x) (* y y))``(make-add (make-mul 'x 'x) (make-mul 'y 'y))`
`(* 1/2 (* 3 3))` `(make-mul 1/2 (make-mul 3 3))`

Let's look at some examples:

These examples cover all cases: numbers, variables, simple expressions, and nested expressions.

Exercise 14.4.1.   Provide a data definition for the representation of Scheme expressions. Then translate the following expressions into representations:

1. `(+ 10 -10)`

2. `(+ (* 20 3) 33)`

3. `(* 3.14 (* r r))`

4. `(+ (* 9/5 c) 32)`

5. `(+ (* 3.14 (* o o)) (* 3.14 (* i i)))` Solution

A Scheme evaluator is a function that consumes a representation of a Scheme expression and produces its value. For example, the expression `3` has the value `3`, `(+ 3 5)` has the value `8`, `(+ (* 3 3) (* 4 4))` has the value `25`, etc. Since we are ignoring definitions for now, an expression that contains a variable, for example, `(+ 3 x)`, does not have a value; after all, we do not know what the variable stands for. In other words, our Scheme evaluator should be applied only to representations of expressions that do not contain variables. We say such expressions are numeric.

Exercise 14.4.2.   Develop the function `numeric?`, which consumes (the representation of) a Scheme expression and determines whether it is numeric. Solution

Exercise 14.4.3.   Provide a data definition for numeric expressions. Develop the function `evaluate-expression`. The function consumes (the representation of) a numeric Scheme expression and computes its value. When the function is tested, modify it so it consumes all kinds of Scheme expressions; the revised version raises an error when it encounters a variable. Solution

Exercise 14.4.4.   When people evaluate an application `(f a)` they substitute `a` for `f`'s parameter in `f`'s body. More generally, when people evaluate expressions with variables, they substitute the variables with values.

Develop the function `subst`. The function consumes (the representation of) a variable (`V`), a number (`N`), and (the representation of) a Scheme expression. It produces a structurally equivalent expression in which all occurrences of `V` are substituted by `N`. Solution