Section 14

More Self-referential Data Definitions

Lists and natural numbers are two classes of data whose description requires self-referential data definitions. Both data definitions consist of two clauses; both have a single self-reference. Many interesting classes of data, however, require more complex definitions than that. Indeed, there is no end to the variations. It is therefore necessary to learn how to formulate data definitions on our own, starting with informal descriptions of information. Once we have those, we can just follow a slightly modified design recipe for self-referential data definitions.

14.1 Structures in Structures

Medical researchers rely on family trees to do research on hereditary diseases. They may, for example, search a family tree for a certain eye color. Computers can help with these tasks, so it is natural to design representations of family trees and functions for processing them.

One way to maintain a family tree of a family is to add a node to the tree every time a child is born. From the node, we can draw connections to the node for the father and the one for the mother, which tells us how the people in the tree are related. For those people in the tree whose parents are unknown, we do not draw any connections. The result is a so-called ancestor family tree because, given any node in the tree, we can find the ancestors of that person if we follow the arrows but not the descendants.

As we record a family tree, we may also want to record certain pieces of information. The birth date, birth weight, the color of the eyes, and the color of the hair are the pieces of information that we care about. Others record different information.

Figure 35: A sample ancestor family tree

See figure 35 for a drawing of an ancestor family tree. Adam is the child of Bettina and Carl; he has yellow eyes and was born in 1950. Similarly, Gustav is the child of Eva and Fred, has brown eyes, and was born in 1988. To represent a child in a family tree is to combine several pieces of information: information about the father, the mother, the name, the birth date, and eye color. This suggests that we define a new structure:

(define-struct child (father mother name date eyes))

The five fields of child structures record the required information, which suggests the following data definition:

A child is a structure:

(make-child f m na da ec)
where f and m are child structures; na and ec are symbols; and da is a number.

While this data definition is simple, it is unfortunately also useless. The definition refers to itself but, because it doesn't have any clauses, there is no way to create a child structure. If we tried to create a child structure, we would have to write

(make-child 
  (make-child
    (make-child 
      (make-child 
        ... 
        )))
   ... ... ... ...)

without end. It is for this reason that we demand that all self-referential data definitions consist of several clauses (for now) and that at least one of them does not refer to the data definition.

Let's postpone the data definition for a moment and study instead how we can use child structures to represent family trees. Suppose we are about to add a child to an existing family tree, and furthermore suppose that we already have representations for the parents. Then we can just construct a new child structure. For example, for Adam we could create the following child structure:

(make-child Carl Bettina 'Adam 1950 'yellow)

assuming Carl and Bettina stand for representations of Adam's parents.

The problem is that we don't always know a person's parents. In the family depicted in figure 35, we don't know Bettina's parents. Yet, even if we don't know a person's father or mother, we must still use some Scheme value for the two fields in a child structure. We could use all kinds of values to signal a lack of information (5, false, or 'none); here, we use empty. For example, to construct a child structure for Bettina, we do the following:

(make-child empty empty 'Bettina 1926 'green)

Of course, if only one of the two parents is missing, we fill just that field with empty.

Our analysis suggests that a child node has the following data definition:

A child node is (make-child f m na da ec) where

f and m are either

empty or

child nodes;

na and ec are symbols;

da is a number.

This definition is special in two regards. First, it is a self-referential data definition involving structures. Second, the data definition mentions two alternatives for the first and second component. This violates our conventions concerning the shape of data definitions.

We can avoid this problem by defining the collection of nodes in a family tree instead:

A family-tree-node (short: ftn) is either

empty; or
(make-child f m na da ec)
where f and m are ftns, na
and ec are symbols, and da is a number.

This new definition satisfies our conventions. It consists of two clauses. One of the clauses is self-referential, the other is not.

In contrast to previous data definitions involving structures, the definition of ftn is not a plain explanation of what kind of data can show up in which field. Instead, it is multi-clausal and self-referential. Considering that this is the first such data definition, let us carefully translate the example from figure 35 and thus reassure ourselves that the new class of data can represent the information of interest.

The information for Carl is easy to translate into a ftn:

(make-child empty empty 'Carl 1926 'green)

Bettina and Fred are represented with similar nodes. Accordingly, the node for Adam is created with

(make-child (make-child empty empty 'Carl 1926 'green) 
            (make-child empty empty 'Bettina 1926 'green)
            'Adam 
            1950 
            'yellow)

As the examples show, a simple-minded, node-by-node transliteration of figure 35 requires numerous repetitions of data. For example, if we constructed the child structure for Dave like the one for Adam, we would get

(make-child (make-child empty empty 'Carl 1926 'green) 
            (make-child empty empty 'Bettina 1926 'green)
            'Dave
            1955
            'black)

Hence it is a good idea to introduce a variable definition per node and to use the variable thereafter. To make things easy, we use Carl to stand for the child structure that describes Carl, and so on. The complete transliteration of the family tree into Scheme can be found in figure 36.

;; Oldest Generation:
(define Carl (make-child empty empty 'Carl 1926 'green))
(define Bettina (make-child empty empty 'Bettina 1926 'green))

;; Middle Generation:
(define Adam (make-child Carl Bettina 'Adam 1950 'yellow))
(define Dave (make-child Carl Bettina 'Dave 1955 'black))
(define Eva (make-child Carl Bettina 'Eva 1965 'blue))
(define Fred (make-child empty empty 'Fred 1966 'pink))

;; Youngest Generation: 
(define Gustav (make-child Fred Eva 'Gustav 1988 'brown))

Figure 36: A Scheme representation of the sample family tree

The structure definitions in figure 36 naturally correspond to an image of deeply nested boxes. Each box has five compartments. The first two contain boxes again, which in turn contain boxes in their first two compartments, and so on. Thus, if we were to draw the structure definitions for the family tree using nested boxes, we would quickly be overwhelmed by the details of the picture. Furthermore, the picture would copy certain portions of the tree just like our attempt to use make-child without variable definitions. For these reasons, it is better to imagine the structures as boxes and arrows, as originally drawn in figure 35. In general, a programmer must flexibly switch back and forth between both of these graphical illustrations. For extracting values from structures, the boxes-in-boxes image works best; for finding our way around large collections of interconnected structures, the boxes-and-arrows image works better.

Equipped with a firm understanding of the family tree representation, we can turn to the design of functions that consume family trees. Let us first look at a generic function of this kind:

;; fun-for-ftn : ftn  ->  ???
(define (fun-for-ftn a-ftree) ...)

After all, we should be able to construct the template without considering the purpose of a function.

Since the data definition for ftns contains two clauses, the template must consist of a cond-expression with two clauses. The first deals with empty, the second with child structures:

;; fun-for-ftn : ftn  ->  ???
(define (fun-for-ftn a-ftree)
  (cond
    [(empty? a-ftree) ...]
    [else ; (child? a-ftree)
      ... ]))

Furthermore, for the first clause, the input is atomic so there is nothing further to be done. For the second clause, though, the input contains five pieces of information: two other family tree nodes, the person's name, birth date, and eye color:

;; fun-for-ftn : ftn  ->  ???
(define (fun-for-ftn a-ftree)
  (cond
    [(empty? a-ftree) ...]
    [else
      ... (fun-for-ftn (child-father a-ftree)) ...
      ... (fun-for-ftn (child-mother a-ftree)) ...
      ... (child-name a-ftree) ...
      ... (child-date a-ftree) ...
      ... (child-eyes a-ftree) ...]))

We also apply fun-for-ftn to the father and mother fields because of the self-references in the second clause of the data definition.

Let us now turn to a concrete example: blue-eyed-ancestor?, the function that determines whether anyone in some given family tree has blue eyes:

;; blue-eyed-ancestor? : ftn  ->  boolean
;; to determine whether a-ftree contains a child
;; structure with 'blue in the eyes field
(define (blue-eyed-ancestor? a-ftree) ...)

Following our recipe, we first develop some examples. Consider the family tree node for Carl. He does not have blue eyes, and because he doesn't have any (known) ancestors in our family tree, the family tree represented by this node does not contain a person with blue eyes. In short,

(blue-eyed-ancestor? Carl)

evaluates to false. In contrast, the family tree represented by Gustav contains a node for Eva who does have blue eyes. Hence

(blue-eyed-ancestor? Gustav)

evaluates to true.

The function template is like that of fun-for-ftn, except that we use the name blue-eyed-ancestor?. As always, we use the template to guide the function design. First we assume that (empty? a-ftree) holds. In that case, the family tree is empty, and nobody has blue eyes. Hence the answer must be false.

The second clause of the template contains several expressions, which we must interpret:

(blue-eyed-ancestor? (child-father a-ftree)), which determines whether someone in the father's ftn has blue eyes;
(blue-eyed-ancestor? (child-mother a-ftree)), which determines whether someone in the mother's ftn has blue eyes;
(child-name a-ftree), which extracts the child's name;
(child-date a-ftree), which extracts the child's date of birth; and
(child-eyes a-ftree), which extracts the child's eye color.

It is now up to us to use these values properly. Clearly, if the child structure contains 'blue in the eyes field, the function's answer is true. Otherwise, the function produces true if there is a blue-eyed person in either the father's or the mother's family tree. The rest of the data is useless.

Our discussion suggests that we formulate a conditional expression and that the first condition is

(symbol=? (child-eyes a-ftree) 'blue)

The two recursions are the other two conditions. If either one produces true, the function produces true. The else-clause produces false.

In summary, the answer in the second clause is the expression:

(cond
  [(symbol=? (child-eyes a-ftree) 'blue) true]
  [(blue-eyed-ancestor? (child-father a-ftree)) true]
  [(blue-eyed-ancestor? (child-mother a-ftree)) true]
  [else false])

The first definition in figure 37 pulls everything together. The second definition shows how to formulate this cond-expression as an equivalent or-expression, testing one condition after the next, until one of them is true or all of them have evaluated to false.

;; blue-eyed-ancestor? : ftn  ->  boolean
;; to determine whether a-ftree contains a child
;; structure with 'blue in the eyes field
;; version 1: using a nested cond-expression
(define (blue-eyed-ancestor? a-ftree)
  (cond
    [(empty? a-ftree) false]
    [else (cond
            [(symbol=? (child-eyes a-ftree) 'blue) true]
            [(blue-eyed-ancestor? (child-father a-ftree)) true]
            [(blue-eyed-ancestor? (child-mother a-ftree)) true]
            [else false])]))

;; blue-eyed-ancestor? : ftn  ->  boolean
;; to determine whether a-ftree contains a child
;; structure with 'blue in the eyes field
;; version 2: using an or-expression
(define (blue-eyed-ancestor? a-ftree)
  (cond
    [(empty? a-ftree) false]
    [else (or (symbol=? (child-eyes a-ftree) 'blue)
              (or (blue-eyed-ancestor? (child-father a-ftree))
                  (blue-eyed-ancestor? (child-mother a-ftree))))]))

Figure 37: Two functions for finding a blue-eyed ancestor

The function blue-eyed-ancestor? is unusual in that it uses the recursions as conditions in a cond-expressions. To understand how this works, let us evaluate an application of blue-eyed-ancestor? to Carl by hand:

  (blue-eyed-ancestor? Carl)
= (blue-eyed-ancestor? (make-child empty empty 'Carl 1926 'green))
= (cond
    [(empty? (make-child empty empty 'Carl 1926 'green)) false]
    [else
      (cond
        [(symbol=? 
           (child-eyes (make-child empty empty 'Carl 1926 'green))
           'blue)
         true]
        [(blue-eyed-ancestor?
           (child-father (make-child empty empty 'Carl 1926 'green)))
         true]
        [(blue-eyed-ancestor?
           (child-mother (make-child empty empty 'Carl 1926 'green)))
         true]
        [else false])])
= (cond
    [(symbol=? 'green 'blue) true]
    [(blue-eyed-ancestor? empty) true]
    [(blue-eyed-ancestor? empty) true]
    [else false])
= (cond
    [false true]
    [false true]
    [false true]
    [else false])
= false

The evaluation confirms that blue-eyed-ancestor? works properly for Carl, and it also illustrates how the function works.

Exercise 14.1.1. The second definition of blue-eyed-ancestor? in figure 37 uses an or-expression instead of a nested conditional. Use a hand-evaluation to show that this definition produces the same output for the inputs empty and Carl. Solution

Exercise 14.1.2. Confirm that

(blue-eyed-ancestor? empty)

evaluates to false with a hand-evaluation.

Evaluate (blue-eyed-ancestor? Gustav) by hand and with DrScheme. For the hand-evaluation, skip those steps in the evaluation that concern extractions, comparisons, and conditions involving empty?. Also reuse established equations where possible, especially the one above. Solution

Exercise 14.1.3. Develop count-persons. The function consumes a family tree node and produces the number of people in the corresponding family tree. Solution

Exercise 14.1.4. Develop the function average-age. It consumes a family tree node and the current year. It produces the average age of all people in the family tree. Solution

Exercise 14.1.5. Develop the function eye-colors, which consumes a family tree node and produces a list of all eye colors in the tree. An eye color may occur more than once in the list.

Hint: Use the Scheme operation append, which consumes two lists and produces the concatenation of the two lists. For example:

  (append (list 'a 'b 'c) (list 'd 'e)) 
= (list 'a 'b 'c 'd 'e)

We discuss the development of functions like append in section 17. Solution

Exercise 14.1.6. Suppose we need the function proper-blue-eyed-ancestor?. It is like blue-eyed-ancestor? but responds with true only when some proper ancestor, not the given one, has blue eyes.

The contract for this new function is the same as for the old one:

;; proper-blue-eyed-ancestor? : ftn  ->  boolean
;; to determine whether a-ftree has a blue-eyed ancestor
(define (proper-blue-eyed-ancestor? a-ftree) ...)

The results differ slightly.

To appreciate the difference, we need to look at Eva, who is blue-eyed, but does not have a blue-eyed ancestor. Hence

(blue-eyed-ancestor? Eva)

is true but

(proper-blue-eyed-ancestor? Eva)

is false. After all Eva is not a proper ancestor of herself.

Suppose a friend sees the purpose statement and comes up with this solution:

(define (proper-blue-eyed-ancestor? a-ftree)
  (cond
    [(empty? a-ftree) false]
    [else (or (proper-blue-eyed-ancestor? (child-father a-ftree))
              (proper-blue-eyed-ancestor? (child-mother a-ftree)))]))

What would be the result of (proper-blue-eyed-ancestor? A) for any ftn A?

Fix the friend's solution. Solution

14.2 Extended Exercise: Binary Search Trees

Programmers often work with trees, though rarely with family trees. A particularly well-known form of tree is the binary search tree. Many applications employ binary search trees to store and to retrieve information.

To be concrete, we discuss binary trees that manage information about people. In this context, a binary tree is similar to a family tree but instead of child structures it contains nodes:

(define-struct node (ssn name left right))

Here we have decided to record the social security number, the name, and two other trees. The latter are like the parent fields of family trees, though the relationship between a node and its left and right trees is not based on family relationships.

The corresponding data definition is just like the one for family trees: A binary-tree (short: BT) is either

false; or
(make-node soc pn lft rgt)
where soc is a number, pn is a symbol, and lft and rgt are BTs.

The choice of false to indicate lack of information is arbitrary. We could have chosen empty again, but false is an equally good and equally frequent choice that we should become familiar with.

Here are two binary trees:

   (make-node
     15
     'd 
     false
     (make-node 24 'i false false))

(make-node
  15
  'd
  (make-node 87 'h false false)
  false)

Figure 38 shows how we should think about such trees. The trees are drawn upside down, that is, with the root at the top and the crown of the tree at the bottom. Each circle corresponds to a node, labeled with the ssn field of a corresponding node structure. The trees omit false.

Exercise 14.2.1. Draw the two trees above in the manner of figure 38. Then develop contains-bt. The function consumes a number and a BT and determines whether the number occurs in the tree. Solution

Exercise 14.2.2. Develop search-bt. The function consumes a number n and a BT. If the tree contains a node structure whose soc field is n, the function produces the value of the pn field in that node. Otherwise, the function produces false.

Hint: Use contains-bt. Or, use boolean? to find out whether search-bt was successfully used on a subtree. We will discuss this second technique, called backtracking, in the intermezzo at the end of this part. Solution

Tree A:	Tree B:

Figure 38: A binary search tree and a binary tree

Both trees in figure 38 are binary trees but they differ in a significant way. If we read the numbers in the two trees from left to right we obtain two sequences:

The sequence for tree A is sorted in ascending order, the one for B is not.

A binary tree that has an ordered sequence of information is a BINARY SEARCH TREE. Every binary search tree is a binary tree, but not every binary tree is a binary search tree. We say that the class of binary search trees is a PROPER SUBCLASS of that of binary trees, that is, a class that does not contain all binary trees. More concretely, we formulate a condition -- or data invariant -- that distinguishes a binary search tree from a binary tree:

The BST Invariant

A binary-search-tree (short: BST) is a BT:

false is always a BST;

(make-node soc pn lft rgt) is a BST if

lft and rgt are BSTs,

all ssn numbers in lft are smaller than soc, and

all ssn numbers in rgt are larger than soc.

The second and third conditions are different from what we have seen in previous data definitions. They place an additional and unusual burden on the construction BSTs. We must inspect all numbers in these trees and ensure that they are smaller (or larger) than soc.

Exercise 14.2.3. Develop the function inorder. It consumes a binary tree and produces a list of all the ssn numbers in the tree. The list contains the numbers in the left-to-right order we have used above.

Hint: Use the Scheme operation append, which concatenates lists:

(append (list 1 2 3) (list 4) (list 5 6 7))
evaluates to 
(list 1 2 3 4 5 6 7)

What does inorder produce for a binary search tree? Solution

Looking for a specific node in a BST takes fewer steps than looking for the same node in a BT. To find out whether a BT contains a node with a specific ssn field, a function may have to look at every node of the tree. In contrast, to inspect a binary search tree requires far fewer inspections than that. Suppose we are given the BST:

(make-node 66 'a L R)

If we are looking for 66, we have found it. Now suppose we are looking for 63. Given the above node, we can focus the search on L because all nodes with ssns smaller than 66 are in L. Similarly, if we were to look for 99, we would ignore L and focus on R because all nodes with ssns larger than 66 are in R.

Exercise 14.2.4. Develop search-bst. The function consumes a number n and a BST. If the tree contains a node structure whose soc field is n, the function produces the value of the pn field in that node. Otherwise, the function produces false. The function organization must exploit the BST Invariant so that the function performs as few comparisons as necessary. Compare searching in binary search trees with searching in sorted lists (exercise 12.2.2). Solution

Building a binary tree is easy; building a binary search tree is a complicated, error-prone affair. To create a BT we combine two BTs, an ssn number and a name with make-node. The result is, by definition, a BT. To create a BST, this procedure fails because the result would typically not be a BST. For example, if one tree contains 3 and 5, and the other one contains 2 and 6, there is no way to join these two BSTs into a single binary search tree.

We can overcome this problem in (at least) two ways. First, given a list of numbers and symbols, we can determine by hand what the corresponding BST should look like and then use make-node to build it. Second, we can write a function that builds a BST from the list, one node after another.

Exercise 14.2.5. Develop the function create-bst. It consumes a BST B, a number N, and a symbol S. It produces a BST that is just like B and that in place of one false subtree contains the node structure

(make-node N S false false)

Test the function with (create-bst false 66 'a); this should create a single node. Then show that the following holds:

  (create-bst (create-bst false 66 'a) 53 'b)
= (make-node 66 
             'a
             (make-node 53 'b false false)
             false)

Finally, create tree A from figure 38 using create-bst. Solution

Exercise 14.2.6. Develop the function create-bst-from-list. It consumes a list of numbers and names; it produces a BST by repeatedly applying create-bst.

The data definition for a list of numbers and names is as follows:

A list-of-number/name is either

empty or
(cons (list ssn nom) lonn)
where ssn is a number, nom a symbol,
and lonn is a list-of-number/name.

Consider the following examples:



(define sample

  '((99 o)

    (77 l)

    (24 i)

    (10 h)

    (95 g)

    (15 d)

    (89 c)

    (29 b)

    (63 a)))



(define sample

  (list (list 99 'o)

    (list 77 'l)

    (list 24 'i)

    (list 10 'h)

    (list 95 'g)

    (list 15 'd)

    (list 89 'c)

    (list 29 'b)

    (list 63 'a)))

They are equivalent, although the left one is defined with the quote abbreviation, the right one using list. The left tree in figure 38 is the result of using create-bst-from-list on this list. Solution

14.3 Lists in Lists

The World Wide Web, or just ``the Web,'' has become the most interesting part of the Internet, a global network of computers. Roughly speaking, the Web is a collection of Web pages. Each Web page is a sequence of words, pictures, movies, audio messages, and many more things. Most important, Web pages also contain links to other Web pages.

A Web browser enables people to view Web pages. It presents a Web page as a sequence of words, images, and so on. Some of the words on a page may be underlined. Clicking on underlined words leads to a new Web page. Most modern browsers also provide a Web page composer. These are tools that help people create collections of Web pages. A composer can, among other things, search for words or replace one word with another. In short, Web pages are things that we should be able to represent on computers, and there are many functions that process Web pages.

To simplify our problem, we consider only Web pages of words and nested Web pages. One way of understanding such a page is as a sequence of words and Web pages. This informal description suggests a natural representation of Web pages as lists of symbols, which represent words, and Web pages, which represent nested Web pages. After all, we have emphasized before that a list may contain different kinds of things. Still, when we spell out this idea as a data definition, we get something rather unusual:

A Web-page (short: WP) is either

empty;
(cons s wp)
where s is a symbol and wp is a Web page; or
(cons ewp wp)
where both ewp and wp are Web pages.

This data definition differs from that of a list of symbols in that it has three clauses instead of two and that it has three self-references instead of one. Of these self-references, the one at the beginning of a constructed list is the most unusual. We refer to such Web pages as immediately embedded Web pages.

Because the data definition is unusual, we construct some examples of Web pages before we continue. Here is a plain page:

'(The TeachScheme! Project aims to improve the 
  problem-solving and organization skills of high 
  school students. It provides software and lecture 
  notes as well as exercises and solutions for teachers.)

It contains nothing but words. Here is a complex page:

'(The TeachScheme Web Page
  Here you can find: 
  (LectureNotes for Teachers)
  (Guidance for (DrScheme: a Scheme programming environment))
  (Exercise Sets)
  (Solutions for Exercises)
  For further information: write to scheme@cs)

The immediately embedded pages start with parentheses and the symbols 'LectureNotes, 'Guidance, 'Exercises, and 'Solutions. The second embedded Web page contains another embedded page, which starts with the word 'DrScheme. We say this page is embedded with respect to the entire page.

Let's develop the function size, which consumes a Web page and produces the number of words that it and all of its embedded pages contain:

;; size : WP  ->  number
;; to count the number of symbols that occur in a-wp
(define (size a-wp) ...)

The two Web pages above suggest two good examples, but they are too complex. Here are three examples, one per subclass of data:

(= (size empty)
   0)

(= (size (cons 'One empty))
   1)

(= (size (cons (cons 'One empty) empty))
   1)

The first two examples are obvious. The third one deserves a short explanation. It is a Web page that contains one immediately embedded Web page, and nothing else. The embedded Web page is the one of the second example, and it contains the one and only symbol of the third example.

To develop the template for size, let's carefully step through the design recipe. The shape of the data definition suggests that we need three cond-clauses: one for the empty page, one for a page that starts with a symbol, and one for a page that starts with an embedded Web page. While the first condition is the familiar test for empty, the second and third need closer inspection because both clauses in the data definition use cons, and a simple cons? won't distinguish between the two forms of data.

If the page is not empty, it is certainly constructed, and the distinguishing feature is the first item on the list. In other words, the second condition must use a predicate that tests the first item on a-wp:

;; size : WP  ->  number
;; to count the number of symbols that occur in a-wp
(define (size a-wp) 
  (cond
    [(empty? a-wp) ...]
    [(symbol? (first a-wp)) ... (first a-wp) ... (size (rest a-wp)) ...]
    [else ... (size (first a-wp)) ... (size (rest a-wp)) ...]))

The rest of the template is as usual. The second and third cond clauses contain selector expressions for the first item and the rest of the list. Because (rest a-wp) is always a Web page and because (first a-wp) is one in the third case, we also add a recursive call to size for these selector expressions.

Using the examples and the template, we are ready to design size: see figure 39. The differences between the definition and the template are minimal, which shows again how much of a function we can design by merely thinking systematically about the data definition for its inputs.

;; size : WP  ->  number
;; to count the number of symbols that occur in a-wp
(define (size a-wp) 
  (cond
    [(empty? a-wp) 0]
    [(symbol? (first a-wp)) (+ 1 (size (rest a-wp)))]
    [else (+ (size (first a-wp)) (size (rest a-wp)))]))

Figure 39: The definition of size for Web pages

Exercise 14.3.1. Briefly explain how to define size using its template and the examples. Test size using the examples from above.

Exercise 14.3.2. Develop the function occurs1. The function consumes a Web page and a symbol. It produces the number of times the symbol occurs in the Web page, ignoring the nested Web pages.

Develop the function occurs2. It is like occurs1, but counts all occurrences of the symbol, including in embedded Web pages. Solution

Exercise 14.3.3. Develop the function replace. The function consumes two symbols, new and old, and a Web page, a-wp. It produces a page that is structurally identical to a-wp but with all occurrences of old replaced by new. Solution

Exercise 14.3.4. People do not like deep Web trees because they require too many page switches to reach useful information. For that reason a Web page designer may also want to measure the depth of a page. A page containing only symbols has depth 0. A page with an immediately embedded page has the depth of the embedded page plus 1. If a page has several immediately embedded Web pages, its depth is the maximum of the depths of embedded Web pages plus 1. Develop depth, which consumes a Web page and computes its depth. Solution

14.4 Extended Exercise: Evaluating Scheme

DrScheme is itself a program that consists of several parts. One function checks whether the definitions and expressions we wrote down are grammatical Scheme expressions. Another one evaluates Scheme expressions. With what we have learned in this section, we can now develop simple versions of these functions.

Our first task is to agree on a data representation for Scheme programs. In other words, we must figure out how to represent a Scheme expression as a piece of Scheme data. This sounds unusual, but it is not difficult. Suppose we just want to represent numbers, variables, additions, and multiplications for a start. Clearly, numbers can stand for numbers and symbols for variables. Additions and multiplications, however, call for a class of compound data because they consist of an operator and two subexpressions.

A straightforward way to represent additions and multiplications is to use two structures: one for additions and another one for multiplications. Here are the structure definitions:

(define-struct add (left right))
(define-struct mul (left right))

Each structure has two components. One represents the left expression and the other one the right expression of the operation.

Scheme expression	representation of Scheme expression
`3`	`3`
`x`	`'x`
`(* 3 10)`	`(make-mul 3 10)`
`(+ (* 3 3) (* 4 4))`	`(make-add (make-mul 3 3) (make-mul 4 4))`
`(+ (* x x) (* y y))`	`(make-add (make-mul 'x 'x) (make-mul 'y 'y))`
`(* 1/2 (* 3 3))`	`(make-mul 1/2 (make-mul 3 3))`

Let's look at some examples:

These examples cover all cases: numbers, variables, simple expressions, and nested expressions.

Exercise 14.4.1. Provide a data definition for the representation of Scheme expressions. Then translate the following expressions into representations:

(+ 10 -10)
(+ (* 20 3) 33)
(* 3.14 (* r r))
(+ (* 9/5 c) 32)
(+ (* 3.14 (* o o)) (* 3.14 (* i i))) Solution

A Scheme evaluator is a function that consumes a representation of a Scheme expression and produces its value. For example, the expression 3 has the value 3, (+ 3 5) has the value 8, (+ (* 3 3) (* 4 4)) has the value 25, etc. Since we are ignoring definitions for now, an expression that contains a variable, for example, (+ 3 x), does not have a value; after all, we do not know what the variable stands for. In other words, our Scheme evaluator should be applied only to representations of expressions that do not contain variables. We say such expressions are numeric.

Exercise 14.4.2. Develop the function numeric?, which consumes (the representation of) a Scheme expression and determines whether it is numeric. Solution

Exercise 14.4.3. Provide a data definition for numeric expressions. Develop the function evaluate-expression. The function consumes (the representation of) a numeric Scheme expression and computes its value. When the function is tested, modify it so it consumes all kinds of Scheme expressions; the revised version raises an error when it encounters a variable. Solution

Exercise 14.4.4. When people evaluate an application (f a) they substitute a for f's parameter in f's body. More generally, when people evaluate expressions with variables, they substitute the variables with values.

Develop the function subst. The function consumes (the representation of) a variable (V), a number (N), and (the representation of) a Scheme expression. It produces a structurally equivalent expression in which all occurrences of V are substituted by N. Solution

Section 14 More Self-referential Data Definitions

14.1 Structures in Structures

14.2 Extended Exercise: Binary Search Trees

14.3 Lists in Lists

14.4 Extended Exercise: Evaluating Scheme

Section 14

More Self-referential Data Definitions