"Watch What I Do"

Foreword

Alan Kay

I don't know who first made the parallel between programming a computer and using a tool, but it was certainly implicit in Jack Licklider's thoughts about "man-machine symbiosis" as he set up the ARPA IPTO research projects in the early sixties. In 1962, Ivan Sutherland's Sketchpad became the exemplar to this day for what interactive computing should be like--including having the end-user be able to reshape the tool.

The idea that programming should be set up so it could be metaphorically like writing is harder to track down, but you could see it in Cliff Shaw's JOSS from the same early period. Besides being the first real "end-user" language, and the first attempt at a really "user-friendly" interface, it included a special terminal design adapted from a high-quality IBM electric typewriter that printed in two colors, in lower and upper case, on drilled fanfold 8½ × 11 paper, so that the output was a direct extension of one's notebook.

The vague term "computer literacy" also surfaced in the sixties, and in its strongest sense reflected a belief that the computer was going to be more like the book than a Swiss Army knife. Being able to "read" and "write" in it would be as universally necessary as reading and writing became after Gutenberg. The Dynabook idea was a prime focus during this time as the kind of thing computers were going to turn into, forced by engineering possibility and sociological necessity.

The analogy to reading was the easiest to see. If "reading" is the skill to be able to understand and use messages represented as gestures in a medium whose conventions are in close agreement between writer and reader, then the equivalent of reading on a computer would require the invention of a user-interface language that could universally frame the works of many thousands of authors whose interests would range far beyond those of the interface designers. "Writing" on the other hand requires the end-users to somehow construct the same kinds of things that they had been reading--a much more difficult skill.

At first this was so analogous to designing a programming language that many "interactive" interfaces were designed--some trying to improve on JOSS' dialogue scheme, while others attempted to build artificially intelligent agents that could turn advice into generalized actions. McCarthy's "advice-taker" idea had a huge influence on everyone's ideas. The notion of "programming by example" arose--perhaps the earliest was Teitelman's PILOT system, in which he tried to build an advice taking system (using a pattern matching production system) that could recapitulate the early AI theses at MIT.

Wally Feurzeig and Seymour Papert had a different notion about the place of computer "reading" and "writing": that like the reading and writing of books, it wasn't just about getting and conveying information, but the very act of learning and doing them expands one's horizons and adds new ways of thinking about the world. In other words, programming could be good for people, and thus some effort should be put into designing systems that would have pedagogical benefit for both children and adults.

These ideas resonated strongly with me, partly because of my background in music, biology and mathematics. Computer processes coordinating in time go beyond Kepler's music of the spheres to a "music of metaphysics". A very similar metaphor is that of cellular and developmental biology which goes far beyond the classical Newtonism of 19th century science to a much more involved and inherently nonlinear systems organization. To have a medium that could be read and written at this new level of complexity--a level in which many of the "gotchas" of our civilization and science reside--seemed tremendously important. And still does.

Many such considerations eventually led to the realization that "it wasn't a language, but an environment", and this led directly in the early seventies to the overlapping window and pointing interface coextensive with objects that could send messages to each other and thus model any dynamic system. But the "writing" problem still remained--in part, because it was not even clear what the writing problem was.

The interface design was strongly influenced by the multiple mentality ideas of Jerome Bruner, in which the "middle" mentality, the iconic one, was the bridge between infancy and adolescence. Inspired by a few early examples--such as Paul Rovner's AMBIT-G--we decided to concentrate our research on iconic programming. Still not knowing what it meant, we dealt with it in the traditional manner for handling very difficult problems. Namely, give them to graduate students and tell them they are easy.

The first of these was Dave Smith, and his PYGMALION became the new exemplar for what iconic programming by example might mean. A host of others from our group followed--including Alan Borning's ThingLab, Laura Gould and Bill Finzer's Programming By Rehearsal, Dan Halbert's SmallStar, and Dan Ingalls' Ariel (a later version was called FABRIK). By this time a community had formed with Henry Lieberman's TINKER taking an important new path.

Today, we have windowed interfaces everywhere, and even a number of iconic object-construction kits. We have macro capture systems of every kind, and scripting languages. But we don't have "end-user programming". Nor do we have "programming by example".

One of the problems is range. By this I mean that when we teach children English, it is not our intent to teach them a pidgin language, but to gradually reveal the whole thing: the language that Jefferson and Russell wrote in, and with a few style shifts, the language that Shakespeare wrote in. In other words we want learners of English not just to be able to accomplish simple vocational goals, but to be able to aspire to the full range of expression the language makes possible. In computer terms, the range of aspiration should extend at least to the kinds of applications purchased from professionals. By comparison, systems like HyperCard offer no more than a pidgin version of what is possible on the Macintosh. It doesn't qualify by my standards.

Programming by example adds yet another burden to that of end-user programming. The user's intent as expressed in examples is to be divined by the system and turned into a useful generality. Humans have a very large range of intents, and extreme context restrictions have to be invoked to recognize any but the simplest--note that the windows and other devices of the overlapping window interface serve to restrict context while giving the user the illusion of freedom. Many of the most useful PbyE systems use the highly restricted windowing environment to great advantage.

At some point we can expect to see large complex models of human common sense and goal structures--as predicted by McCarthy long ago and slowly being realized by Doug Lenat's CYC system--getting coupled to user interfaces whose goal it is to produce in the user a style of action that will permit most goals to be recognized and automatically completed. Whether this can actually be done without requiring the user to tell the system what the goal is remains to be seen. Humans are not all that great at recognizing and understanding each other's goals, but perhaps our ego-centeredness can be left out of the artificial system.

In any case, I think the most important issues regarding end-user programming and its subbranch of programming by example are pedagogical and ethical. There is no question that a human with a goal wants to have the sub-goals ready made and at hand. One shouldn't have to learn about Carnot cycles of internal combustion engines--or even just hand cranking it--in order to drive an automobile. And agents that can be told goals and can go off and solve them have been valuable and sought after for as long as humanity has endured.

On the other hand, it takes a very special value system for children and adults to be able to exist as learning creatures--indeed as humans at all--in the presence of an environment that does all for them. 20th-century humans who don't understand the hows and whys of their technologies are not in a position to make judgments and shape futures. At some point it is necessary to understand something about thermodynamics, and waiting until then to try to learn it doesn't work. Nature's rule is "use it or lose it"--most social systems that have incorporated intelligent slaves or amanuenses have "lost it". In fact most never gained it to lose. In a technopoly in which we can make just about anything we desire, and almost everything we do can be replaced with vicarious experience, we have to decide to do the activities that make us into actualized humans. We have to decide to exercise, to not eat too much fat and sugar, to learn, to read, to explore, to experiment, to make, to love, to think. In short, to exist.

Difficulties are annoying and we like to remove them. But we have to be careful to only remove the gratuitous ones. As for the others--those whose surmounting makes us grow stronger in mind and body--we have to decide to leave those in and face them.



Preface

This book grew out of a workshop on Programming by Demonstration that was held at Apple Computer in March, 1992. The workshop was an opportunity for current researchers to discuss their work with the pioneers in the field. David Smith demonstrated a HyperCard simulation of his Pygmalion system, which was the first system for programming by demonstration and the inspiration for the work that has followed. Henry Lieberman ported his classic Tinker system to the Macintosh so that he could give a live demonstration at the workshop. This was followed by classic videos of the early systems, live demonstrations of the newer systems, and open discussion on topics in the field.

The participants found the workshop to be very rewarding, largely because it had always been difficult to access the relevant papers, articles, books, and videotapes describing these systems. Given the recent widespread interest in end user programming, we felt that a larger audience could benefit from this material, so we decided to republish it in a book. From that original plan, a rather different book has resulted. Instead of simply republishing their original articles, most of the authors have either written completely new chapters or have updated and extended their articles.

Furthermore, we have included two additional sections in the book: Section II discusses particular aspects of programming by demonstration in greater detail, and Section III provides broader perspectives on the field.

The Appendices are particularly valuable: in addition to a chronology and a glossary, there is an extensive test suite which lists a wide variety of specific tasks that researchers feel are amenable to programming by demonstration. In addition to demonstrating the potential of PBD, this test suite can also serve as a standard for measuring the capabilities of a given system.

The March event was called the "Programming by Example Workshop". Several of the participants observed that first-time hearers were more likely to understand what this field was all about when it was termed "Programming by Demonstration", and we have therefore all modernized our vocabulary.

This book is not only intended for individuals who are actively working in the field of programming by demonstration. We have aimed to make this material accessible and interesting to a larger audience: students and researchers with an interest in end user programming, and individuals interested in user interface design and agent-based systems. It is not a book about machine learning or artificial intelligence. Rather, the focus is on ways to create the appropriate human-computer interaction so that end users can gain more control of their personal computers.

I would like to thank the External Research Group at Apple Computer for their generous support of the workshop, and Mark Miller and Rick LeFaivre of the Advanced Technology Group at Apple Computer for their continued and enthusiastic support of this book. I would also like to thank Yin Yin Wong for her help in designing the layout for this book. Finally, I would like to thank CE Software for their QuicKeys program, which allowed me to automate many of the dreary tasks involved in editing 30 chapters.

Allen Cypher



Introduction:
Bringing Programming to End Users

Allen Cypher

The motivation behind Programming by Demonstration is simple and compelling: if a user knows how to perform a task on the computer, that should be sufficient to create a program to perform the task. It should not be necessary to learn a programming language like C or BASIC. Instead, the user should be able to instruct the computer to "Watch what I do", and the computer should create the program that corresponds to the user's actions. This book investigates the various issues that arise in trying to make this idea practical. The first section of the book describes 18 computer implementations of Programming by Demonstration, and the second section discusses the problems and opportunities for Programming by Demonstration (PBD) in more general terms.

Why Would Users Want to Program?

The first system for Programming by Demonstration, David C. Smith's PYGMALION, was written in 1975, but it was not until the more recent emergence of a large population of personal computer "end users" that the approach began to attract widespread attention. In the 1960s, computer users were either programmers themselves, or they had programmers who wrote programs specifically for them. This meant that users had their own custom applications -- programs designed specifically for their task. Large insurance companies had their own custom accounting programs, and large businesses had custom inventory programs.

With the advent of personal computing, this all changed. Now the local dry cleaner uses a personal computer to handle sales with SalesPoint and to prepare advertisements with MacDraw. Parents are using personal computers at home to manage their finances with Quicken, and their children are writing essays in Word. Instead of using a custom application developed by a nearby programmer for a very specific task, people use a generic application developed by a distant, unknown and unreachable programmer to handle tasks similar to theirs.

Contemporary computer users are "end users", meaning that they are at the end of the process of computer programming, far removed from the programmer. The programs they are using were not written with their particular needs in mind. The programmer who created Quicken doesn't know that you work at a second job on the weekends, or that you keep a separate checkbook for household expenses. The programmer who created Word doesn't know that your English teacher requires book titles to be underlined in your bibliographies. As a result, end users must map their activities into the capabilities of the generic applications. It is inevitable that this mapping process will involve tedious steps that could be automated if only the end user were the programmer who had created the application.

Every week or so, I record my latest credit card charges in Dollars and Sense, a home accounting program. Before I can enter any charges, I have to do the following: I select the "Edit Transactions" command from a menu. A dialog asks me for the Funding Account and the date range. I scroll through a list of about 20 accounts to find "MasterCard". Then I type in the first day of the current month as the "From" date. Had this program been written specifically for me, it would have had a button labeled "Add MasterCard Charges" that would perform all of these steps. But since this is a generic program, intended for thousands of people with varying accounting needs, an "Add MasterCard Charges" button would not make sense. As a result, I am stuck performing a whole sequence of actions instead of just one, because I am using a generic program to perform a specific task.

Figure 1. The "Preferences" dialog for MacWrite II.


End User Programming

This leaves personal computer users in an ironic situation. It is a truism that computers are good at performing repetitive activities. So why is it that we are the ones performing all of the repetition, instead of the computer? Solutions are needed which enable end users to create their own custom commands. The various techniques for achieving this goal are generically referred to as "end user programming". Note that these techniques need not be programming per se: rather, they need to achieve effects that can currently only be achieved through programming. The current approaches to end user programming can be lumped into four categories: Preferences, Scripting Languages, Macro Recorders, and Programming by Demonstration.

Preferences (see Figure 1) are pre-defined alternatives supplied by an application designer to accommodate the varying needs of several different types of users. By choosing one of the pre-defined alternatives offered as a preference, a user can get the application to respond in the way that is most appropriate to his or her working style. Preferences are necessarily restricted to situations that the application designer is able to foresee, and they cannot accommodate the highly idiosyncratic needs of individual users. For instance, there will never be a preference option to "send invoices to all California customers on the 1st of each month, and to all other customers on the 15th". Preferences also do not scale: a system that offered all conceivable options would end up with option sheets containing hundreds of entries, and it would be untenable for users to find and select their desired choices.

Since preferences offer only fixed alternatives, they are not really general enough to be considered a form of programming.

Scripting Languages (see Figure 2) are currently a popular approach to end-user programming. A scripting language is a small, simple programming language whose vocabulary is specifically tailored to the objects and actions of a particular application domain. The hope is that such a language will not be too difficult for end users to learn. Scripting languages are indeed less daunting than the standard general-purpose programming languages. For instance, in the HyperCard application, the scripting language description for the location of the message box is "the location of the message box", while the corresponding description in Pascal is

"GlobalToLocal(messageWindow^.portBits.bounds.topLeft)".

However, many of the standard programming difficulties remain. To make the box go away, you must use the command Hide the message box. If you write Close the message box, you get the error message "Can't close that window". You can write Hide the message box or Hide "the message box", but not Hide the "message box". And of course, for all of these commands, you have to know that this object happens to be called the "message box".

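on JumpToStack
  -- Remember which stack the user started in
  put the name of this stack into NameOfStack
  -- Create a new round-rectangle button on the first card of the Home stack
  go to card 1 of stack Home
  doMenu "New Button"
  set the style of button "New Button" to roundRect
  -- Build a mouseUp handler that jumps back to the original stack
  put "on mouseUp" & return into jumpScript
  put "go to " & NameOfStack & return after jumpScript
  put "end mouseUp" after jumpScript
  set the script of button "New Button" to jumpScript
  -- Label the new button, then return to the browse tool
  set the name of button "New Button" to "Go To " & ¬
      NameOfStack
  choose browse tool
end JumpToStack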

Figure 2. A HyperCard script.

In summary, the basic failing of scripting is that it is still programming. That is, 1) users have to learn the arcane syntax and vocabulary conventions of the language, and 2) they have to learn the standard computer science concepts of variables, loops and conditionals. For a significant number of contemporary computer users, be they history students, real estate agents, or shop owners, the hurdle of learning a scripting language is simply too high.


Figure 3. A recorded QuicKeys macro to add page numbering to a Word document. The macro consists of a menu selection followed by three mouse clicks. Executing the "Print Preview..." menu command displays a page of the document as it will appear when it is printed. The first Click selects the "Page Number" tool. The second Click places the page number in the desired location on the page. The third Click is on the "Close" button, removing the "Print Preview" display.

Macro Recorders (see Figure 3) provide users with a way to record their actions. These recorders are a basic implementation of "Watch what I do". The user issues the "Start Recording" command, performs a series of actions, and then issues the "Stop Recording" command. All of the user's actions are saved as a sequence, and the user can then invoke a "Redo" command to replay the entire sequence. Many spreadsheet programs and telecommunications programs have built-in macro recorders. For example, the spreadsheet user can automate selecting cells A1 through A15 and copying them to cells E20 through E34. And the telecommunications user can automate the sequence of dialing an on-line service, typing in the login name and password, and selecting a particular bulletin board. There are also some system-wide macro recorders. KeyWatch, on IBM PCs, will detect and automate a repetitive sequence of keystrokes. QuicKeys, on the Macintosh, can automate a sequence of mouse selections, mouse drags, menu selections, and keystrokes. For instance, by recording the action of dragging the icon "November Advertisement" to the icon of a "BackUp" disk, the user can automate the process of making a backup copy of this document.

The main failing of macro recorders is that they are too literal. They replay a sequence of keystrokes and mouse clicks, whereas most repetitive activities are repetitive at a somewhat higher level of abstraction. Rather than selecting cells A1 through A15, the user may actually want to select this month's employee sales, which may now occupy cells A1 through A16 if a new employee has joined the department. After logging on to an on-line service, the user may want to download all of the new messages related to Bay Area restaurants. But the specific actions required to accomplish this download will vary as the contents of the bulletin board vary from day to day. Finally, as illustrated in Figure 4, when the user records the command Drag from Screen Location (161, 90) to Screen Location (209, 201) to make a backup of the document "May Advertisement", it may happen that a replay of that precise action instead moves the document "Phone Numbers" into the "Price List" folder, because the windows and icons on the screen are now in different locations.
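
To make this literalness concrete, here is a minimal Python sketch. It assumes a made-up screen model in which each icon name maps to an (x, y) position. The recorded drag coordinates come from the example above, while the icon positions at replay time are invented for illustration. Replaying the recorded coordinates simply picks up whatever happens to lie under them.

```python
def icon_at(screen, point, radius=20):
    """Return the name of the icon within `radius` pixels of `point`, or None.
    `screen` is a hypothetical map from icon name to (x, y) position."""
    px, py = point
    for name, (x, y) in screen.items():
        if abs(x - px) <= radius and abs(y - py) <= radius:
            return name
    return None


def replay_drag(screen, start, end):
    """Replay a recorded drag literally: whatever lies under `start`
    is dropped onto whatever lies under `end`."""
    return icon_at(screen, start), icon_at(screen, end)


# The recorded action: raw screen coordinates, as a macro recorder stores them.
recorded_start, recorded_end = (161, 90), (209, 201)

# Layout when the macro was recorded.
screen_at_record = {"May Advertisement": (161, 90), "BackUp": (209, 201)}

# Later, the windows and icons have moved (positions invented for illustration).
screen_at_replay = {
    "Phone Numbers": (160, 92),
    "Price List": (210, 200),
    "May Advertisement": (40, 300),
    "BackUp": (400, 50),
}

print(replay_drag(screen_at_record, recorded_start, recorded_end))
# ('May Advertisement', 'BackUp')
print(replay_drag(screen_at_replay, recorded_start, recorded_end))
# ('Phone Numbers', 'Price List')
```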

Figure 4. Replaying a macro can have unexpected results. (Panels: "Record" and "Replay".)

Programming by Demonstration (see Figure 5) is an elaboration of the idea behind macro recorders. Once again, the user instructs the system to "Watch what I do", but with programming by demonstration, the system creates generalized programs from the recorded actions. Instead of selecting cells A1 through A15, a generalized program can select all of the rows before row "Total". Instead of downloading messages 1, 4, and 7, a generalized program can download all messages marked "unread". Instead of dragging from one screen location to another, a generalized program can make a copy of this month's Advertisement document, regardless of where its icon is located on the screen.
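
Here is a rough Python sketch of the three generalized descriptions just mentioned. It assumes simple hypothetical data structures: a spreadsheet column as a list of cell labels, messages as (id, unread) pairs, and the screen as a map from icon name to position. It only illustrates the contrast with literal replay. It is not meant to show how any particular PBD system represents its programs.

```python
def rows_before_total(column):
    """Generalized selection: every cell above the cell labeled "Total"."""
    return column[: column.index("Total")]


def unread_ids(messages):
    """Generalized selection: the ids of all messages marked unread.
    Each message is a hypothetical (id, unread) pair."""
    return [msg_id for msg_id, unread in messages if unread]


def icon_position(screen, name):
    """Generalized reference: locate an icon by its name, not by coordinates."""
    return screen[name]


# This month the column holds 15 employees; next month a 16th has joined.
this_month = [f"emp{i}" for i in range(1, 16)] + ["Total"]
next_month = [f"emp{i}" for i in range(1, 17)] + ["Total"]
print(len(rows_before_total(this_month)), len(rows_before_total(next_month)))  # 15 16

print(unread_ids([(1, True), (2, False), (4, True), (7, True)]))  # [1, 4, 7]

print(icon_position({"May Advertisement": (40, 300)}, "May Advertisement"))  # (40, 300)
```

The same description selects 15 rows one month and 16 the next. The recorded range A1 through A15 could never do that.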

Furthermore, generalized programs can contain iterative loops and conditional branches. It is possible to demonstrate how to transfer one address from an old address book into a new address book and have a programming by demonstration system transfer all of the addresses in the book. It is possible for a PBD system to create a program that only transfers address cards where the state is "CA" or "California".
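
The Python sketch below shows the difference between the single demonstrated step and the generalized program. The card format and the function names are hypothetical. The point is that the generalized program wraps the one demonstrated transfer in a loop and a conditional.

```python
def transfer(card, new_book):
    """The demonstrated step: copy one address card into the new book."""
    new_book.append(dict(card))


def transfer_all(old_book, new_book, condition=None):
    """The generalized program: repeat the demonstrated step for each card
    (a loop), skipping any card that fails `condition` (a conditional branch)."""
    for card in old_book:
        if condition is None or condition(card):
            transfer(card, new_book)
    return new_book


def in_california(card):
    """Hypothetical condition: the state field says "CA" or "California"."""
    return card["state"] in ("CA", "California")


old_book = [
    {"name": "Ann", "state": "CA"},
    {"name": "Ben", "state": "MA"},
    {"name": "Cy", "state": "California"},
]
print(len(transfer_all(old_book, [])))  # 3
print([c["name"] for c in transfer_all(old_book, [], in_california)])  # ['Ann', 'Cy']
```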


Figure 5. Metamouse assists the user in repositioning lines.

The greatest advantage of Programming by Demonstration over conventional programming is that it is "Programming in the User Interface" - a term coined by Dan Halbert. Conventional programming requires the programmer to map from the visual representation of objects being moved about the screen into a completely different textual representation of those actions. By Programming in the User Interface, users can refer to an action by simply performing the action, something they already know how to do. They are programming in the same environment in which they perform the actions. With conventional programming, they must learn a second, abstract, technical and alien way of referring to objects and actions.

What Do Users Want to Program?

There is a wide range of user-programming needs that can potentially be satisfied by programming by demonstration. The simplest types of activities involve making little changes in an application so that it more closely fits one's personal needs. These changes may be called tweaking, and they often correspond to setting preferences that the application designer did not foresee.


Figure 6. A perennial dialog box.

For instance, whenever I open a "package" in my mail program, the program displays a dialog box which asks me whether I want to delete the package (see Figure 6). I always want to delete it, but every time I have to click "Yes". I would like my mail program to delete packages without asking. Since the designer did not have the foresight to include an "Always delete packages?" preference, I must resign myself to this annoying "feature". I would like to be able to record the action of clicking "Yes" in the "Delete Package?" dialog box, and have this program automatically run whenever the dialog appears. This would be almost as good as not being asked at all. A macro recorder could record the action of clicking "Yes", but there is no way to automatically invoke the macro whenever this particular dialog box appears.
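
What is missing is an event trigger. Here is a minimal Python sketch that associates a recorded click with a dialog title. It assumes a hypothetical window system that reports each dialog's title when the dialog appears. The class and its methods are illustrative only and do not correspond to any real macro utility.

```python
from typing import Callable, Dict


class DialogTriggers:
    """Hypothetical sketch: run a recorded macro automatically whenever a
    dialog with a given title appears."""

    def __init__(self) -> None:
        self._macros: Dict[str, Callable[[], str]] = {}

    def attach(self, title: str, macro: Callable[[], str]) -> None:
        """Associate a recorded macro with a dialog title."""
        self._macros[title] = macro

    def dialog_appeared(self, title: str) -> str:
        """Called (by an assumed window system) when a dialog opens. Runs the
        attached macro and returns the button it clicked, or leaves the
        dialog for the user if nothing is attached."""
        macro = self._macros.get(title)
        return macro() if macro is not None else "left for user"


triggers = DialogTriggers()
triggers.attach("Delete Package?", lambda: "Yes")  # the recorded click on "Yes"
print(triggers.dialog_appeared("Delete Package?"))  # Yes
print(triggers.dialog_appeared("Save Changes?"))    # left for user
```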

The second, and probably the largest, potential use for programming by demonstration is automating repetitive activities. Users commonly have to perform iterative activities, such as renumbering a long list when a new entry is inserted in the middle, and they also have to perform periodic activities, such as backing up recently changed files. A graphic designer showed me a good example of a task that needs automating. This designer is responsible for a large computer reference book, and she had recently decided to use a different font for the figures in the book (see Figure 7). The new font was somewhat larger, so the labels -- words with a box around them -- no longer fit in their boxes. The designer had to reshape all of the boxes in more than 120 figures in the book. She would have been much happier to create a program by demonstrating how to reshape a few of the labels, and then use that program to automatically reshape the rest.


Figure 7. A small part of a figure containing text in boxes. All of the boxes containing text have to be reshaped.

The third potential use for programming by demonstration is for building mini-applications. Sometimes users' needs are unique enough that commercially available applications are not appropriate. HyperCard has been successful in addressing some of these needs, but it requires that users know how to write programs. I know a programmer who wanted an address book that would allow him to keep several addresses and phone numbers for each person -- work and home addresses, for instance. He used HyperCard to create his own custom address book (see Figure 8). It would be wonderful if PBD permitted non-programmers to create custom applications like this.

What Makes PBD Difficult?

This book is dedicated to taking advantage of the fact that users know how to perform tasks themselves. This ability should be very helpful in creating programs that perform those tasks. But there is more to a program than meets the recording. Users are quietly and imperceptibly making decisions. They are searching by eye. They are reading and understanding natural language. They are interpreting the interface in terms of their goals.


Figure 8. A custom Address book.

Inferring Intent

The main challenge confronting Programming by Demonstration is how to infer the user's intent. In order to convert a recorded action into a program to perform that action, the system needs to determine the user's intent in performing the action. When the program is executed in the future, the context will be somewhat different, and it will be necessary to perform the action that is the equivalent of the recorded action in this new context. The process of inferring a user's intent in selecting a particular object has been described by Dan Halbert as creating a "data description".

When the user selects Re: meeting next week in the mail message in Figure 9, the user's intent could be to 1) select the subject of the message, 2) select the first four words of the subject, 3) select all but the first word of the third line, 4) select any subject beginning with Re:, 5) select all subjects related to meeting next week, and so on.

If this text is a memo written in a word processor, the application will have no special knowledge about memos. When the user selects Re: meeting next week, there is no indication that the preceding word, which was not selected or referenced in any way, is important in inferring the intent behind the selection. A word processor will not have knowledge that Subject: is an important part of a memo, or that Re: indicates a memo in response to a previous memo. The selection will most likely be interpreted as "the 2nd through last words of line 3", or possibly even "characters 34 through 54."
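
As a rough illustration of why the fallback is positional, the following Python sketch describes a selection using only line, word, and character positions. The memo text is invented, so its character offsets differ from those in Figure 9. The function is hypothetical and is not taken from any system in this book.

```python
def positional_descriptions(text, start, end):
    """Describe the selection text[start:end] purely by position, as an
    application with no knowledge of memos might. Assumes the selection lies
    on a single line and starts and ends on word boundaries."""
    line_no = text.count("\n", 0, start) + 1
    line_start = text.rfind("\n", 0, start) + 1
    line_end = text.find("\n", start)
    if line_end == -1:
        line_end = len(text)
    words_in_line = text[line_start:line_end].split()
    first = len(text[line_start:start].split()) + 1
    last = first + len(text[start:end].split()) - 1
    last_label = "last" if last == len(words_in_line) else str(last)
    by_words = f"words {first} through {last_label} of line {line_no}"
    # Characters are counted from 1, inclusive at both ends.
    by_chars = f"characters {start + 1} through {end}"
    return by_words, by_chars


memo = "From: Pat\nTo: Staff\nSubject: Re: meeting next week"
start = memo.index("Re:")
end = start + len("Re: meeting next week")
print(positional_descriptions(memo, start, end))
# ('words 2 through last of line 3', 'characters 30 through 50')
```

Nothing in either description mentions "Subject:". Both descriptions are correct for this memo, but neither captures what the user meant.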

Given the paucity of information in a recording of user actions, how can a PBD system get the additional information it needs to make the correct generalizations? The systems described in this book will present a variety of solutions to this problem. For instance, SmallStar has the user select from a fixed list of alternatives, Eager compares multiple examples, Turvy lets the user point to relevant information, and Peridot asks the user to verify its interpretations.
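
The following Python sketch pictures what comparing multiple examples can buy. It keeps only those candidate descriptions that reproduce the user's selection in every example. It is a deliberately simplified stand-in with a tiny hand-written set of hypotheses and invented memos. It is not Eager's actual inference method.

```python
def words_2_to_end_of_line_3(memo):
    """Positional hypothesis: everything after the first word of line 3."""
    return " ".join(memo[2].split()[1:])


def text_after_subject_label(memo):
    """Structural hypothesis: the text following the "Subject:" label."""
    for line in memo:
        if line.startswith("Subject:"):
            return line[len("Subject:"):].strip()
    return None


def whole_line_3(memo):
    """Positional hypothesis: all of line 3."""
    return memo[2]


HYPOTHESES = [words_2_to_end_of_line_3, text_after_subject_label, whole_line_3]


def consistent_hypotheses(examples, hypotheses=HYPOTHESES):
    """Keep only the hypotheses that reproduce the selection in every example.
    Each example is a (memo_lines, selected_text) pair."""
    return [
        h.__name__
        for h in hypotheses
        if all(h(memo) == selected for memo, selected in examples)
    ]


memo1 = ["From: Pat", "To: Staff", "Subject: Re: meeting next week"]
memo2 = ["From: Dan", "To: Staff", "Cc: Henry", "Subject: Budget draft"]

first = [(memo1, "Re: meeting next week")]
both = first + [(memo2, "Budget draft")]
print(consistent_hypotheses(first))  # ['words_2_to_end_of_line_3', 'text_after_subject_label']
print(consistent_hypotheses(both))   # ['text_after_subject_label']
```

A single example leaves two candidates standing. A second memo with an extra "Cc:" line rules out the positional one.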

In addition to inferring data descriptions, the other main type of inference that PBD systems must make is about flow of control, since repetitive activities often include special cases that must be handled differently. This presents two complications for PBD systems. First, a single recording can only show one of the multiple paths of action, so branching flow of control will necessarily require the user to redo the activity a number of times. Second, the process that users go through in deciding which branch to perform is almost always a hidden mental process, and it is difficult to acquire information about how that decision is made.


Figure 9. Inferring intent in a selection

The systems described in this book present a variety of solutions to the problem of flow of control. For instance, Tinker has users write programming language expressions to determine which path to follow, and Metamouse infers branches automatically from multiple examples.
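
Here is a much-simplified illustration of inferring a branch from multiple examples. The Python sketch below looks for a single feature test that separates the examples where the user took the special-case branch from those where the user did not. The feature names and data are hypothetical, and this is not the algorithm Metamouse uses. The sketch only works if the deciding feature was recorded at all, which is exactly the hidden-decision problem described above.

```python
def infer_branch_condition(examples):
    """Return a (feature, value) test that holds for every example where the
    user took the special-case branch and fails for every other example, or
    None if no single equality test fits. Each example is (features, took_branch)."""
    taken = [features for features, took in examples if took]
    not_taken = [features for features, took in examples if not took]
    if not taken:
        return None  # the special case was never demonstrated
    for feature, value in taken[0].items():
        if all(f.get(feature) == value for f in taken) and all(
            f.get(feature) != value for f in not_taken
        ):
            return feature, value
    return None


# Hypothetical demonstrations: the user reshaped a box only when its text overflowed.
examples = [
    ({"kind": "label", "text_overflows": True}, True),
    ({"kind": "label", "text_overflows": False}, False),
    ({"kind": "arrow", "text_overflows": False}, False),
    ({"kind": "label", "text_overflows": True}, True),
]
print(infer_branch_condition(examples))  # ('text_overflows', True)
```

With only one path demonstrated, the function has nothing to separate and returns None. This mirrors the need to redo the activity several times.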

User Centered Systems

The authors of this book care about empowering end users. All of the systems described here approach PBD from a user-centered perspective. The common thesis that informs their work is that the success of a PBD system depends far more on the user experience of interacting with the system than it does on the induction algorithms used to create the users' programs. I hope you find the ideas presented in the coming chapters to be provocative and enlightening.

