Papert, S. (1977). Concepts and artificial intelligence. In J. Macnamara (Ed.), Language learning and thought. Academic Press.
The Katherine Nelson paper that Seymour Papert refers to in this chapter follows below, for context.
By Seymour Papert
Massachusetts Institute of Technology
There are lots of things I would like to say about Katherine Nelson’s paper, particularly about the fine detail of the observations, but I shall have to pass over them in favor of the central topic of the paper, concepts and their formation. Here it strikes me as curious and disappointing that Nelson does not advert to the whole enterprise that goes under the name of artificial intelligence, because it has been so involved with concepts. So perhaps I shall spend some time pointing out the links with artificial intelligence.
First, I think that Nelson has placed her finger on the important issue in understanding linguistics and certainly the most important in the acquisition of language. Paradoxically, the tremendous development in technical and formal devices for representing language has not been accompanied by a corresponding growth in the devices for representing meanings. So, there is a problem relating language to meaning. I would like to confine myself to the meaning end of that comparison and say something about what a concept is. In doing so, I find myself somewhat hesitant lest I should find myself committed to the “concept of concept.” Nevertheless, I hope I can say something without having to deal with that issue.
What is a concept, then, and what can it be made of? Our attempts to answer this question will resemble discussions of what genes were 50 years ago. People could say a lot about genes, but they had very little idea of what they actually were. Today we know pretty well what genes are and we can define them in terms of DNA, which is probably very different from Mendel’s idea of a gene. Our knowledge of concepts is like the earlier knowledge of genes. But I can enhance my knowledge by attempting to build a model of concepts and seeing what sort of formal entities we will require. In this context one of the questions that can be raised is: Do they have parts? Nelson tells us they do, and in particular that the parts form two major subdivisions. One of these defines the essence of the concept; the other corresponds to the features that we use to recognize an object that exemplifies the concept. Our experience is such that the distinction is impossible to maintain. One might be able to manage it with a concept like ball, but when one tries to carry it through to more complicated concepts, it does not work. Moreover, I think it gratuitous to identify the core concept with a set of functions. How in this account could a child form a concept of the moon?
I would like to illustrate a somewhat different approach with an example from an early program written by Patrick Winston of M.I.T. One of the interesting aspects of his program was that it could learn, to a limited extent, and in particular it could learn what an arch is. Because Winston was dealing with a computer, he could not be satisfied merely to list all the properties of an arch. As an aside, I get the impression that the features in Nelson’s concepts are joined together by some sort of association. That would never do for a computer program. Let us see what Winston’s program did.
Winston’s machine lives in a world of blocks which will be familiar to those who have read Winograd’s thesis. The machine would like to learn what an arch is, and it is being taught by being told that this is an arch and this is not. I will not go into detail, but it somehow recognizes that there are three parts or blocks in an arch, and for each it creates a node in its own internal representation system. In itself this is a very important primitive operation; it is very different, for example, from anything one finds in traditional logic. What the machine does not do is take some data and combine them to make a new object. Instead, it models the objects in the environment. In addition, the machine can say things about the objects that the nodes represent. It does this by attaching a pointer from the node to one of its concepts. In the example, it attached a pointer from each node to the concept block, and the pointer itself was labeled “is-a.” This is a special relation which can be equated with class inclusion. In drawing these pointers, the machine was saying to itself, each of the objects is a block. But there is more to say, because it also recognizes relations among blocks and it sees that one of the blocks is supported by the other two. So, it draws a line between the appropriate blocks and labels it “supports.” Now it has a first approximation to a concept of an arch.
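The network described above can be sketched in a few lines of code. This is only a minimal illustration of the idea, not Winston’s actual program (which was written in Lisp); the class and relation names here follow the text, but the data structures are my own assumptions.

```python
# Sketch of the kind of network Winston's program builds for an arch:
# one node per block, an "is-a" pointer from each node to the concept
# block, and a "supports" relation between the appropriate nodes.

class Node:
    def __init__(self, name):
        self.name = name
        self.links = []              # outgoing (label, target) pairs

    def link(self, label, target):
        self.links.append((label, target))

# The pre-existing concept that the "is-a" pointers attach to.
block = Node("block")

# One node per block in the scene -- the primitive modelling step
# the text emphasizes: the machine models objects, it does not merely
# combine data into a new object.
top, left, right = Node("top"), Node("left"), Node("right")
for part in (top, left, right):
    part.link("is-a", block)         # "each of the objects is a block"

# The observed structural relation: the top block is supported
# by the other two.
left.link("supports", top)
right.link("supports", top)
```

The point of the sketch is that the representation is a graph of labeled pointers, not an unordered list of features.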
Winston then shows it a structure made from three blocks; in it, one block rests on the other two, but the other two are touching each other. He asks the machine whether this is an arch; it compares the design of this structure with its concept of arch and answers “yes.” Then Winston says, “No, it’s not an arch.” The machine then thinks, and it thinks by comparing its design for the new structure with its design for the structure that was an arch. It notices that in the arch the support blocks do not touch, and so it enters another annotational link between the two supporting blocks which can be paraphrased by “must not touch.” It can continue to modify the concept with annotational links to say certain things must be so, certain things may be so, certain things must not be so, and so on.
I think you will agree that all this is different from the inert list that Nelson is proposing as a concept. The machine is fitted with a set of procedures for building nodes which are then interrelated in the appropriate manner. It does not simply make a list of facts and leave it at that. Indeed, the concept of arch will be related to a number of other concepts in such a manner that the machine’s entire conceptual system may come into play in constructing a new concept. This is an important point for which I do not see room in Nelson’s account. For example, if a child sees an apple rolling, whether he calls it a ball or not is likely to depend on whether he already has a concept of an apple. So, concepts interact in the use we make of them. If Nelson were woodenly to provide for this in her system, she would probably enter with each concept, say of an apple, “is not a ball,” “is not an orange,” etc., etc. This would, of course, be a rather bizarre way of doing it.
What has work like Winston’s done for us, and how is it an advance on what has gone before? To answer this, I would like to draw an analogy with the history of linguistics, though I fear I do not know that history well. However, the way I see it is that the older grammarians, with a good deal of success, knew how to parse a sentence, how to divide it into its constituent noun phrases and verb phrases; and they knew about nouns and verbs and the other parts of speech. In more recent times, there has been a vast growth of additional machinery which increases the analytic and processing power of grammar. Today we have deep structures and surface structures and transformations and markers, and we can describe syntax in a manner that seems beyond the scope of traditional grammar.
What in “conceptology” corresponds to traditional grammar and what has been added? To my mind, what corresponds to the older grammar is logic, the predicate calculus, and such systems. The key addition to that is the idea of a general data structure which, together with an “interpreter” program, brings into play an enormous power of computation. The point is subtle, because there is a formal sense in which the information represented in Nelson’s concept and Winston’s is equivalent. They are equivalent from the point of view of the essential complexity of computation. This is a metric that compares difficulty of computation independently of the mechanisms with which one computes. So functions computed on the basis of Nelson’s and Winston’s representations of a concept may be equivalent from the standpoint of essential computability. This does not mean that they are equivalent for psychology. It seems to me that programs working with such data structures represent information in a form which is much more flexible, and better organized for psychological purposes, than Nelson’s representation is. I might add in passing that we in AI do not attempt nearly so clear a distinction between a recognitional part and a functional part of a concept.
A computational representation is not a definition, as Nelson’s is, but an instrument to be employed in manipulating blocks. It carries none of the overtones of logical definitions. It does not replace the external object, it is not an abbreviation of the information in that object, it does not seek to eliminate inessential information. Quite literally, it is to be employed in manipulating objects in all their complexity in their complex environment. I think this is a qualitative difference which advances our understanding of concepts. It is almost impossible to represent concepts, their development, and functioning by algebraic or logical formulas. One needs a more dynamic representation along the lines suggested by artificial intelligence.