Mix-and-Match Creativity

Attribute-Centered Simulation of Adjectives

Mar 01, 2022

When you look at a red rose, do you see a single object (a red rose), or an object (a rose) and a property (redness)? Within the brain, objects are recognized by shape in one part of the visual cortex while color is processed in another part, leading some researchers several decades ago to propose that the brain needed a special, as yet unknown mechanism tying redness to roseness to generate the complete idea of a red rose. This question of what the unknown mechanism might be was called the binding problem, that is, the problem of how to identify when two distinct perceptions are cued by the same fact out in the world. In the case of an object and its color, the growing consensus is that the binding problem is not a hard problem after all. The brain encodes the location in the visual field of both shape and color, and when redness and roseness co-occur at the same time in the same place, then a red rose is perceived rather than a yellow rose, even when the red rose is lying on the hood of a yellow taxi.

Although the binding problem may not be such a problem, the fact remains that we use two words, red and rose, to describe a single object. Plainly, efficiency plays a role; having separate words for red roses, yellow roses, and pink roses would be wasteful. But this efficiency is not merely a linguistic choice that language could override by using a single word for an object in a particular color. Quite apart from language, the separation of color and shape occurs at the level of neurobiology, as mentioned above: color is processed in one place, shape in another.

Thus the representation of a red rose in the brain has at least two portions, one to represent the shape and another to represent the color. Of course, there are more than just these two. The rose also has a scent and a size, and we could clarify these dimensions by adding more adjectives: a small and fragrant red rose. We could also talk about how the rose grows, as in a budding red rose, but this requires us to think about how the rose changes over time. In this post I will discuss adjectives that describe objects at a particular point in time, leaving the subject of transition and change for the next post. So let us discuss how we might represent objects at a particular point in time.

Within the field of artificial intelligence today, it is common to represent concepts by high-dimensional vectors, that is, by a long list of numbers, generated by deep neural networks. Such a representation is called an embedding, and there are word embeddings (Figure 1(b) -- Mikolov et al., 2013), sentence embeddings (Arora et al., 2017), contextual embeddings (Devlin et al., 2018), object embeddings (Wu et al., 2017), and many more. The term embedding is misleading for mathematicians; these are not actually embeddings in the traditional, topological sense. The name was based on an early idea in data science, where each piece of data must first be translated into a vector of features, with each feature representing some aspect of the data. The idea was that data in natural datasets lie on a lower-dimensional topological manifold embedded in the high-dimensional vector space of features (Figure 1(a)). Data manifolds are still an active topic of discussion, but the embeddings generated by deep neural networks are actually feature vectors, not data manifolds.

The sense in which embeddings actually do embed something is as follows. Imagine we have an enumerable (countable) set of objects (words, sentences, car parts, acorns, etc.), and we assign a feature vector (list of numbers) to each object in the set such that similar objects are encoded to similar vectors, where the definition of similarity for objects depends on the task. In this context, words are usually considered similar if they are interchangeable, that is, if they tend to occur around the same words. Sentences are similar if they contain similar words. Car parts could be similar if they are interchangeable, or if they are used in the same types of cars, or if some customer tends to buy both parts at the same time, and so on. As for how acorns are similar, you'll have to ask the squirrels. In that sense, embeddings take a discrete set of concepts and translate them into a continuous feature space. The power of such feature spaces is that they are in a sense combinatorial. We can take two unrelated objects and ask what it would be like if we mixed them. One way to mix these embeddings is by averaging each entry in the vectors for the two objects. The result is a new embedding that is half-way between the sources. So, if we took embeddings for cat and dog and averaged them, we should get a cat-dog, and if we averaged the embeddings for yellow and rose we should get a yellow rose (Figure 2(a)).
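To make the averaging idea concrete, here is a minimal Python sketch. The four-dimensional vectors for cat and dog are invented for illustration; real embeddings come from a trained model and have hundreds of dimensions. The point is only that averaging moves every entry, and that the result sits the same distance from both sources.

```python
import math

# Toy four-dimensional "embeddings" with invented values; real embeddings
# come from a trained model and have hundreds of dimensions.
cat = [1.0, 0.0, 1.0, 0.0]
dog = [0.0, 1.0, 1.0, 0.5]

def average(u, v):
    """Holistic mixing: every entry moves half-way toward the other vector."""
    return [(a + b) / 2 for a, b in zip(u, v)]

def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

cat_dog = average(cat, dog)
print(cat_dog)                                         # [0.5, 0.5, 1.0, 0.25]
print(distance(cat, cat_dog), distance(dog, cat_dog))  # 0.75 0.75
```

The "cat-dog" is neither a cat nor a dog in any entry; it is half of each everywhere.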

As an operation on embeddings, averaging is holistic; every entry in the embedding changes freely. Another example of a holistic operation would be to take a single source embedding and make a small change to each and every element of the embedding. The resulting embedding would still be close to the source embedding, and so it would be judged as similar to it. By accumulating such small changes, one could move along any path between any two embeddings; this is what continuous space means. Holistic operations of this kind are the ones typically applied to embeddings in deep learning, with averaging commonly used to represent the combination of two objects.

But there is another way to combine two object embeddings, less used in deep learning but perhaps more important for understanding the brain. Imagine that we have an embedding with 8 entries, and then we break the vector up into four groups of two entries each, so that we have four banks of information, each represented by two numbers. Given two representations, we can combine them by taking three banks of information from the first and one from the second. The result is a new embedding that shares features of each one, as shown in Figure 2(b). A similar technique, called crossover, is used by genetic algorithms to rapidly search high-dimensional spaces, but genetic algorithms do not usually segment representations into fixed banks and restrict crossover to those banks. Yet these fixed banks are more representative of how computer scientists and computational linguists have tended to think about data structures, as in database tables, classes, ontologies, semantic frames, and theta roles.
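The bank-wise combination can be sketched in a few lines of Python. The 8-entry vectors and the choice of which bank to take from the second source are illustrative assumptions; `bank_crossover` and `one_point_crossover` are hypothetical helper names. For contrast, the sketch also shows a generic one-point cut of the kind genetic algorithms often use, which can split a bank in two.

```python
BANK_SIZE = 2  # 8 entries split into 4 banks of 2, as in the text

def banks(vec, size=BANK_SIZE):
    """Split a vector into consecutive fixed-size banks."""
    return [vec[i:i + size] for i in range(0, len(vec), size)]

def bank_crossover(first, second, take_from_second):
    """Copy whole banks; bank i comes from `second` if i is in take_from_second."""
    out = []
    for i, (b1, b2) in enumerate(zip(banks(first), banks(second))):
        out.extend(b2 if i in take_from_second else b1)
    return out

def one_point_crossover(first, second, cut):
    """Generic GA-style cut at an arbitrary index; it may split a bank."""
    return first[:cut] + second[cut:]

a = [1, 1, 2, 2, 3, 3, 4, 4]
b = [5, 5, 6, 6, 7, 7, 8, 8]
print(bank_crossover(a, b, take_from_second={3}))  # [1, 1, 2, 2, 3, 3, 8, 8]
print(one_point_crossover(a, b, 5))                # [1, 1, 2, 2, 3, 7, 8, 8]
```

Unlike averaging, every entry of the bank-wise result is copied unchanged from one source or the other, and each bank stays intact.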

It is natural for humans to think at a high level about an object as having certain properties with interchangeable values; it is unnatural to imagine simultaneous changes to all properties. If a person imagines combining a man and a bull, the result is not a creature that is everywhere part man and part bull, but rather a creature that has some body parts from the bull and some from the man, hence the minotaur with the head of a bull and the body of a man, or similarly, the centaur with the legs and tail of a horse but the chest and head of a man, or the satyr with the hind legs and horns of a goat but the torso and head of a man. These fusions do not only involve choices, but can involve duplication as well, as in the centaur with the torso of both man and horse, or in the cherubim of the prophet Ezekiel with the four faces of a man, lion, ox, and eagle. The way that humans combine concepts is not holistic, being oriented instead towards decompositions into attributes.

Back to adjectives. In the case of the red rose, two separate neural regions process the color red and the shape of a rose. We might consider color to be distinguished from shape as an attribute by the fact that they are processed in different areas of the visual cortex. What other attributes are distinguished by their neural region of origin? There are many such attributes; I will refer to these as basic attributes. I cannot give an exhaustive list, but I can list several of them.

Many basic attributes come from the sensory system. In the visual cortex there are shape, color, size, number, location, and position relative to other objects. In the auditory cortex we have sound generally, subdivided at least into pitch, loudness, timbre, rhythm, and location. Taste offers dimensional distinctions of sweet, bitter, sour, salty, and savory. Touch provides temperature, pain, pressure, and location on the body. Odor offers a wide array of attributes that are difficult to categorize independently; odor can also be localized through head movement. There are additional senses for balance and body configuration. Beyond these sensory attributes, there are emotional attributes such as fear, anger, sadness, surprise, disgust, anticipation, joy, and trust (Plutchik, 1980). There are motivational attributes such as hunger, thirst, lust, sociability, and curiosity. These attributes are shown in Figure 3.

I imagine a basic object representation in the brain as a bundle of values, one for each basic attribute, including a null value that might be called quiescent to represent an absence of stimulus for an object. The external senses — vision, hearing, smell, touch — can each perceive more than one object at a time. Each of these systems has the ability to localize percepts, and these localizations are coordinated to form the allocentric notion of space. Objects are bounded off from each other by their different locations in the perceptual field and separated from background by their relevance to behavior (you can climb a mountain, pick up a stone, listen to a birdsong, etc.). Localized values for basic attributes are bound together to form an object representation.
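A toy Python sketch of this picture follows. It assumes that each percept arrives as an (attribute, value, location) triple, that locations are coarse grid cells shared across senses, and that percepts landing in the same cell belong to the same object; the attribute names and values are hypothetical. Unfilled attributes default to `None`, standing in for the quiescent value.

```python
from collections import defaultdict

# Hypothetical subset of basic attributes; None stands for "quiescent".
ATTRIBUTES = ("shape", "color", "size", "odor", "sound")

# Each percept: (attribute, value, location). Locations are coarse grid cells,
# an idealization of the coordinated allocentric space described in the text.
percepts = [
    ("shape", "rose", (2, 3)),
    ("color", "red", (2, 3)),
    ("odor", "rose-scent", (2, 3)),
    ("shape", "taxi", (5, 1)),
    ("color", "yellow", (5, 1)),
]

def bind_by_location(percepts):
    """Bundle attribute values that share a location into one object."""
    objects = defaultdict(lambda: dict.fromkeys(ATTRIBUTES))
    for attribute, value, location in percepts:
        objects[location][attribute] = value
    return dict(objects)

objects = bind_by_location(percepts)
print(objects[(2, 3)])
# {'shape': 'rose', 'color': 'red', 'size': None, 'odor': 'rose-scent', 'sound': None}
```

Redness binds to the rose and yellowness to the taxi only because of where each was perceived; nothing else links them.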

But the words of a language do not correspond directly to object representations. Rather, the nouns of a language represent extracted commonalities among object representations. Such commonalities might be learned by cross-predicting attribute values. That is, given a shape, predict the color; given a sound, predict a shape; given an odor, predict an emotion; and so on. An object class might then be a bundle of correlated attributes that are stable and predictive across many episodes and environments; outside of this invariant bundle, the other attribute values vary within ranges. Object classes generally correspond to nouns. Thus the noun rose aggregates a set of experiences characterized by a particular scent together with a tight range of shapes and a broader range of colors. Such an object class represents direct sensory experience, and so we might loosely call it a basic noun.
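As a rough illustration, the Python sketch below splits the attributes of a set of episodes into an invariant core and ranges of variation. It makes a strong simplifying assumption: an attribute counts as invariant only if it takes the same value in every episode, which stands in for the richer cross-prediction described above. The episodes themselves are invented.

```python
def object_class(episodes):
    """Split attributes into an invariant core and ranges of variation.

    Simplifying assumption: an attribute is invariant if it takes the
    same value in every episode; otherwise we record its observed range.
    """
    values = {}
    for episode in episodes:
        for attribute, value in episode.items():
            values.setdefault(attribute, set()).add(value)
    invariant = {a: next(iter(v)) for a, v in values.items() if len(v) == 1}
    varying = {a: v for a, v in values.items() if len(v) > 1}
    return invariant, varying

# Invented episodes of experiencing roses.
rose_episodes = [
    {"odor": "rose-scent", "shape": "cupped", "color": "red"},
    {"odor": "rose-scent", "shape": "cupped", "color": "yellow"},
    {"odor": "rose-scent", "shape": "open", "color": "pink"},
]
core, ranges = object_class(rose_episodes)
print(core)    # {'odor': 'rose-scent'}
print(ranges)  # shape varies over 2 values, color over 3 (set order may vary)
```

The scent is the stable core, the shape varies within a tight range, and the color within a broader one.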

Within the aggregated set of object experiences labeled by the name rose, there are certain experiences that are more common than others. These experiences are precisely the ones that come to your mind when I ask you to think about a rose. In my mind, such a rose is red with an open bud, sparkling with dew, about three fingers wide, and resides on a long green stem with thorns. Each basic noun has a prototypical representation; we might think of it as the expected value of all representations of that noun. From the perspective of simulatory semantics, this prototype of a basic noun can be called its core meaning. It is the image that we mentally simulate when we think of that noun.
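An expected value is only defined for numeric attributes, so the Python sketch below substitutes the most frequent value for categorical ones such as color. Both that substitution and the sample measurements are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

def prototype(episodes):
    """Mean for numeric attributes, most frequent value for categorical ones."""
    proto = {}
    for attribute in episodes[0]:
        observed = [e[attribute] for e in episodes]
        if all(isinstance(v, (int, float)) for v in observed):
            proto[attribute] = mean(observed)
        else:
            proto[attribute] = Counter(observed).most_common(1)[0][0]
    return proto

# Invented observations; widths in centimeters.
roses = [
    {"color": "red", "width_cm": 6.0},
    {"color": "red", "width_cm": 5.0},
    {"color": "yellow", "width_cm": 7.0},
]
print(prototype(roses))  # {'color': 'red', 'width_cm': 6.0}
```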

Of course, meaning is always contextual rather than prototypical, and with appropriate priming, a variation of the prototype can be retrieved. The ultimate version of priming is the use of an adjective, which systematically alters the prototype, usually along its known dimensions of variation. When I speak of a yellow rose, the adjective yellow systematically replaces the color in the prototypical representation of a rose with yellow. As is shown in Figure 2(b), this method of combination produces a very different outcome from the averaging shown in Figure 2(a). The mind has no difficulty comprehending this pairing of noun and adjective, because yellow roses are well within the set of experiences aggregated under the name rose. A basic adjective, then, specifies the value of just one basic attribute, and the combination of a basic adjective with a basic noun adjusts the prototypical simulation of the noun in one attribute to a particular value of shape, color, location, relative position, smell, sound, pitch, emotion, or so on.
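In the attribute-bundle picture, a basic adjective amounts to a single (attribute, value) pair, and modifying a noun means copying its prototype and overwriting that one slot. The Python sketch below assumes a hypothetical prototype and a tiny hypothetical lexicon; the names `ROSE_PROTOTYPE`, `BASIC_ADJECTIVES`, and `modify` are illustrative only.

```python
# Hypothetical prototype of a rose, as a bundle of attribute values.
ROSE_PROTOTYPE = {"shape": "open bloom", "color": "red",
                  "width_cm": 6.0, "odor": "rose-scent"}

# Hypothetical lexicon: each basic adjective names one attribute and one value.
BASIC_ADJECTIVES = {
    "yellow": ("color", "yellow"),
    "grey": ("color", "grey"),
    "stinky": ("odor", "skunk"),
}

def modify(prototype, adjective):
    """Replace exactly one attribute of the prototype, leaving the rest intact."""
    attribute, value = BASIC_ADJECTIVES[adjective]
    simulation = dict(prototype)  # copy, so the stored prototype is unchanged
    simulation[attribute] = value
    return simulation

yellow_rose = modify(ROSE_PROTOTYPE, "yellow")
print(yellow_rose)
# {'shape': 'open bloom', 'color': 'yellow', 'width_cm': 6.0, 'odor': 'rose-scent'}
```

Nothing is blended half-way: the color slot is swapped whole and every other slot keeps its prototypical value.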

The meaning of a basic adjective is the value of the attribute that it specifies. If we imagine the overall experience of an object to be the aggregate of the neurons that fire or quiesce together at a particular location of the perceptual field, then the embodied experience of a basic adjective is just the neural activation state of one or more attributes, where each attribute corresponds to a specific cortical region within which it is detected or expressed.

Basic nouns represent an aggregation of experiences represented by a set of one or more prototypes. An adjective whose value falls within this aggregate range will be easily generated and understood. In fact, adjectives will be interpreted to keep an object within this range when possible. If a speaker mentions a giant rose, the listener is more likely to imagine a large but still real rose, perhaps the size of a tennis ball. One typically does not imagine a rose the size of a house. The adjective giant is interpreted relative to the size of typical roses, not relative to Jack's beanstalk.
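One simple way to model this relative reading, sketched in Python below, is to resolve a size adjective against the noun's own observed range rather than an absolute scale. The assumption that giant picks the top of the observed range (and tiny the bottom) is mine, as are the sample widths and the helper name `interpret_size_adjective`.

```python
# Observed widths in cm for each class (invented numbers, for illustration).
OBSERVED_WIDTHS = {
    "rose": [4.0, 5.0, 6.0, 6.5, 8.0],
    "house": [600.0, 900.0, 1500.0],
}

def interpret_size_adjective(adjective, noun):
    """Resolve a size adjective against the noun's own range of experience.

    Assumption: 'giant' picks the top of the observed range and 'tiny' the
    bottom, so the simulated object always stays within its class.
    """
    widths = OBSERVED_WIDTHS[noun]
    if adjective == "giant":
        return max(widths)
    if adjective == "tiny":
        return min(widths)
    raise ValueError(f"no size reading for {adjective!r}")

print(interpret_size_adjective("giant", "rose"))  # 8.0, not house-sized
```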

When the adjective-noun pairing generates a collection of attributes that lies outside the range of experience, the interpretation is unclear and must be simulated as a novel experience. In cases where the adjective refers to a core attribute of a noun but with an unobserved value, this simulation is easy. A grey rose can be visualized. A stinky rose might have had the misfortune to be colocated with a scared skunk. In these two cases, roses have colors and smells that can be swapped for any value.

In cases where the adjective refers to an attribute not normally associated with the object, its meaning must be interpreted at a cognitive rather than a sensory level, involving complex associational inferences or literary techniques such as metaphor, metonymy, or analogy. A poet might evoke a silent rose to emphasize the subtle consistency of mature love or a full-voiced rose to emphasize how the gift of a rose speaks explicitly of affection and desire. Or an irrelevant attribute may be interpreted as a metaphor for core attributes, as when one interprets a loud jacket as one that has bright colors or gaudy patterns, reinterpreting the meaning of loud from an audible feature to a visual feature.

Enough of basic adjectives and basic nouns. Nouns more generally may refer to multiple distinct types of experiences. In this case, adjectives or other nouns can be used to identify which type of experience is intended, as in river bank versus financial bank. The exact rules that determine how nouns and adjectives modify each other are a matter of syntax, which I leave for elsewhere. But the semantic issue is that modifiers must narrow a broad scope of possible experiences down to an acceptable range of experiences so that the object in question can be mentally simulated. This narrowing of meaning provides the motivation for using adjectives and noun modifiers as well as many other aspects of language.
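The narrowing can be pictured as filtering a noun's candidate senses by the constraints each modifier imposes. The Python sketch below assumes a hypothetical two-sense inventory for bank and a hypothetical table of modifier constraints; it is a toy, not a claim about how senses are actually stored.

```python
# Hypothetical sense inventory: each sense of "bank" is one type of experience.
SENSES = {
    "bank": [
        {"sense": "sloping ground beside water", "domain": "river"},
        {"sense": "financial institution", "domain": "finance"},
    ],
}

# Hypothetical constraints that each modifier imposes on the noun.
MODIFIER_CONSTRAINTS = {
    "river": {"domain": "river"},
    "financial": {"domain": "finance"},
}

def narrow(noun, modifiers):
    """Keep only the senses consistent with every modifier's constraints."""
    candidates = SENSES[noun]
    for modifier in modifiers:
        constraint = MODIFIER_CONSTRAINTS.get(modifier, {})
        candidates = [s for s in candidates
                      if all(s.get(k) == v for k, v in constraint.items())]
    return candidates

print(narrow("bank", ["river"]))  # only the riverside sense survives
print(len(narrow("bank", [])))    # 2: still ambiguous without a modifier
```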

The higher cognitive system abstracts away from basic nouns by assembling ideas into successive cognitive maps. All objects ultimately rest on an embodied foundation, but the embodied nature of objects is eventually buried as abstraction cascades upon abstraction. Thus a convocation is a particular abstraction of a meeting. Like a meeting, it has a time and place of meeting, a set of attendees, and a purpose. Unlike a meeting, the purpose is more formal or ceremonial; so the convocation has a schedule and perhaps an associated ritual. A large convocation has many attendees and a correspondingly large forum. A significant convocation presumably serves an important purpose. A distant convocation is either far away in time or space.

Does a convocation have attributes like a rose does? It should be apparent from the preceding examples that it does not have its own basic attributes as such. The convocation was large because the group of people was large or the place of meeting was large. If convocation had its own basic attributes, such a clarification would be unnecessary. Imagine saying that a rose was red because its petals were red. Though strictly true, the clarification seems unnecessary, whereas it seems pertinent when the noun is abstract.

A convocation can be understood as a concept diagram as introduced in the previous post. In that diagram are the place, time, attendees, purpose, schedule, and optional ritual. Adjectives that describe concept diagrams descend through the diagram searching for a place to land, that is, for a component of the diagram that benefits from the attribute expressed by the adjective. In the phrase large convocation, the adjective large descends to either the place, or the number of attendees, or both. The adjective significant descends either to the purpose or the social status of the attendees, and distant descends to the time or the place. If we turn the rose into a concept diagram of its parts, of its petals, stamen, stalk, and stem, then the same process of descending search can assign redness primarily to the petals as well.
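The descending search can be sketched as a depth-first walk over a tree of components, collecting every component able to carry the attribute the adjective expresses. In the Python sketch below, the diagram's structure, the attribute each component "accepts", and the adjective-to-attribute table are all assumptions chosen to mirror the examples above; "size" on the attendees stands for their number.

```python
# A concept diagram as nested components; "accepts" lists the basic
# attributes a component can carry. Structure and names are hypothetical.
CONVOCATION = {
    "name": "convocation",
    "accepts": set(),
    "parts": [
        {"name": "place", "accepts": {"size", "distance"}, "parts": []},
        {"name": "time", "accepts": {"distance"}, "parts": []},
        {"name": "attendees", "accepts": {"size", "salience"}, "parts": []},
        {"name": "purpose", "accepts": {"salience"}, "parts": []},
        {"name": "schedule", "accepts": set(), "parts": []},
        {"name": "ritual", "accepts": set(), "parts": []},
    ],
}

ADJECTIVE_ATTRIBUTE = {"large": "size", "distant": "distance",
                       "significant": "salience"}

def landing_sites(diagram, adjective):
    """Depth-first descent collecting every component that accepts the attribute."""
    attribute = ADJECTIVE_ATTRIBUTE[adjective]
    found = []
    stack = [diagram]
    while stack:
        node = stack.pop()
        if attribute in node["accepts"]:
            found.append(node["name"])
        stack.extend(reversed(node["parts"]))  # visit parts in listed order
    return found

print(landing_sites(CONVOCATION, "large"))        # ['place', 'attendees']
print(landing_sites(CONVOCATION, "distant"))      # ['place', 'time']
print(landing_sites(CONVOCATION, "significant"))  # ['attendees', 'purpose']
```

When more than one site is found, the listener still has to choose among them, or accept both, as in the large convocation.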

The adjectives large, significant, and distant express basic attributes (significance is salience, which we perceive at an embodied level as the attraction of our attention). Adjectives are not limited to basic attributes, but can describe more abstract properties, as with the adjectives partial, masculine, quintessential, pesky, obdurate, and so on. Such adjectives cannot be simulated as direct operations on experience. Instead, they are interpreted by blending experiences through analogy and metaphor.

Consider the adjectives formal and ceremonial, used above to define a convocation as a formal or ceremonial meeting. These adjectives are denominal; that is, they are built from nouns. Their meaning is understood by taking aspects of the noun from which they come and using them to replace aspects of the noun they describe. A ceremony is a cultural concept involving a structured series of actions and movements that transforms the substance or status of an individual or object, metaphorically or literally. The ritual form often prescribes particular clothing or tools that must be used, frequently with a requirement that such clothing or tools be in some sense pure, that is, of a special type or quality. For a meeting to be ceremonial, it must have rules that govern who says and does what, and it generally results in a metaphorical transformation of some kind. Thus a graduation ceremony, also known as a convocation, transforms students into graduates through the symbolic process of crossing a stage and receiving a paper from an official while wearing a robe, mortarboard, and tassel, which is moved from one side of the head to the other at the end of the ceremony, representing the completed transformation. This is indeed a ceremonial meeting in that it blends aspects of a ceremony with those of a meeting.

So in abstract adjectives we see the same process as with basic adjectives, whereby entire subcomponents of a default experience are systematically replaced without disturbing other subcomponents of the experience. Notice that in the more abstract adjectives, more than one attribute of an experience may be replaced, and the method of replacement may be ambiguous and may require choices to be made by the listener. But what abstract and basic adjectives have in common is that both blend experiences together to make a synthetic experience.
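Here is a toy Python sketch of blending with an abstract, denominal adjective: several components are carried over from the source noun (ceremony) into the target (meeting), and more than one selection is possible, so the listener must choose. The component names, the two readings, and the helper `blend` are hypothetical, chosen only to mirror the ceremonial meeting example.

```python
MEETING = {"place": "hall", "time": "Tuesday", "attendees": "staff",
           "purpose": "mark an occasion", "procedure": "open discussion",
           "effect": None}

CEREMONY = {"procedure": "prescribed sequence of actions",
            "dress": "prescribed clothing",
            "effect": "transformation of status"}

# Hypothetical readings of "ceremonial": each names which components of
# "ceremony" are carried over into the noun being described.
CEREMONIAL_READINGS = {
    "form only": ("procedure", "dress"),
    "form and effect": ("procedure", "dress", "effect"),
}

def blend(target, source, components):
    """Replace (or add) the chosen components; leave all others untouched."""
    blended = dict(target)
    for component in components:
        blended[component] = source[component]
    return blended

readings = {name: blend(MEETING, CEREMONY, parts)
            for name, parts in CEREMONIAL_READINGS.items()}
for name, simulation in readings.items():
    print(name, simulation["procedure"], simulation["effect"])
```

Both readings keep the place, time, attendees, and purpose of the meeting; they differ only in which ceremonial components are borrowed.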

The blending process for partial replacement of components is not limited to adjectives; you may have noticed the phrase graduation ceremony above, which is a sequence of nouns but blends together in the same way as a noun and adjective do. The syntactic expression of this blending process varies from language to language. In Chinese and Biblical Hebrew, adjectives can also serve as main verbs in many cases. In German and Classical Sanskrit, nouns can be combined to form compound words on the fly. In French, there is a tendency to use prepositions to blend nouns (esp. à and de), much as we use of in English. Chinese, like English, frequently juxtaposes nouns to create blended concepts. Most languages have a set of suffixes or particles that are appended or prepended to nouns and verbs to form new adjectives, such as -y, -ful, -er, -ing, -able, -like, and -ish in English.

The mechanisms of blending are distinct from the syntax used to express the blends; these mechanisms will form the basis of several future posts, and so I defer that topic for the time being except to say that the meaning of an abstract lexical adjective can often be tied to a systematic type of blending, just as the polysemous term ceremonial can emphasize that a process either has a ritualized form or is an inauthentic display. In the first case, it targets the form of the actions and in the second case it targets the effect of the action. In either case the target of the adjective and its semantic effects are systematic.

Formal definitions provide an extreme case of this systematicity. Define a splarg as consisting of a single roglut and at least one bartlish, and then say that the splarg is flubulous if it has exactly one bartlish. In so doing, I have used language to create in your mind a new, unordered concept diagram with two elements, a roglut and a group of bartlishes. I have also created a new adjective, flubulous, that applies to the number of bartlishes in a splarg. It may take a few repetitions of this paragraph for you to assimilate the information, but ultimately you will have no trouble doing so. You will then notice that flubulous is a basic adjective, and that as defined it descends through the concept diagram of splarg in order to apply to the number of bartlishes in the splarg. This example may seem ridiculous, but it is not so different from the methods of formal mathematics, science, philosophy, and computer science that have given us words like linear, conductive, qualitative, and Turing-equivalent.
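The same definitions translate almost word for word into a small Python sketch. The class and function names simply reuse the nonsense words; representing the unordered group of bartlishes as a list is a convenience, since only its size matters here.

```python
from dataclasses import dataclass

@dataclass
class Roglut:
    pass

@dataclass
class Bartlish:
    pass

@dataclass
class Splarg:
    """A single roglut plus at least one bartlish (order is irrelevant)."""
    roglut: Roglut
    bartlishes: list

    def __post_init__(self):
        if len(self.bartlishes) < 1:
            raise ValueError("a splarg needs at least one bartlish")

def is_flubulous(splarg: Splarg) -> bool:
    # The adjective descends to one component: the number of bartlishes.
    return len(splarg.bartlishes) == 1

print(is_flubulous(Splarg(Roglut(), [Bartlish()])))              # True
print(is_flubulous(Splarg(Roglut(), [Bartlish(), Bartlish()])))  # False
```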

Finally, you may have noticed that adjective-noun pairings are almost complete simulations in themselves. No additional elements are required to turn the red rose into a mental picture, at least if you ignore the extra word the that I snuck in. And in fact we can turn it into a sentence, as in the rose is red. This kind of sentence uses the linking verb is, which in this instance communicates only slightly more information than the phrase the red rose, namely, that the rose is actually red right now as opposed to being red at some other time or in some other situation. Unlike sentences in the subject-verb-object paradigm previously discussed, a simulation of this sentence has the interesting property that it engages only the perceptual system, leaving the behavioral system out of the picture and hence needing no agent either. Readers familiar with grammar will recognize here the distinction between intransitive and transitive verbs.

To sum up, the mind processes many attributes from its perceptual field in parallel and, I claim, aggregates them into object representations using shared location. Object classes are extracted from these representations, probably based on statistical correlations and shared response to behaviors, and nouns can serve as labels for these classes. Each class defaults to a context-sensitive prototypical experience, and adjectives replace the value of one or more of the base attributes from the default. Abstract nouns refer to concept diagrams, and an adjective descends through the concept diagram searching for one or more components of the diagram to which it can be applied. Complex adjectives and nouns modify other nouns through a systematic blending process whose mechanisms we will eventually study. A particular instance of an object representation is a complete idea and can be encoded in a sentence whose meaning is a simulation of activity in or from the perceptual system.

In the next post, I will address the topic of intransitive verbs more generally, distinguishing linking verbs, observational processes, and actions that require no object. Until then, I look forward to your comments and discussion!

References


T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, J. Dean
    Distributed representations of words and phrases and their compositionality
    NIPS, 2013
    https://proceedings.neurips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf

S. Arora, Y. Liang, T. Ma
    A Simple but Tough to Beat Baseline for Sentence Embeddings
    ICLR, 2017
    https://openreview.net/pdf?id=SyK00v5xx
    
J. Devlin, M. Chang, K. Lee, K. Toutanova
    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    Google, 2018
    https://arxiv.org/abs/1810.04805
    
L. Wu, A. Fisch, S. Chopra, K. Adams, A. Bordes, J. Weston
    StarSpace: Embed All The Things!
    FAIR, 2017
    https://arxiv.org/abs/1709.03856

R. Plutchik
    A general psychoevolutionary theory of emotion. 
    In R. Plutchik & H. Kellerman (Eds.), Emotion: Theory, research and experience, 
    Theories of emotion (Vol. 1, pp. 3–33). 
    New York: Academic Press, 1980

Jason

Mar 1, 2022

The idea of parallel processing brings to mind the idea of 'situational awareness', of which the hippocampus is an important component per last week's post. Such awareness will focus on the red rose as it is given or received in context, or fall into the background as the flower in the vase becomes peripheral in the observer's vision where color is no longer perceived.

If we presume the mind can process 7 bits of information at any given time (the total possible bandwidth of our situational awareness), perhaps attention is the relative accounting of this bandwidth on any particular thing at a given time. Looking at a rose in full focus utilizes all 7 bits: smell, orientation, color, feeling, and so on. As the rose is pushed into the periphery, perhaps it blends together with the wall, table, and other 'background', compressing into less than a single bit.

Perhaps such attention mechanisms are at play as a simulation when engaged in an experience of language to facilitate fuller, deeper understanding.

Embodied Language and Cognition
