The past several posts we have focused on the perceptual system: how sensory inputs are streamed into a multitude of core attributes, how these attributes change over time, how attributes are bundled into objects and placed in a space, and how pairs of objects are modeled in spatial relationships. But the perceptual system exists to serve an agent that is structured to act in the world, and it is action, not perception, that satisfies the agent's motivational drives. In this post, we will look at the basic structure of human actions to build up towards a complete model that joins perception and behavior.
The human motor system is built up hierarchically, and most levels of this hierarchy will not concern us. At a basic level, neural systems are structured in an arc between stimulus and response. Sensory neurons throughout the body provide the stimulus by firing under certain conditions, such as when a hair moves, or when heat, pressure, or cold is applied, or when pheromones, hormones, or other chemicals are detected or fail to be detected, or when light hits the neuron, or in a regular rhythm. In this way, we can sense temperature, touch, proximity, hunger, thirst, odor, and more. Motor neurons provide the response by tensing and relaxing muscles, closing valves, and so on. In between sensory and motor neurons lie the ganglia, clusters of decision-making neurons that connect sensory neurons to motor neurons. These ganglia are stacked in hierarchies joining stimulus to response in progressively wider arcs, integrating ever more stimulus and regulating ever wider responses, until they culminate in the series of overgrown ganglia we call the brain.
Not all stimulus-response arcs pass through the motor cortex. The unconscious actions that we call reflexes, as when the doctor taps the knee and the leg moves, are controlled by ganglia far from the brain. Other operations, such as heart rhythm, blood pressure, or digestion, are regulated by the brainstem. There are also sensorimotor loops that enter the brain but bypass the neocortex. All sensory pathways except smell pass through the thalamus, near the center of the brain. The thalamus routes sensory inputs to the sensory cortices, but also to the hypothalamus, the ancient controller of the so-called reptilian brain, and to the amygdala, which handles emotional processing and in turn connects back to the hypothalamus. These pathways enable complex motor responses that are completely unconscious; our conscious mind only perceives them through the sensory percepts that they generate, as when the hair on the back of our neck stands up or our eyes widen in surprise. One of the more elaborate reactions happens when we perceive a curling line and jump back from it as though it were a snake, a response regulated through the amygdala without cortical involvement. Arachnophobes who see any spider-like shape experience a fear response even if their visual cortex correctly perceives a tangle of string rather than a spider.
Conscious or unconscious, all motor control, in fact, passes through the brainstem. From there, signals go out to control the eyes, face, and tongue. Signals for other motor systems are passed from the brainstem through the cerebellum, located behind the brainstem and below the occipital lobe of the cortex. The cerebellum plays a key role in elaborating and refining motor signals based on inputs from the inner ear for balance and from the entire muscular system for accurate position information. Without the cerebellum, people can still move any muscle in the body, but their movements are less stable and somewhat off balance.
The command of the motor system is organized somewhat like a military chain of command. At the top, the prefrontal cortex plays the role of the General Staff, receiving heavily processed intelligence about the state of the world and issuing high-level objectives to the top generals in the premotor cortex, who in turn convert these objectives into sequenced strategies, disseminating them to the brigadiers and colonels in the motor cortex, organized in their part-focused departments in accordance with the motor homunculus. The colonels coordinate to convert strategies into specific tactics for each body part, issuing orders to captains with various muscle groups under their command. These orders are clarified by the majors and lieutenant colonels in the cerebellum, who add detail based on exact field reports, and are further adapted to exact conditions by the ganglion captains in the spinal cord before being implemented by the lieutenants, sergeants, and their subordinates in the muscles themselves.
Though every analogy carries inaccuracies with it, the analogy between motor control and military structure is quite apt. The higher the level of control, the more information is aggregated into a big picture at the cost of detail. Each lower level of control has better information about local conditions, but less information about the overall context. The role of a superior is to provide sufficient context and direction to enable good planning at lower levels. A good commander gives subordinates flexibility where possible and communicates in terms of objectives rather than directives. This allows subordinate commanders to make use of their superior knowledge of local conditions to accomplish their part of the strategy or tactic being implemented.
So too in the motor system there is a hierarchical organization in which higher levels operate on idealized and stylized plans with abstract objectives that are implemented in detail by lower systems. Although the motor cortex can in theory control almost every voluntary muscle independently, such independent control is not the typical mode of operation. Multiple muscles within a group tend to move together. You don't usually flex your left zygomaticus in isolation; instead you smile, which involves not just the zygomaticus (which pulls the corner of the mouth up toward the cheekbone) but also other muscles that raise the cheeks and crinkle the eyes. Try to move just your ring finger without moving any other fingers. You can train yourself to do it over time, as violinists or guitarists do, but it takes training, since most activities that use your ring finger use the neighboring fingers as well.
The motor cortex is organized topographically, with muscles grouped as shown in the motor homunculus below. It is considerably easier to synchronize activities within groups than across them. There is also the left-right hemispheric divide, in which the left motor cortex controls the right side of the body and the right motor cortex controls the left. This crossing of nerves from left to right (decussation) is a quirk of evolution, possibly the result of a somatic twist in the body plan of early vertebrates. All vertebrates appear to be asymmetrical to some degree, but the asymmetry in humans is significant, with one side of the brain (usually the left) being especially adapted for fine motor control. Left-right motor coordination requires neural signals to travel across hemispheres, which makes complex bimanual tasks such as playing an instrument or typing quickly on a keyboard difficult to learn.
https://commons.wikimedia.org/wiki/Category:Cortical_homunculus#/media/File:Motor_homunculus.svg
So how are muscles controlled? Here I apply a simple cognitive principle: language expresses conscious thought, and conscious thought, at least the left-brain version of it, is best observed through language. To expand this claim, at the highest level, common action verbs exactly reflect the highest level of motor control. And thus, at a conscious level, we walk, run, jump, reach, grasp, take, give, hug, smile, sing, dance, beat, greet, explore, relax, and many more. I was going to say that we have to divorce these actions from their intent or cultural context, but in fact I do not think that would be correct. The intent or objective should be part of the motor command. To embrace is distinct from to crush, even if both might be implemented by the same sequence of actions, because each one implies different responses to feedback. Someone who wishes to embrace might lighten their grip if the target grunts in pain, whereas someone who wishes to crush would tighten their grip instead.
Now many of the actions above, such as singing, dancing, giving, and greeting, require long sequences of action. Such sequences must be coordinated at the cognitive level through some sort of concept diagram or its right-brain analogue. But actions such as walking, jumping, reaching, and grasping are likely the kinds of primitives that are handled in the motor cortex. Thus we might imagine some sort of primitive task modules that implement them. These modules are typically parameterized in some way. We cannot walk without following some path. If we jump, we must jump in a certain direction and to some height. We reach towards a goal and grasp an object. In each case, a task module has many implementations that are determined by environment and intent.
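To make the idea of a parameterized task module concrete, here is a minimal Python sketch. Every name and parameter here is illustrative, invented for this example; nothing is a claim about how such modules are neurally implemented.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class TaskModule:
    """A primitive motor task whose concrete execution is determined
    by its parameters, i.e., by environment and intent."""
    name: str
    params: Dict[str, Any] = field(default_factory=dict)

# The same primitive, parameterized differently for different situations.
jump_forward = TaskModule("jump", {"direction": "forward", "height_m": 0.3})
jump_up      = TaskModule("jump", {"direction": "up", "height_m": 0.5})
reach_mug    = TaskModule("reach", {"target": "coffee mug"})
```

The point of the sketch is only that "jump" names one module with many possible instantiations, not many separate modules.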
Note that there are even more primitive actions, such as laughing, smiling, grimacing, and various other emotional responses. As suggested above, such actions are only partially under the control of the motor cortex; we are not particularly good at laughing without some stimulus worth laughing at. We can, however, block these reactions with effort: we can exert conscious motor control to stop laughing or smiling, the mechanism for which is an inhibitory signal from the frontal cortex to the amygdala. Even these primitive actions are contingent on a stimulus, but their internal structure is different from the structure of actions controlled from the motor cortex. We will focus now on the structure of these latter actions as task modules.
We have already talked at length in Subject, Verb, Object about object-orientation, but the implementation of a task module must additionally specify how the object is to be interacted with and where the action must occur in space. These details are implemented by the task module without further intervention from above unless required. Thus if I walk to the kitchen, my conscious cognitive systems are free to think about other things during the transit. The motor system handles following this well-known path and makes minor adjustments as needed without interruption. Now if I find that my children have left toys in my path for me to trip on, then my conscious attention will be drawn to the toys, and I may countermand the walk instruction to pick up the toys and clear the path with muttered imprecations.
Although you might prefer to hear otherwise, driving a car follows the same unconscious pattern. Once the motor cortex has received the command to drive, conscious control only interrupts to respond to change points or unexpected events. And thus while driving hundreds of miles on the highway, you can listen to music or audiobooks, have a conversation, or muse idly about philosophy, religion, or mathematics. All your driving module has to do is to keep you in your lane at a safe speed and a safe distance from other cars until the inevitable child asks for a bathroom break or until the car ahead suddenly slams on its brakes due to construction activities somewhere well past the horizon ahead.
Task modules are not universally available at all times and in all places. Each task module has start conditions that limit when it can apply, and it may have end conditions for when the task is complete. Thus we cannot walk while sitting down, drive while lying on the couch, or grasp an object out of reach. To initiate a task module, its start conditions must be satisfied, which can be accomplished by finding a second task module whose end conditions match the desired start conditions and whose own start conditions are currently met.
The start and end conditions of a task module are therefore critical for action planning. Given a goal, a plan to achieve that goal can be formulated by working backwards from the goal and searching for a sequence of tasks such that the start conditions and end conditions of each task match and the final set of end conditions match the goal. I will illustrate with a simple example. If I become hungry while sitting at my desk, then I set a goal to obtain food. I eat food by grasping it, which sets a subgoal to bring food within reach. That means I must go to where the food is, and if the food is in the kitchen, then I can get there by walking. In order to walk, I must get up from my desk. This planning process is rather obvious, in part because it is a conscious activity. But of course the plans can become much more complicated, and a search of this form is not guaranteed to succeed.
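The desk-to-kitchen example can be sketched as a tiny planner. The task names and conditions below are invented for illustration, and for simplicity the sketch searches forward over chains of matching conditions (the same chain falls out of working backwards from the goal):

```python
from collections import deque

# Hypothetical task modules: (start conditions, end conditions).
TASKS = {
    "stand_up":        ({"sitting_at_desk"}, {"standing"}),
    "walk_to_kitchen": ({"standing"},        {"in_kitchen"}),
    "grasp_food":      ({"in_kitchen"},      {"food_in_hand"}),
}

def plan(start, goal):
    """Breadth-first search for a task chain whose conditions link up."""
    frontier = deque([(frozenset(start), [])])
    seen = {frozenset(start)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:                     # goal conditions satisfied
            return steps
        for name, (pre, post) in TASKS.items():
            if pre <= state:                  # start conditions currently met
                nxt = frozenset((state - pre) | post)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None                               # search is not guaranteed to succeed

print(plan({"sitting_at_desk"}, {"food_in_hand"}))
# ['stand_up', 'walk_to_kitchen', 'grasp_food']
```

Note that the search returns `None` when no chain of tasks links the current conditions to the goal, mirroring the observation that planning of this form can fail.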
We can imagine all of these tasks laid out in a diagram, with the available tasks on one side and the end goal on the other. In this case, task planning reduces to the problem of searching for a route from one point on a map to another. This analogy brings us back to space; as we have seen time and again, cognition in general seems to reduce to spatial reasoning in one form or another.
There is much more to say about how the conscious planning process might be implemented (say, by using a content-addressable memory populated with vector representations of start and end conditions to implement a fuzzy graph), but I reserve this topic for a later post. Instead, I now wish to look inside individual task modules to ask how they are composed and how they function.
Firstly, although the plans described above are discrete, the motor actions that implement them are continuous. When I get up from my desk, I do not stop at a standing position with both legs together before walking. Rather, I turn and begin to walk as I stand. The motor cortex takes the sequence of commands from the cognitive system and blends them smoothly together. Thus the implementation of motor plans is always contextual and variable. Furthermore, one might look at the subsequent action as a kind of intent. If I stand up so that I can walk, then there is no need to wait for the end condition of standing to arrive; I can transition to walking as soon as I satisfy a feasible start condition for walking. The coarticulation of phonemes in linguistics follows a similar pattern.
Primitive motor tasks can be organized in one of several ways. A task may be organized around an object or percept to be manipulated, in which case the task assumes a binary structure relating a body part to an object, as discussed extensively in the last post on egocentric spatial relationships. Or the task can be organized with respect to a path, which then involves allocentric notions of space, as discussed in Places and Maps. Examples of object-organized tasks include reaching, grasping, throwing, fleeing, and approaching. Examples of path-organized tasks include walking, running, tracing, and writing. One interesting note in English is that path-organized tasks can take either a direct object that represents the path, as in to walk a trail or run a route, or a prepositional phrase for the same purpose, as in to walk on a trail or run along a route. In addition to object- and path-organized tasks, there are tasks that require neither objects nor paths, such as smiling or laughing.
Tasks that are organized with respect to objects can still specify a path, either implicitly, when there is only one path for the action and it need not be stated, or explicitly, when multiple paths are available. If unstated, a default path is assumed. If someone approaches a person, they usually adopt a path that puts them face-to-face with that person. To approach a person from behind specifies the path and also implies ill intent.
Specification of both path and target requires the task module to integrate allocentric and egocentric representations of space, whereas the specification of only one does not directly require such integration. Integration may nonetheless be required in practice. For example, changing a lightbulb in a car's headlight usually requires selecting a complex path for the hands to avoid all of the other car parts in the way.
The start and end conditions form a critical part of the task, perhaps the definitive part. Whereas features such as path, target, instrument, and any cooperating partners parameterize the task, the start and end conditions not only serve as parameters but also determine how the task enters into the motor planning process.
Let us consider the task of grasping with the hand. The start condition for a grasp is that a hand must be near the proposed object and the object must have a graspable surface, one that is appropriately sized or shaped for a grasp (either hand-sized or with a handle or neck). The end condition is that the object must be securely attached to the hand at one of its graspable surfaces. This task is object-organized and proceeds in stages, first opening the hand and rotating it so that the longer axis of the graspable surface (notice the 2-D conceptualization) is perpendicular to the fingers of the hand. Then the hand approaches the object, with the fingers and thumb closing around the graspable surface and tightening sufficiently to attach the object.
The performance of this task requires monitoring egocentric spatial relationships between the hand and the object to be grasped. Typically, grasping is preceded by reaching, and the opening and rotation of the hand are performed during the reach. Once the hand is open and rotated, the monitored spatial relationship shifts from towards to around. Once the hand is around the object, the fingers and thumb can close. The interesting observation here is that the focus of control for an egocentric task is a spatial relationship. So long as perception can extract these relationships, the task can be performed through sensory feedback on the status of that relationship. The exact motor control for accomplishing the task can be computed directly as a function of the current body pose and the spatial relationship in focus. As stated above, perception enables behavior, and the evolutionary structure of perception is precisely that which is needed to enable behavior.
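A minimal sketch of this feedback idea, reduced to a single dimension with made-up step sizes: the loop does nothing but monitor the perceived hand-object relationship and issue corrections until the grasp condition holds.

```python
def grasp(hand_pos, obj_pos, step=0.1, reach=0.05):
    """Feedback control of a grasp: close the hand-object gap (the
    'towards' relationship), then close the fingers once the hand is
    'around' the object. Positions are 1-D for illustration only."""
    while abs(obj_pos - hand_pos) > reach:    # monitor the spatial relationship
        hand_pos += step if obj_pos > hand_pos else -step
    return hand_pos, "closed"                 # end condition: object attached
```

Nothing in the loop encodes a trajectory in advance; the motion falls out of repeatedly comparing the perceived relationship against the end condition, which is the point being made above.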
In the case of a path-organized task, the same type of monitored spatial feedback is present, but with the allocentric map engaged rather than the egocentric map. Thus to walk along a path, the task module must simply choose motor controls that move the torso along the path in the right direction. Walking as a task is performed by engaging oscillating circuits at the base of the spinal cord to move the legs in sequence while balancing. The act of walking will move the torso in the direction that the person is facing. Tracking progress on an allocentric map requires turning the body slightly as walking proceeds to keep the body on the path.
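The turning-while-walking correction might be sketched as a proportional adjustment of heading toward the next waypoint on the allocentric map. The gain and coordinate conventions are invented for illustration:

```python
import math

def heading_correction(pos, heading, waypoint, gain=0.5):
    """Turn the body slightly toward the next waypoint each stride,
    based on allocentric positions (x, y) and heading in radians."""
    desired = math.atan2(waypoint[1] - pos[1], waypoint[0] - pos[0])
    error = (desired - heading + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return heading + gain * error  # partial correction, applied every stride
```

Because each stride corrects only part of the heading error, the body curves smoothly onto the path rather than pivoting in place, matching the continuous blending of motor actions described earlier.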
An alternative conceptualization of the task of walking on a path could use egocentric body positions rather than allocentric map position by maintaining the egocentric spatial relationship that the body faces the center of the path. Even if some path-organized tasks use egocentric rather than allocentric space, the existence of this alternative does not change the core fact stated above, that task control primarily involves maintaining a specific spatial relationship. And many path-organized tasks must use the allocentric system. For example, it would be impossible to walk home when home is out of sight without the allocentric map.
Thus the performance of most task modules essentially maintains either a specific spatial relationship or a specific path through allocentric space, and sometimes both, until a given end condition is satisfied. Task modules not organized around paths or objects, such as laughing and smiling, maintain some other perceived attribute (a smile) or change process (a laugh) over some span of time. In every case, the bottom level of motor control from the perspective of the cortex and language is about maintaining perceived qualities, including qualities pertaining to motion or change, such as reaching towards an object or oscillating in cycle between left and right legs while walking.
In conclusion, motor control provides the purpose of perception and is accomplished through stimulus-response loops implemented throughout the nervous system. The structure of motor control in the nervous system is hierarchical, with the highest cognitive levels engaging mostly in an abstract, stylized way. The basic tasks of the motor cortex involve maintaining perceived states, processes, or relationships, especially spatial relationships. These tasks are feasible under certain conditions, which become the start conditions for the task, and they are complete either when the state under maintenance can no longer be maintained (the end condition) or when a separate perceptual goal is satisfied.
Task planning can be performed by searching for a path between the current conditions and a desired end goal, a search whose essence is not dissimilar from a traditional AI search, but with fuzzy, continuous vectors representing the range of start and end conditions. A higher-order task plan is a sequence of subordinate tasks whose start and end conditions match, whose first start condition matches the current circumstance, and whose final end condition matches the goal. The aspects of motor control that are directly accessible to the conscious mind, and hence to language, include the basic tasks, their start and end conditions, the objects they manipulate and paths they follow, the perceived states, processes, and relationships that make up the task itself, and all operations of the higher-order planning system.
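Under this speculative vector representation, "matching" conditions need not be exact: one task's end condition could match the next task's start condition whenever their vectors are similar enough. A minimal sketch, with the similarity measure and threshold chosen purely for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two condition vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def conditions_match(end_vec, start_vec, threshold=0.9):
    """Fuzzy chaining: a task's end condition 'matches' the next task's
    start condition when their representations are similar enough."""
    return cosine(end_vec, start_vec) >= threshold
```

Replacing exact set matching with a threshold like this is what turns the discrete search into a search over a fuzzy graph of conditions.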
Next week, I will present a basic cognitive architecture that can account at least for the basic tasks. After that, I will perhaps study basic task acquisition and execution more deeply before turning to the cognitive system that superintends over the basic architecture. Thanks for reading, and please leave questions and comments below!
Speaking of hierarchical structures, perhaps perception itself has some hierarchical nature to it - i.e. attention changes in bandwidth and focus on settings and objects that are most relevant to fitness and/or fecundity. What do you think?
Very interested to read more about the conscious planning process! I'm sure you've already seen this from our friends at OpenAI (https://openai.com/blog/emergent-tool-use/) but it might be relevant to such a discussion - interesting how the RL agents figure out how to "hack" the game as part of their strategy.