Integrating Neural, Behavioral and Developmental
Aspects of Domain-Specificity
Pascal Boyer[*] & Clark Barrett[†]
Forthcoming, in David Buss (Ed.),
Handbook of Evolutionary Psychology
Traditionally, psychologists have assumed that people come equipped only with a set of relatively domain-general faculties, such as “memory” and “reasoning,” which are applied in equal fashion to diverse problems. Recent research has begun to suggest that human expertise about the natural and social environment, including what is often called “semantic knowledge”, is best construed as consisting of different domains of competence. Each of these corresponds to recurrent evolutionary problems, is organised along specific principles, is the outcome of a specific developmental pathway and is based on specific neural structures. What we call a “human evolved intuitive ontology” comprises a catalogue of broad domains of information, different sets of principles applied to these different domains, as well as different learning rules to acquire more information about those objects. All this is intuitive in the sense that it is not the product of deliberate reflection on what the world is like.
This notion of an intuitive ontology as a motley of different domains informed by different principles was first popularised by developmental psychologists (R. Gelman, 1978; R. Gelman & Baillargeon, 1983) who proposed distinctions between physical-mechanical, biological, social and numerical competencies as based on different learning principles (Hirschfeld & Gelman, 1994). In the following decades, this way of slicing up semantic knowledge received considerable support both in developmental and neuro-psychology. For example, patients with focal brain damage were found to display selective impairment of one of these domains of knowledge to the exclusion of others (Caramazza, 1998). Neuro-imaging and cognitive neuroscience are now adding to the picture of a federation of evolved competencies that has grown out of laboratory work with children and adults.
The detection and recognition of faces by human beings provides an excellent example of a specialised system. Humans are especially good at identifying and recognising large numbers of different faces, automatically and effortlessly, from infancy. This has led many psychologists to argue that the standard human cognitive equipment includes a special system to handle faces.
Convergent evidence for specialization comes from many different sources. In contrast to other objects, faces are processed configurally, that is, in terms of the overall arrangement and relations of parts more than the parts themselves (Young, Hellawell, & Hay, 1987; Tanaka & Sengco, 1997). This is strikingly demonstrated by the finding that inverting faces makes them much more difficult to recognize, compared to objects requiring less configural processing (Farah, Wilson, Drain, & Tanaka, 1995). Developmentally, newborn infants quickly orient to faces rather than other stimuli (Morton & Johnson, 1991) and recognise different individuals early (Pascalis et al., 1995; Slater & Quinn, 2001). Neuro-psychology has documented many cases of prosopagnosia, or selective impairment of face-recognition (Farah, 1994), in which the structural processing of objects, object-recognition and even imagination for faces can be preserved while face recognition is impaired (Duchaine, 2000; Michelon & Biederman, 2003). Finally, neuro-imaging studies have reliably shown a specific pattern of activation (in particular, modulation of areas of the fusiform gyrus in the temporal lobe) during identification or passive viewing of faces (Kanwisher, McDermott, & Chun, 1997). Specialised systems may handle the invariant properties of faces (those that allow recognition) while other networks handle changing aspects such as gaze, smile and emotional expression (Haxby, Hoffman, & Gobbini, 2002).
Despite this impressive evidence, some psychologists argue that the specificity of face-perception is an illusion, and that human beings simply become expert recognisers of faces by using unspecialised visual capacities. In this view, the newborns’ skill in the face-domain may be the result of a special interest in conspecifics that simply makes faces more ecologically important than other objects (Nelson, 2001). Also, one can observe the inversion effect (Diamond & Carey, 1986; Gauthier, Williams, Tarr, & Tanaka, 1998) and fusiform gyrus activation (Gauthier, Tarr, Anderson, Skudlarski, & Gore, 1999; Tarr & Gauthier, 2000) when testing trained experts in such domains as birds, automobiles, dogs or even abstract geometrical shapes (see Kanwisher, 2000, for a detailed discussion).
In our view, these findings demonstrate the importance of gradual development and the crucial contribution of relevant experience and of environmental factors. These are all crucial aspects of functional specialization with evolutionary origins. These points will become clearer as we compare the face-system to other kinds of specialised inferential devices typical of human intuitive ontology.
The face-recognition system provides us with a good template for the features we encounter in other examples of domain-specific systems.
[1] Semantic knowledge comprises specialized inference systems.
It is misleading to think of semantic knowledge in terms of a declarative data-base. Most of the knowledge that drives behavior stems from tacit inferential principles, that is, specific ways of handling information.
In the case of face-recognition, configural processing seems to be a computational solution to the problem of recognizing individuals across time while tracking a surface (the face) that constantly changes in small details, with different lighting or facial expressions.
More generally, we will describe intuitive ontology as a set of computational devices, each characterised by a specific input format, by specific inferential principles and by a specific type of output (which may in turn be input to other systems). Given information that matches the input format of one particular system, activation of that system and production of the principled output are fairly automatic.
[2] Domains are not given by reality but are cognitively delimited.
Faces are not a physically distinct set of objects that would be part of “the environment” of any organism. Faces are distinct objects only to an organism equipped with a special system that pays attention to the top front surface of conspecifics as a source of person-specific information.
Moreover, inferential systems are focused not necessarily on objects but on particular aspects of objects, which is why a single physical object can trigger concurrent activation of several distinct inference systems. For instance, although faces invariably come with a particular expression, distinct systems handle the “Who is that?” and “In what mood?” questions (Haxby et al., 2002).
To coin a phrase, the human brain’s intuitive ontology is philosophically incorrect. That is, the different classes of objects in our cognitive environment, as distinguished by our intuitive ontology, do not always correspond to real ontological categories (the real kinds of “stuff” out there). For instance, the human mind does not draw the line between living and non-living things, or between agents and objects, in the same way as a scientist or a philosopher would, as we will illustrate below.
[3] Evolutionary design principles suggest the proper domain of a system.
The domain of operation of the system is best circumscribed by evolutionary considerations. Natural selection produced genetic material that normally gives rise to human brains with a specific capacity for face-recognition. But why should we describe it as being about “faces”? It may seem more accurate to say that it is specialised in “fine-grained, intra-categorical distinctions between grossly similar visual representations of middle-sized objects,” as some have argued. But consider this. We observe that the stimuli in question only trigger specific processing if they include a central (mouth-like) opening and two brightly contrasted (eye-like) points above that opening. We should then add these features to our description. The system would then be described as especially good at “fine-grained, intra-categorical […] with a central opening and [etc.]”. We could add more and more features to this supposedly “neutral” description of the system.
Such semantic contortions are both redundant and misleading. Inasmuch as the only stimuli corresponding to our convoluted re-description actually encountered during evolution were conspecifics’ faces, the re-description is redundant. But it also blurs the functional features of the system, for there are indefinitely many inferences one could extract from presentations of “fine-grained, intra-categorical etc.” (face-like) stimuli, only some of which are relevant to distinctions between persons[1]. A description in terms of functional design provides the best explanation for the system’s choice of what is and what is not relevant in faces.
[4] Evolutionary and actual domains do not fully overlap.
Without effortful training, the face-recognition system identifies and recognizes what it was designed to expect in its environment. As we saw above, the system may be re-trained, with more effort, to provide identification of objects other than faces, such as birds or cars. In the same way, our evolved walking, running and jumping motor routines can be re-directed to produce ballet dancing. But it is nevertheless the case that they evolved in order to move us closer to resources or shelter and away from predators. The fact that some cognitive system is specialised for a domain D does not entail that it invariably or exclusively handles D, nor does it mean that the specialization cannot be coopted for evolutionarily novel activities. It means that ancestors of the present organism encountered objects that belong to D as a stable feature of the environments where the present cognitive architecture was selected, and that handling information about such objects enhanced fitness.
There may be (indeed, there very often is) a difference between the proper (evolutionary) and actual domains of a system (Sperber, 1994). The proper domain is the set of objects, facts and properties that the specialised system evolved to represent and react to (for instance, flies for the insect-detection system in the frog’s visual system). The actual domain is the set of objects, facts and properties that the system does react to (e.g. flies as well as any small object zooming across the visual field). Mimicry and camouflage exploit this non-congruence. Non-poisonous butterflies may evolve the same bright colours as poisonous ones to avoid predation by birds. The proper (evolved) domain of the birds’ bright-coloured bug avoidance system is the set of poisonous insects; the actual domain is that of all insects that look like them (Sperber, 1994).
[5] In evolution, you can only learn more if you already know more.
The face-recognition system does not need to store a description of each face in each possible orientation and lighting condition. It only stores particular parameters for an algorithm that connects each sighting of a face with a person’s “face-entry”.
Turning to other domains, we find the same use of vast information stores in the environment, together with complex processes required to find and use that information. The lexicon of a natural language (15 to 100 thousand distinct items) is extracted through development from the utterances of other speakers. This constitutes an impressive economy for genetic transmission, as human beings can develop complete fluency without the lexicon being stored in the genome. But this external data-base is available only to a mind with complex phonological and syntactic predispositions (Pinker & Bloom, 1990; Jackendoff, 2002). In a similar way, the diversity and similarities between animal species are inferred from a huge variety of available natural cues (color, sound, shape, behavior, etc.) but that information is relevant only to a mind with a disposition for natural taxonomies (Atran, 1990).
In general, the more an inference system exploits external sources of information and stable aspects of the cognitive environments, the more computational power is required to home in on that information and derive inferences from it. There is in evolution a general coupling between the evolution of more sophisticated cognitive equipment and the use of more extensive information stored in environments.
[6] Each inferential system has a specific learning logic.
Infants pay attention to faces and quickly recognise familiar faces because they are biased to pay attention to small differences in this domain that they would ignore in other domains.
More generally, knowledge acquisition is informed by domain-specific learning principles (R. Gelman, 1990), which we will review in the following pages. Also, different systems have different developmental schedules, including “windows” of development before or after which learning of a particular kind is difficult. These empirical findings have led developmental psychologists to cast doubt on the notion of a general, all-domain “learning logic” that would govern cognitive development in various domains (Hirschfeld & Gelman, 1994).
[7] Development follows evolved pathways
Consider the notion of a ballistic process. This is a process (e.g. kicking a ball) where one has influence over initial conditions (e.g. direction and energy of the kick) but this influence stops there and then, as the motion is influenced only by external factors (e.g. friction). If brain development were one such ballistic system, the genome would assemble a brain with a particular structure and then stop working on it. From the end of organogenesis, the only functionally relevant brain changes would be brought about by interaction with external information. But that is clearly not the case. Genetic influence on many organic structures is pervasive throughout the life span, and that is true of the brain too.
We must insist on this, because discussions of evolved mental structures often imply that genetic influence on brain structures is indeed ballistic, so that one can draw a line between function that is specified at birth (supposedly the result of evolution) and function that emerges during development (supposedly the effect of external factors unrelated to evolution). Indeed, this seems to be the starting point of many discussions of “innateness” (Elman, Bates, Johnson, & Karmiloff-Smith, 1996), even though the assumption is biologically implausible[2].
Evolution results not just in a specific set of adult capacities but also in a specific set of developmental pathways that lead to such capacities. This is manifest in the rather circuitous path to adult competence that children follow in many domains. For instance, young children do not build syntactic competence in a simple-to-complex manner, starting with short sentences and gradually adding elements. They start with a one-word stage, then proceed to a two-word stage, then discard that structure to adopt their language’s phrase grammar. Such phenomena are present in other domains too, as we will discuss in the rest of this chapter.
[8] Development requires a normal environment
Face recognition probably would not develop in a context where people always changed faces or all looked identical. Language acquisition requires people interacting with a child in a fairly normal way. Mechanical-physical intelligence requires a world furnished with some functionally-specialised man-made objects. In this sense inference systems are similar to teeth and stomachs, which need digestible foods rather than intra-venous drips for normal development, or to the visual cortex that needs retinal input for proper development.
What is “normal” about these normal features of the environment is not that they are inevitable or general (food from pills and I.V. drips may become common in the future, dangerous predators have vanished from most human beings’ environments) but that they were generally present in the environment of evolution. Children a hundred thousand years ago were born in an environment that included natural language speakers, man-made tools, gender roles, predators, gravity, chewable food and other stable factors that made certain mental dispositions useful adaptations to those environmental features.
[9] Inferential systems orchestrate finer-grain neural structures
The example of face-recognition also shows how our understanding of domain-specificity is crucially informed by what we know about neural structures and their functional specialization. However, the example is perhaps misleading in suggesting a straightforward mapping from functional specialization onto neural specialization.
Cognitive domains correspond to recurrent fitness-related situations or problems (e.g. ‘predators’, ‘competitors’, ‘tools’, ‘foraging techniques’, ‘mate selection’, ‘social exchange’, ‘interactions with kin’, etc.). Should we expect to find neural structures that are specifically activated by information pertaining to one of these domains?
There are empirical and theoretical reasons to expect a rather more complex picture. Neural specificity should not be confused with easily tracked anatomical localisation. Local activation differences, salient though they have become because of the (literally) spectacular progress in neuro-imaging techniques, are not the only index of neural specialization. A variety of crucial differences in brain function consist in time-course differences (observed in ERPs), in neuro-transmitter modulation and in spike-train patterns that are not captured by fMRI studies (Posner & Raichle, 1994; Cabeza & Nyberg, 2000).
In the current state of our knowledge of functional neuro-anatomy, it would seem that most functionally separable neural systems are more specific than the fitness-related domains, so that high-level domain-specificity requires the joint or coordinated activation of different neural systems, and indeed in many cases consists largely of the specific coordination of distinct systems. We illustrate this point below, when we consider the difference between living and non-living things, or the different systems involved in detecting agency.
Let us start with the distinction between animals and other living beings on the one hand, and man-made objects on the other. It would seem that the human mind must include some assumptions about this difference. Indeed, developmental and cognitive evidence suggests that one can find profound differences between these two domains.
Animal species are intuitively construed in terms of species-specific “causal essences” (Atran, 1998). That is, their typical features and behavior are interpreted as consequences of possession of an undefined, yet causally relevant quality particular to each identified species. A cat is a cat, not by virtue of having this or that external feature (even though that is how we recognise it), but because it possesses some intrinsic and undefined quality that one only acquires by being born of cats. This assumption appears early in development (Keil, 1986), so that pre-schoolers consider the “insides” a crucial feature of identity for animals even though they of course only use the “outside” as identification criteria (S. A. Gelman & Wellman, 1991). Also, all animals and plants are categorised as members of a taxonomy. The specific feature here is not just that categories (e.g. ‘snake’) are embedded in other, more abstract ones (‘reptiles’) and include more specific ones (‘adder’), but also that the categories are mutually exclusive and jointly exhaustive, which is not the case in other domains. Although animal and plant classifications vary between human cultures, the hierarchical ranks (e.g. varietals, genus, family, etc.) are found in all ethno-biological systems and carry rank-specific expectations about body-plan, physiology and behavior (Atran, 1998).
By contrast, man-made objects are principally construed in terms of their functions. Although children may sometimes seem indifferent to the absence of some crucial functional features in artefacts (e.g. a central screw in a pair of scissors) (Gentner & Rattermann, 1991), young children are sensitive to such functional affordances (physical features that support function) when they actually use tools, either familiar or novel (Kemler Nelson, 1995), and when they try to understand the use of novel objects (Richards, Goldfarb, Richards, & Hassen, 1989). Young children construe functional features in teleological terms, explaining for instance that scissors have sharp blades so that they can cut (Keil, 1986). Adults seem to construe artefacts in terms of their designers’ intentions as well as actual use (Bloom, 1996), and pre-schoolers too consider intentions as relevant to an artefact’s ‘genuine’ function (S. A. Gelman & Bloom, 2000), although they are more concerned with the current user’s intentions than with the original creator’s.
These differences between domains illustrate what we call inferential principles. The fact that an object is identified as either living or man-made leads to [a] paying attention to different aspects of the object; [b] producing different inferences from similar input; [c] producing categories with different internal structures (observable features index possession of an essence [animals] or presence of a human intention [artefacts]); [d] assembling the categories themselves in different ways (there is no hierarchical, nested taxonomy for artefacts, only juxtaposed kind-concepts).
Neuro-psychological evidence supports this notion of distinct principles. Some types of brain damage result in impaired content or retrieval of linguistic and conceptual information in either one of the two domains. The first cases to appear in the clinical literature showed selective impairment of the living thing domain, in particular knowledge for the names, shapes or associative features of animals (Warrington & McCarthy, 1983; Sartori et al., 1993; Sheridan & Humphreys, 1993; Sartori, Coltheart, Miozzo, & Job, 1994; Moss & Tyler, 2000). But there is also evidence for double dissociation: the symmetrical impairment of the artefact domain with preserved knowledge of living things (Warrington & McCarthy, 1987; Sacchett & Humphreys, 1992). This suggests two levels of organisation of semantic information, one comprising modality-specific or modality-associated stores and the other comprising distinct category-specific stores (Caramazza & Shelton, 1998).
There may be an over-simplification in any account of semantic knowledge that remains at the level of such broad ontological categories as “living” and “man-made”. For instance, it is not clear that children really develop domain-specific understandings at the level of the “living thing” and “man-made” categories. All the evidence we have concerns their inferences about medium-size animals (gradually and only partly extended to bugs and plants) and about manipulable tools with a direct, observable effect on objects (not houses or dams or lampposts).
Evolutionary considerations suggest that the specificity of semantic knowledge will be found at a more specific level, corresponding to situations that carry particular fitness consequences. In evolutionary terms, one should consider not just the categories of objects that are around an organism but also the kinds of interaction likely to impinge on the organism’s fitness. From that standpoint, humans certainly do not interact with “living things” in general. Living things comprise plants, bacteria, and middle-sized animals including human beings. Human beings interact very differently with predators, prey, potential foodstuffs, competitors and parasites. Nor do humans handle “artefacts” in general. Man-made objects include foodstuffs, tools and weapons, buildings, shelters, visual representations, as well as paths, dams and other modifications of the natural environment. Tools, shelters and decorative artefacts are associated with distinct activities and circumstances. So we should expect the input format and activation cues of domain-specific inference systems to reflect this fine-grained specificity.
Indeed, this hypothesis of a set of finer-grained systems receives some support from behavioral and developmental studies and, most importantly, from the available neuro-functional evidence. A host of neuro-imaging studies, using both PET and fMRI scans, with either word- or image-recognition or generation tasks, has shown that living things and artefacts trigger significantly different cortical activations (Martin et al., 1994; Perani et al., 1995; Spitzer, Kwong, Kennedy, & Rosen, 1995; Martin, Wiggs, Ungerleider, & Haxby, 1996; Spitzer et al., 1998; Moore & Price, 1999; Gerlach, Law, Gade, & Paulson, 2000). However, the results are neither straightforward nor even consistent[3]. Despite many difficulties, what can be observed is that [a] activation in some areas (pre-motor in particular) is modulated by artefacts more clearly than by other stimuli, [b] there is a more diffuse involvement of temporal areas for both categories, and [c] one finds distinct activation maps rather than privileged regions.
The naming of artefacts, or even simple viewing of pictures of artefacts, seems to result in pre-motor activation. Viewing an artefact-like object automatically triggers the search for (and simulation of) motor plans that involve the object in question. Indeed, the areas activated (pre-motor cortex, anterior cingulate, orbito-frontal) are all consistent with this interpretation of a motor plan that is both activated and inhibited. This suggests that “man-made object” is probably not the right criterion here. Houses are man-made but do not afford motor plans that include handling. If motor plans are triggered, they are about tools rather than man-made objects in general (Moore & Price, 1999).
A direct confirmation can be found in a study of manipulable versus non-manipulable artefacts, which finds the classical left ventral frontal (pre-motor) activation only for the former kind of stimuli (Mecklinger, Gruenewald, Besson, Magnie, & Von Cramon, 2002).
Neuro-imaging evidence for the animal domain is less straightforward. Some PET studies found specific activation of the lingual gyrus for animals, but this area is also sometimes activated by artefact naming tasks (Perani et al., 1995). Some infero-temporal areas (BA20) are found to be exclusively activated by animal pictures (Perani et al., 1995), as are some occipital areas (left medial occipital) (Martin et al., 1996). The latter activation would only suggest higher modulation of early visual processing for animals. This is consistent with the notion, widespread in discussions of domain-specific selective impairment, that identification of different animal species requires finer-grained distinctions than that of artefacts: animals of different species (cat, dog) often share a basic Bauplan (trunk, legs, head, fur) and differ in details (shape of head, limbs, etc.), while tools (e.g. screwdriver, hammer) differ in overall structure. Animal-specific activations of the posterior temporal lobe seem to vanish when the stimuli are easier to identify (Moore & Price, 1999), which would confirm this interpretation as an effect of fine-grained, relatively effortful processing[4].
Neuro-imaging findings and developmental evidence converge in supporting the evolutionarily plausible view that inference systems are not about ontological categories like “man-made object” or “living thing” but about types of situations, such as “fast identification of potential predator-prey” or “detection of possible use of tool or weapon”.
A central assumption of human intuitive ontology is that some objects in the world are driven by internal states, in particular by goals and other representational states such as desires and beliefs. This has received great attention in developmental models of “theory of mind”. The term designates the various tacit assumptions that govern our intuitive interpretation of other agents’ (and our own) behavior as the outcome of invisible states like beliefs and intentions.
On the basis of tasks such as the familiar “false-belief” tasks, developmental psychologists suggested that the understanding of belief as representational, and therefore possibly false, did not emerge in normal children before the age of four (Perner, Leekam, & Wimmer, 1987), and did not develop in a normal way in autistic individuals (Baron-Cohen, Leslie, & Frith, 1985). More recently, other paradigms that avoid some difficulties of classical tasks have demonstrated a much earlier-developing appreciation of false belief or mistaken perception (Leslie & Polizzi, 1998).
Having a rich explanatory psychological model of other agents’ behavior is a clear example of a cognitive adaptation (Povinelli & Preuss, 1995). Indeed, above a certain degree of complexity, it is difficult to predict the behavior of a complex organism without taking the “intentional stance”, that is, describing it in terms of unobservable entities like intentions and beliefs (Dennett, 1987). The difference in predictive power is enormous even in the simplest of situations. A judgement like “So-and-so tends to share resources” may be based on observable regularities (So-and-so sometimes leaves aside a share of her food for me to pick up). By contrast, a judgement like “So-and-so is generous” can provide a much more reliable prediction of future behavior, by interpreting past conduct in the light of intentions and beliefs and also knowing in what cases evidence counts or not towards a particular generalisation (e.g. “So-and-so did not let me have a share of her food yesterday, but that’s because she had not seen I was there”, “She is generous only with her kin”, “She is generous with friends”, etc.).
As in other cases where apparently broad domains are actually more fine-grained, we might ask whether the convenient term "theory of mind" refers to a single inference system or rather to a collection of more specialised systems, whose combination produces typically human "mind-reading". The salience of one particular experimental paradigm (false-belief tasks), together with the existence of a specific pathology of mind-reading (autism), might suggest that "theory of mind" is a unitary capacity, in many ways akin to a scientific account of mind and behavior (Gopnik & Wellman, 1994). This also led to speculation as to which species did or did not have "theory of mind" and at what point in evolution it appeared in humans (Povinelli & Preuss, 1995).
There are two distinct scenarios for the origin of our capacity to understand intentional agency, that is, to create representations of other agents' behavior, beliefs and intentions. A widely accepted "social intelligence" scenario is that higher primates evolved increasingly complex intentional-psychology systems to deal with social interaction. Having larger groups, more stable interaction and more efficient co-ordination with other agents all bring, given the right circumstances, significant adaptive benefits to the individual. But they all require increasingly fine-grained descriptions of other agents' behaviors. Social intelligence triggers an arms race between an increasing capacity to manipulate others and an increasing capacity to resist such manipulation (Whiten, 1991). It also allows the development of coalitional alliances, based on a computation of other agents' commitment to a particular purpose (hunting, warfare) (Kurzban & Leary, 2001), as well as the development of friendship as an insurance policy against variance in resources (Tooby & Cosmides, 1996).
Another possible account is that (at least some aspects of) theory of mind evolved in the context of predator-prey interaction (Barrett, 1999; this volume). A heightened capacity to remain undetected by either predator or prey, as well as a better sense of how these other animals detect us, are of obvious adaptive significance for survival problems such as eating and avoiding being eaten. Indeed, some primatologists have speculated that detection of predators may have been the primary context for the evolution of agency concepts (Van Schaik & Van Hooff, 1983). In the archaeological record, changes towards more flexible hunting patterns in modern humans suggest a richer, more intentional representation of the hunted animal (Mithen, 1996). Hunting and predator-avoidance are far more effective when they are flexible, that is, informed by contingent details about the situation at hand, so that one does not react to all predators or prey in the same way. These interpretations are complementary, if we remember that "theory of mind" is probably not a unitary capacity to produce mentalistic accounts of behavior, but a suite of distinct capacities. Humans throughout evolution did not interact with generic intentional agents. They interacted with predators and prey, with other animals and with conspecifics. The latter consisted of helpful parents and siblings, potentially helpful friends, helpless offspring, dangerous rivals and attractive mates. Also, successful interaction in such situations requires predictive models of general aspects of human behavior (a model of motivation and action, as it were) as well as of the particular features of each individual (a model of personality differences).
These different, situation-specific models themselves orchestrate a variety of lower-level neural capacities, all of which focus on particular features of animate agents and take some form of "intentional stance", that is, describe these features in terms of stipulated beliefs and intentions.
One of the crucial systems is geared to detecting animate motion. For some time now, cognitive psychologists have been able to describe the particular physical parameters that make motion seem animate. This system takes as its input particular patterns of motion (Michotte, 1963; Schlottmann & Anderson, 1993; Tremoulet & Feldman, 2000) and delivers as output an automatic interpretation of that motion as animate. The system seems to develop early in infants (Rochat, Morgan, & Carpenter, 1997; Baldwin, Baird, Saylor, & Clark, 2001). These inferences are sensitive to category-specific information, such as the kind of object that is moving, and to the context (R. Gelman, Durgin, & Kaufman, 1995; Williams, 2000).
Animates are also detected in another way, by tracking reactivity at a distance. If a rock rolls down a hill, the only objects that will react contingently to this event at a distance, without direct contact, are the animates that turn their gaze or their head towards the rock, jump in surprise, run away, etc. There is evidence that infants can detect causation at a distance (Schlottmann & Surian, 1999). This would provide them with a way of detecting as "agents" those objects that react to other objects' motion. In experimental settings, infants who have seen a shapeless blob reacting to their own behavior then follow that blob's orienting as if the (eyeless, faceless) blob were gazing in a particular direction (Johnson, Slaughter, & Carey, 1998). There is also evidence that detection of reactivity modulates specific neural activity, distinct from that involved in the interpretation of intentions and beliefs (Blakemore, Boyer, Pachot-Clouard, Meltzoff, & Decety, 2003).
A related capacity is goal-ascription. Animates act in ways that are systematically related to particular objects and states (Blythe, Todd, & Miller, 1999). For instance, their trajectories make sense in terms of reaching a particular object of interest and avoiding obstacles that are not relevant to that goal. Infants seem to interpret the behavior of simple objects in that way. Having seen an object take a detour on its trajectory towards a goal in order to avoid an obstacle, they are surprised if the object maintains the same trajectory once the obstacle is removed (Csibra, Gergely, Biro, Koos, & Brockbank, 1999), an anticipation that is also present in chimpanzees (Uller & Nichols, 2000)[5].
A very different kind of process may be required for intention-ascription. This is the process whereby we interpret some agent's behavior as efforts towards a particular state of affairs, e.g. seeing the banging of the hammer as a way of forcing the nail through the plank. There is evidence that this capacity develops early in children. For instance, young children imitate successful rather than unsuccessful gestures in the handling of tools (Want & Harris, 2001) and can use actors' apparent emotions as a clue to whether the action was successful or not (Phillips, Wellman, & Spelke, 2002). Young children can choose which parts of an action to imitate even if they did not observe the end result of the action (Meltzoff, 1995)[6]. The capacity is particularly important for humans, given a history of tool-making that required sophisticated perspective-taking abilities (Tomasello, Kruger, & Ratner, 1993).
The capacity to engage in joint attention is another crucial foundation for social intelligence (Baron-Cohen, 1991). Again, we find that human capacities in this respect are distinct from those of other primates, and that they have a specific developmental schedule. The most salient development occurs between 9 and 12 months and follows a specific order: first, joint engagement (playing with an object and expecting a person to cooperate); second, communicative gestures (such as pointing); third, attention following (i.e. following people's gaze) and more complex skills like gaze alternation (going back and forth between the object and the person) (Carpenter, Nagell, & Tomasello, 1998). In normal adults, following gaze and attending to other agents' focus of attention are automatic and quasi-reflexive processes (Friesen & Kingstone, 1998). The comparative evidence shows that chimpanzees take gaze as a simple clue to where objects of interest may be, as opposed to taking it as indicative of the gazer's state and intentions, as all toddlers do (Povinelli & Eddy, 1996a, 1996b).
A capacity for relating facial cues to emotional states also develops early and seems to reach similar adult competence across human cultures (Ekman, 1999; Keltner et al., 2003). Five-month-old infants react differently to displays of different emotions on a familiar face (D'Entremont & Muir, 1997). It seems that specific neural circuitry is involved in the detection and recognition of specific emotion types (Kesler-West et al., 2001), distinct from the general processing of facial identity. These networks partly overlap with those activated by the emotions themselves. For instance, the amygdala is activated both by the processing of frightening stimuli and by frightened faces (Morris et al., 1998). The detection of emotional cues presents autistic patients with a difficult challenge (Adolphs, Sears, & Piven, 2001; Njiokiktjien et al., 2001), compounded by their difficulty in understanding the possible reasons for other people's different emotions. Children with Williams syndrome seem to display a dissociation between preserved processing of emotion cues and impaired understanding of goals and beliefs ("theory of mind" in the narrow sense), which would suggest that these are supported by distinct structures (Tager-Flusberg & Sullivan, 2000).
This survey is certainly not exhaustive but should indicate the variety of systems engaged in the smooth operation of higher "theory of mind" proper, that is, the process of interpreting other agents' (or one's own) behavior in terms of beliefs, intentions, memories and inferences. Rudimentary forms of such mind-reading capacities appear very early in development (Meltzoff, 1999) and develop in fairly similar forms in normal children. Although familial circumstances can boost the development of early mind-reading (Perner, Ruffman, & Leekam, 1994), this is only a subtle influence on a developmental schedule that is quite similar in many different cultures (Avis & Harris, 1991).
These various systems are activated by very different cues; they handle different input formats and produce different types of inferences. They are also, as far as we can judge given the scarce evidence, based on distinct neural systems. Early studies identified particular areas of the medial frontal lobes as specifically engaged in "theory-of-mind" tasks (Happe et al., 1996). There is also neuro-psychological evidence that right-hemisphere damage to these regions results in selective impairment of this capacity (Happe, Brownell, & Winner, 1999). Note, however, that in both cases we are considering false-belief tasks, that is, the explicit description of another agent's mistaken beliefs. Actual mind-reading requires other components, many of which depend on distinct neural systems. The detection of gaze and attentional focus jointly engages STS and parietal areas (Allison, Puce, & McCarthy, 2000; Haxby et al., 2002). The detection of various other types of socially relevant information also activates distinct parts of STS (Allison et al., 2000). The identification of agents as reactive objects depends on selective engagement of superior parietal areas (Blakemore et al., 2003). The simple discrimination between animate and inanimate motion is probably related to specific joint activation of MT/MST structures and STS (Grossman & Blake, 2001).
Different kinds of encounters with intentional agents provide contexts in which different cognitive adaptations result in increased fitness. Predator-avoidance places a particular premium on biological motion detection and the detection of reactive objects. Social interaction requires the early development of a capacity to read emotions on faces, but also the later development of a sophisticated simulation of other agents' thoughts. Dependence on hunting favours enhanced capacities for deception. The set of neural systems that collectively support mind-reading is the result of several distinct evolutionary paths.
We have argued that domain-specific inference systems are focused not so much on a specific kind of object (ontological category) as on a certain aspect of objects (cognitive domain). A good example of this is the set of inferential principles that help make sense of the physical properties and behavior of solid objects, what is generally called an "intuitive physics" in the psychological literature (Kaiser, Jonides, & Alexander, 1986).
The main source of information on the contents and organisation of "intuitive physics" comes from infant studies (Spelke, 1988; Baillargeon, Kotovsky, & Needham, 1995; Spelke, 2000) that challenged the Piagetian assumption that the development of physical intuitions followed motor development (Piaget, 1930). These studies have documented the early appearance of systematic expectations about objects as units of attention (Scholl, 2001)[7], in terms of solidity (objects collide, they do not go through one another), continuity (an object has continuous, not punctuate, existence in space and time) and support (unsupported objects fall) (Spelke, 1990; Baillargeon et al., 1995). Also, a distinction between the roles of agent and patient in causal events seems accessible to infants (Leslie, 1984). Action at a distance is not intuitively admitted as relevant to physical events (Spelke, 1994).
However, the picture in terms of evolved systems may be slightly more complicated than that. The fact that many species manipulate the physical world in relatively agile and efficient ways does not necessarily entail that they do so on the basis of similar intuitive physics. In a series of ingenious experiments, Povinelli and colleagues have demonstrated systematic differences between chimpanzees and human infants (Povinelli, 2000). The chimpanzees' physical assumptions are grounded in perceptual generalisations, while those of infants seem based on assumptions about underlying, invisible qualities, such as force or centre of mass (Povinelli, 2000). Also, human beings interact with different kinds of physical objects. In our cognitive environment, we find inert objects (like rocks), objects that we make (food, tools) and living bodies (of conspecifics or other animals). Interaction with these is likely to pose different problems and result in different kinds of principles.
The development of coherent action-plans and motor behavior is crucial both in terms of brain development (the infant brain undergoes massive change in that respect, and the energy expended in motor training in the first year of life is enormous) and in evolutionary terms. The effects of such development, and the systems underlying it, are somewhat neglected in models of "intuitive physics". This matters all the more because neural and behavioral evidence suggests that the development of action-oriented systems and their neural implementation may be distinct from that of intuitive physics in general. That is to say, it may well be the case that young children and adults develop, not one general intuitive physics that spans the entire ontological category of medium-sized solid objects, but two quite distinct systems: one focused on these solid objects, their statics and dynamics, and the other focused on biological motion. An interesting possible consequence is that the neural systems' representations of physical processes are somewhat redundant, as the same physical event is represented in two distinct ways, depending on the kind of object involved.
So far, there is little direct evidence for dedicated neural systems handling representations of the physical behavior of solid objects. Many systems are involved, most of which are not exclusively activated by intuitive physical principles. There are few neuro-imaging studies of physical or mechanical violations of the type used in developmental paradigms, but those that exist report involvement of such general structures as MT/V5 (generally involved in motion processing) and parietal attentional systems (Blakemore et al., 2001).
That biological motion is a special cognitive domain is not really controversial. In the same way as configural information is specially attended to in faces and ignored in other displays, specific processes track biological motion, that is, natural movements of animate beings (people and animals) such as walking, grasping, etc. (Johansson, 1973; Ahlstrom, Blake, & Ahlstrom, 1997; Bellefeuille & Faubert, 1998). There is now some evidence that dedicated neural structures track biological motion (see review in Decety & Grezes, 1999), with specific activation in STS, as well as medial cerebellum, on top of the regular activation of MT-MST for coherent motion (Grezes, Costes, & Decety, 1998; Grossman & Blake, 2001). These systems trigger specific inferences about the behavior of biological objects (Heptulla-Chatterjee, Freyd, & Shiffrar, 1996).
The evidence also suggests that inferences about living bodies are grounded in motor-planning systems. Recent neuro-imaging evidence has given extensive support to the notion that perception of other agents' motion, one's own motor imagery and motor planning, as well as interpretation of goals from this motor imagery, are all tightly integrated (Blakemore & Decety, 2001). That is, perception of biological motion triggers the formation of equivalent motor plans that are subsequently blocked, probably by inhibitory influences from such structures as the orbito-frontal cortex. Now motor plans include specific expectations about the behavior of bodies and body-parts. In this sense they may be said to include a separate domain of intuitive physics.
Numerical cognition too illustrates how cognitive domains can diverge from ontological categories. Numerical processes could in principle be handled by a single "numerosity perception" device. In fact, different processes are in charge of different aspects of number in different situations.
Numerical competence is engaged in a whole variety of distinct behaviors. Children from an early age can estimate the magnitude or continuous "numerousness" of aggregates (e.g. they prefer more sugar to less); they also estimate relative quantities of countable objects (a pile of beads is seen as "bigger" than another); they count objects (applying a verbal counting routine, with number tags and recursive rules, to evaluate the numerosity of a set); they produce numerical inferences (e.g. adding two numbers); they retrieve stored numerical facts (e.g. the fact that two times six is twelve).
This variety of behaviors is reflected in a diversity of underlying processes. Against the parsimonious but misleading vision of a unitary, integrated numerical capacity, many findings in behavioral, developmental, neuro-psychological and neuro-imaging studies converge to suggest a variety of representations of numbers and a variety of processes engaged in numerical inference (Dehaene, Spelke, Pinel, Stanescu, & Tsivkin, 1999). In particular, one must distinguish between a pre-verbal, analogue representation of numerosities on the one hand and the verbal system of number-tags and counting rules on the other (Gallistel & Gelman, 1992)[8].
This division is confirmed by neuro-psychological and neuro-imaging studies (Dehaene et al., 1999). One system is principally modulated by exact computation, recall of mathematical facts and explicit application of rules, engaging (mostly left-hemisphere) inferior prefrontal cortex as well as areas typically activated in verbal tasks. The other network is engaged in approximation tasks and comparisons, activating bilateral inferior parietal cortex. The engagement of parietal networks in number estimation suggests a spatial representation of magnitudes, a suggestion supported by the fact that magnitude estimation is impaired in subjects with spatial neglect and can be disrupted by transcranial magnetic stimulation of the angular gyrus[9]. This analogue magnitude system encodes different numerosities as different points (or, less strictly, fuzzy locations) along a "number line", an analogical and incremental representation of magnitudes.
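The notion of numerosities as "fuzzy locations" on a number line can be made concrete with a minimal Python sketch of one common way of modelling an analogue magnitude system. It is an illustration under stated assumptions, not the model used in the studies cited above. It assumes that each numerosity n is encoded as a Gaussian distribution centred on n whose spread grows in proportion to n (scalar variability), and that comparing two sets amounts to deciding which of two noisy magnitudes is larger. The function name `p_correct_comparison` and the Weber fraction of 0.2 are hypothetical choices.

```python
import math


def p_correct_comparison(n1: int, n2: int, weber: float = 0.2) -> float:
    """Probability of correctly judging which of two numerosities is larger.

    Hypothetical model: each numerosity n is a Gaussian "fuzzy location"
    on a linear number line, with mean n and standard deviation weber * n.
    The default weber value is an illustrative placeholder, not a measurement.
    """
    if n1 == n2:
        return 0.5  # nothing to discriminate: chance performance
    # Standard deviation of the difference between the two noisy magnitudes.
    spread = weber * math.sqrt(n1 ** 2 + n2 ** 2)
    z = abs(n2 - n1) / spread
    # Standard normal cumulative distribution evaluated at z.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


if __name__ == "__main__":
    for pair in [(8, 16), (16, 32), (10, 20), (10, 11)]:
        print(pair, round(p_correct_comparison(*pair), 3))
```

Under these assumptions, discriminability depends on the ratio of the two numerosities rather than on their absolute difference: 8 versus 16 comes out exactly as easy as 16 versus 32, while 10 versus 11 is much harder than 10 versus 20. An exact answer such as "eleven, not ten" is outside what this kind of representation delivers, which is why the verbal system of number-tags is needed.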
The distinction between systems is also relevant to the development of the domain. To produce numerical inferences, children need to integrate the representations delivered by two different systems. The first is the representation of numerosity provided by magnitude estimation. The second is the representation of object identity. Individuated objects allow inferences such as (1-1=0) or (2-1≠2), which are observed in infants in dishabituation studies (Wynn, 1992, 2002). The acquisition process requires a systematic mapping or correspondence between these two distinct representations of the objects of a collection (R. Gelman & Meck, 1992).
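The contrast between identity-based inference and counting can be illustrated with a second, equally hypothetical Python sketch. Individuated objects are modelled as a list of unique labels. The expected outcome of a Wynn-style display is computed by tracking which individuals were removed rather than by estimating magnitude. A separate function pictures the verbal counting routine as a one-to-one pairing of objects with a stable sequence of number tags, the last tag standing for the cardinality of the set. The function names, the labels and the short tag list are illustrative assumptions, not a model proposed in the studies cited.

```python
# Hypothetical stable-order list of verbal number tags (the counting routine).
NUMBER_TAGS = ["one", "two", "three", "four", "five"]


def count(objects: list[str]) -> str:
    """Pair each individuated object with exactly one tag, in a fixed order.

    The last tag used stands for the cardinality of the whole set.
    """
    if not objects:
        return "zero"
    if len(objects) > len(NUMBER_TAGS):
        raise ValueError("this sketch only has tags for up to five objects")
    pairing = list(zip(objects, NUMBER_TAGS))  # one object, one tag
    return pairing[-1][1]


def violates_expectation(before: list[str], removed: list[str], observed: list[str]) -> bool:
    """Wynn-style check based on tracked individuals rather than magnitude.

    Objects are unique labels. The expected display keeps every object
    that was not removed; the outcome counts as 'unexpected' when the
    observed display holds a different number of individuals.
    """
    expected = [obj for obj in before if obj not in removed]
    return len(expected) != len(observed)
```

For instance, starting from two dolls and removing one, `violates_expectation(["doll_a", "doll_b"], ["doll_a"], ["doll_a", "doll_b"])` returns `True` (the 2-1≠2 case), whereas `count(["doll_a", "doll_b"])` returns `"two"`. In this sketch the two functions never consult a magnitude, which is one way of picturing why children must additionally map such individuated representations onto numerosity.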