Evolved Intuitive Ontology:

Integrating Neural, Behavioral and Developmental

Aspects of Domain-Specificity

 

Pascal Boyer[*] & Clark Barrett[†]

 

Forthcoming, in David Buss (Ed.),

Handbook of Evolutionary Psychology

 

 

 

Traditionally, psychologists have assumed that people come equipped only with a set of relatively domain-general faculties, such as "memory" and "reasoning," which are applied in equal fashion to diverse problems. Recent research has begun to suggest that human expertise about the natural and social environment, including what is often called "semantic knowledge", is best construed as consisting of different domains of competence. Each of these corresponds to recurrent evolutionary problems, is organised along specific principles, is the outcome of a specific developmental pathway and is based on specific neural structures. What we call a "human evolved intuitive ontology" comprises a catalogue of broad domains of information, different sets of principles applied to these different domains as well as different learning rules to acquire more information about those objects. All this is intuitive in the sense that it is not the product of deliberate reflection on what the world is like.

            This notion of an intuitive ontology as a motley of different domains informed by different principles was first popularised by developmental psychologists (R. Gelman, 1978; R. Gelman & Baillargeon, 1983) who proposed distinctions between physical-mechanical, biological, social and numerical competencies as based on different learning principles (Hirschfeld & Gelman, 1994). In the following decades, this way of slicing up semantic knowledge received considerable support both in developmental and neuro-psychology. For example, patients with focal brain damage were found to display selective impairment of one of these domains of knowledge to the exclusion of others (Caramazza, 1998). Neuro-imaging and cognitive neuroscience are now adding to the picture of a federation of evolved competencies that has grown out of laboratory work with children and adults.

 

An illustration: What is specific about faces

            The detection and recognition of faces by human beings provides an excellent example of a specialised system. Humans are especially good at identifying and recognising large numbers of different faces, automatically and effortlessly, from infancy. This has led many psychologists to argue that the standard human cognitive equipment includes a special system to handle faces.

            Convergent evidence for specialization comes from many different sources. In contrast to other objects, the way facial visual information is treated is configural, taking into account the overall arrangement and relations of parts more than the parts themselves (Young, Hellawell, & Hay, 1987; Tanaka & Sengco, 1997). This is strikingly demonstrated by the finding that inverting faces makes them much more difficult to recognize, compared to objects requiring less configural processing (Farah, Wilson, Drain, & Tanaka, 1995). Developmentally, newborn infants quickly orient to faces rather than other stimuli (Morton & Johnson, 1991) and recognise different individuals early (Pascalis, de Schonen, Morton, Deruelle, et al., 1995; Slater & Quinn, 2001). Neuro-psychology has documented many cases of prosopagnosia or selective impairment of face-recognition (Farah, 1994) where the structural processing of objects, object-recognition and even imagination for faces can be preserved while face recognition remains impaired (Duchaine, 2000; Michelon & Biederman, 2003). Finally, neuro-imaging studies have reliably shown a specific pattern of activation (in particular, modulation of areas of the fusiform gyrus in the temporal lobe) during identification or passive viewing of faces (Kanwisher, McDermott, & Chun, 1997). Specialised systems may handle the invariant properties of faces (that allow recognition) while other networks handle changing aspects such as gaze, smile and emotional expression (Haxby, Hoffman, & Gobbini, 2002).

            Despite this impressive evidence, some psychologists argue that the specificity of face-perception is an illusion, and that human beings simply become expert recognisers of faces by using unspecialised visual capacities. In this view, the newborns' skill in the face-domain may be the result of a special interest in conspecifics that simply makes faces more ecologically important than other objects (Nelson, 2001). Also, one can observe the inversion effect (Diamond & Carey, 1986; Gauthier, Williams, Tarr, & Tanaka, 1998) and fusiform gyrus activation (Gauthier, Tarr, Anderson, Skudlarski, & Gore, 1999; Tarr & Gauthier, 2000) when testing trained experts in such domains as birds, automobiles, dogs or even abstract geometrical shapes (see Kanwisher, 2000, for a detailed discussion).

            In our view, these findings demonstrate the importance of gradual development and the contribution of relevant experience and environmental factors. These are all crucial aspects of functional specialization with evolutionary origins. These points will become clearer as we compare the face-system to other kinds of specialised inferential devices typical of human intuitive ontology.

 

Features of domain-specific inference systems

            The face-recognition system provides us with a good template for the features we encounter in other examples of domain-specific systems.

 

[1] Semantic knowledge comprises specialized inference systems.

            It is misleading to think of semantic knowledge in terms of a declarative data-base. Most of the knowledge that drives behavior stems from tacit inferential principles, that is, specific ways of handling information.

            In the case of face-recognition, configural processing seems to be a computational solution to the problem of recognizing individuals across time while tracking a surface (the face) that constantly changes in small details, with different lighting or facial expressions.

            More generally, we will describe intuitive ontology as a set of computational devices, each characterised by a specific input format, by specific inferential principles and by a specific type of output (which may in turn be input to other systems). Given information that matches the input format of one particular system, activation of that system and production of the principled output are fairly automatic.

 

[2] Domains are not given by reality but are cognitively delimited.

            Faces are not a physically distinct set of objects that would be part of "the environment" of any organism. Faces are distinct objects only to an organism equipped with a special system that pays attention to the top front surface of conspecifics as a source of person-specific information.

            Moreover, inferential systems are focused not necessarily on objects but on particular aspects of objects, which is why a single physical object can trigger concurrent activation of several distinct inference systems. For instance, although faces invariably come with a particular expression, distinct systems handle the Who-is-that? and In-What-Mood? questions (Haxby et al., 2002).

            To coin a phrase, the human brain's intuitive ontology is philosophically incorrect. That is, the distinct cognitive domains (different classes of objects in our cognitive environment as distinguished by our intuitive ontology) do not always correspond to real ontological categories (different kinds of "stuff" out there). For instance, the human mind does not draw the line between living and non-living things, or between agents and objects, in the same way as a scientist or a philosopher would do, as we will illustrate below.

 

[3] Evolutionary design principles suggest the proper domain of a system.

            The domain of operation of the system is best circumscribed by evolutionary considerations. Natural selection resulted in genetic material that normally results in human brains with a specific capacity for face-recognition. But why should we describe it as being about "faces"? It may seem more accurate to say that it is specialised in "fine-grained, intra-categorical distinctions between grossly similar visual representations of middle-sized objects," as some have argued. But consider this. We observe that the stimuli in question only trigger specific processing if they include a central (mouth-like) opening and two brightly contrasted (eye-like) points above that opening. We should then add these features to our description. The system would then be described as especially good at "fine-grained, intra-categorical [...] with a central opening and [etc.]". We could add more and more features to this supposedly "neutral" description of the system.

            Such semantic contortions are both redundant and misleading. Inasmuch as the only stimuli corresponding to our convoluted re-description actually encountered during evolution were conspecifics' faces, the re-description is redundant. But it also blurs the functional features of the system, for there are indefinitely many inferences one could extract from presentations of "fine-grained, intra-categorical... etc." (face-like) stimuli, only some of which are relevant to distinctions between persons[1]. A description in terms of functional design provides the best explanation for the system's choice of what is and what is not relevant in faces.

 

[4] Evolutionary and actual domains do not fully overlap.

            Without effortful training, the face-recognition system identifies and recognizes what it was designed to expect in its environment. As we saw above, the system may be re-trained, with more effort, to provide identification of objects other than faces, such as birds or cars. In the same way, our evolved walking, running and jumping motor routines can be re-directed to produce ballet dancing. But it is nevertheless the case that they evolved in order to move us closer to resources or shelter and away from predators. The fact that some cognitive system is specialised for a domain D does not entail that it invariably or exclusively handles D, nor does it mean that the specialization cannot be coopted for evolutionarily novel activities. It means that ancestors of the present organism encountered objects that belong to D as a stable feature of the environments where the present cognitive architecture was selected, and that handling information about such objects enhanced fitness.

            There may be, and indeed there very often is, a difference between the proper (evolutionary) and actual domains of a system (Sperber, 1994). The proper domain is the set of objects, facts, and properties that the specialised system evolved to represent and react to (for instance, flies for the insect-detection system in the frog's visual system). The actual domain is the set of objects, facts, and properties that it does react to (e.g. flies as well as any small object zooming across the visual field). Mimicry and camouflage exploit this non-congruence. Non-poisonous butterflies may evolve the same bright colours as poisonous ones to avoid predation by birds. The proper (evolved) domain of the birds' bright-coloured bug avoidance system is the set of poisonous insects, while the actual domain is that of all insects that look like them (Sperber, 1994).

 

[5] In evolution, you can only learn more if you already know more.

            The face-recognition system does not need to store a description of each face in each possible orientation and lighting condition. It only stores particular parameters for an algorithm that connects each sighting of a face with a person's "face-entry".

            Turning to other domains, we find the same use of vast information stores in the environment, together with complex processes required to find and use that information. The lexicon of a natural language (15 to 100 thousand distinct items) is extracted through development from the utterances of other speakers. This constitutes an impressive economy for genetic transmission, as human beings can develop complete fluency without the lexicon being stored in the genome. But this external data-base is available only to a mind with complex phonological and syntactic predispositions (Pinker & Bloom, 1990; Jackendoff, 2002). In a similar way, the diversity and similarities between animal species are inferred from a huge variety of available natural cues (color, sound, shape, behavior, etc.) but that information is relevant only to a mind with a disposition for natural taxonomies (Atran, 1990).

            In general, the more an inference system exploits external sources of information and stable aspects of the cognitive environments, the more computational power is required to home in on that information and derive inferences from it. There is in evolution a general coupling between the evolution of more sophisticated cognitive equipment and the use of more extensive information stored in environments.

[6] Each inferential system has a specific learning logic.

            Infants pay attention to faces and quickly recognise familiar faces because they are biased to pay attention to small differences in this domain that they would ignore in other domains.

            More generally, knowledge acquisition is informed by domain-specific learning principles (R. Gelman, 1990), which we will review in the following pages. Also, different systems have different developmental schedules, including "windows" of development before or after which learning of a particular kind is difficult. These empirical findings have led developmental psychologists to cast doubt on the notion of a general, all-domain "learning logic" that would govern cognitive development in various domains (Hirschfeld & Gelman, 1994).

 

[7] Development follows evolved pathways

            Consider the notion of a ballistic process. This is a process (e.g. kicking a ball) where one has influence over initial conditions (e.g. direction and energy of the kick) but this influence stops there and then, as the motion is influenced only by external factors (e.g. friction). If brain development were one such ballistic system, the genome would assemble a brain with a particular structure and then stop working on it. From the end of organogenesis, the only functionally relevant brain changes would be brought about by interaction with external information. But that is clearly not the case. Genetic influence on many organic structures is pervasive throughout the life span and that is true of the brain too.

            We must insist on this, because discussions of evolved mental structures often imply that genetic influence on brain structures is indeed ballistic, so that one can draw a line between function that is specified at birth (supposedly the result of evolution) and function that emerges during development (supposedly the effect of external factors unrelated to evolution). Indeed, this seems to be the starting point of many discussions of "innateness" (Elman, Bates, Johnson, & Karmiloff-Smith, 1996) even though the assumption is biologically implausible[2].

            Evolution results not just in a specific set of adult capacities but also in a specific set of developmental pathways that lead to such capacities. This is manifest in the rather circuitous path to adult competence that children follow in many domains. For instance, young children do not build syntactic competence in a simple-to-complex manner, starting with short sentences and gradually adding elements. They start with a one-word stage, then proceed to a two-word stage, then discard that structure to adopt their language's phrase grammar. Such phenomena are present in other domains too, as we will discuss in the rest of this chapter.

 

[8] Development requires a normal environment

            Face recognition probably would not develop in a context where people always changed faces or all looked identical. Language acquisition requires people interacting with a child in a fairly normal way. Mechanical-physical intelligence requires a world furnished with some functionally-specialised man-made objects. In this sense inference systems are similar to teeth and stomachs, which need digestible foods rather than intra-venous drips for normal development, or to the visual cortex that needs retinal input for proper development.

            What is "normal" about these normal features of the environment is not that they are inevitable or general (food from pills and I.V. drips may become common in the future, dangerous predators have vanished from most human beings' environments) but that they were generally present in the environment of evolution. Children a hundred thousand years ago were born in an environment that included natural language speakers, man-made tools, gender roles, predators, gravity, chewable food and other stable factors that made certain mental dispositions useful adaptations to those environmental features.

 

[9] Inferential systems orchestrate finer-grain neural structures

            The example of face-recognition also shows how our understanding of domain-specificity is crucially informed by what we know about neural structures and their functional specialization. However, the example is perhaps misleading in suggesting a straightforward mapping from functional specialization onto neural specialization.

            Cognitive domains correspond to recurrent fitness-related situations or problems (e.g. 'predators', 'competitors', 'tools', 'foraging techniques', 'mate selection', 'social exchange', 'interactions with kin', etc.). Should we expect to find neural structures that are specifically activated by information pertaining to one of these domains?

            There are empirical and theoretical reasons to expect a rather more complex picture. Neural specificity should not be confused with easily tracked anatomical localisation. Local activation differences, salient though they have become because of the (literally) spectacular progress in neuro-imaging techniques, are not the only index of neural specialization. A variety of crucial differences in brain function consist in time-course differences (observed in ERPs), in neuro-transmitter modulation and in spike-train patterns that are not captured by fMRI studies (Posner & Raichle, 1994; Cabeza & Nyberg, 2000).

            In the current state of our knowledge of functional neuro-anatomy, it would seem that most functionally separable neural systems are more specific than the fitness-related domains, so that high-level domain-specificity requires the joint or coordinated activation of different neural systems, and indeed in many cases consists largely of the specific coordination of distinct systems. We illustrate this point presently, when we consider the difference between living and non-living things, or the different systems involved in detecting agency.

 

Living vs. man-made objects: development and impairment

            Let us start with the distinction between animals and other living beings on the one hand, and man-made objects on the other. It would seem that the human mind must include some assumptions about this difference. Indeed, developmental and cognitive evidence suggests that one can find profound differences between these two domains.

            Animal species are intuitively construed in terms of species-specific "causal essences" (Atran, 1998). That is, their typical features and behavior are interpreted as consequences of possession of an undefined, yet causally relevant quality particular to each identified species. A cat is a cat, not by virtue of having this or that external feature, even though that is how we recognise it, but because it possesses some intrinsic and undefined quality that one only acquires by being born of cats. This assumption appears early in development (Keil, 1986) so that pre-schoolers consider the "insides" a crucial feature of identity for animals even though they of course only use the "outside" for identification criteria (S. A. Gelman & Wellman, 1991). Also, all animals and plants are categorised as members of a taxonomy. The specific feature here is not just that categories (e.g. 'snake') are embedded in other, more abstract ones ('reptiles') and include more specific ones ('adder'), but also that the categories are mutually exclusive and jointly exhaustive, which is not the case in other domains. Although animal and plant classifications vary between human cultures, the hierarchical ranks (e.g. variety, genus, family, etc.) are found in all ethno-biological systems and carry rank-specific expectations about body-plan, physiology and behavior (Atran, 1998).

            By contrast, man-made objects are principally construed in terms of their functions. Although children may sometimes seem indifferent to the absence of some crucial functional features in artefacts (e.g. a central screw in a pair of scissors) (Gentner & Rattermann, 1991), young children are sensitive to such functional affordances (physical features that support function) when they actually use tools, either familiar or novel (Kemler Nelson, 1995) and when they try to understand the use of novel objects (Richards, Goldfarb, Richards, & Hassen, 1989). Young children construe functional features in teleological terms, explaining for instance that scissors have sharp blades so they cut (Keil, 1986). Artefacts seem to be construed by adults in terms of their designers' intentions as well as actual use (Bloom, 1996) and pre-schoolers too consider intentions as relevant to an artefact's 'genuine' function (S. A. Gelman & Bloom, 2000), although they are more concerned with the current user's intentions than with the original creator's.

            These differences between domains illustrate what we call inferential principles. The fact that an object is identified as either living or man-made leads to [a] paying attention to different aspects of the object; [b] producing different inferences from similar input; [c] producing categories with different internal structures (observable features index possession of an essence [animals] or presence of a human intention [artefacts]); [d] assembling the categories themselves in different ways (there is no hierarchical, nested taxonomy for artefacts, only juxtaposed kind-concepts).

            Neuro-psychological evidence supports this notion of distinct principles. Some types of brain damage result in impaired content or retrieval of linguistic and conceptual information in either one of the two domains. The first cases to appear in the clinical literature showed selective impairment of the living thing domain, in particular knowledge for the names, shapes or associative features of animals (Warrington & McCarthy, 1983; Sartori, Job, Miozzo, Zago, et al., 1993; Sheridan & Humphreys, 1993; Sartori, Coltheart, Miozzo, & Job, 1994; Moss & Tyler, 2000). But there is also evidence for double dissociation, for the symmetrical impairment in the artefact domain with preserved knowledge of living things (Warrington & McCarthy, 1987; Sacchett & Humphreys, 1992). This suggests two levels of organisation of semantic information, one comprising modality-specific or modality-associated stores and the other comprising distinct category-specific stores (Caramazza & Shelton, 1998).

 

Living vs. man-made objects: evolved and neural domains

            There may be an over-simplification in any account of semantic knowledge that remains at the level of such broad ontological categories as "living" and "man-made". For instance, it is not clear that children really develop domain-specific understandings at the level of the "living thing" and "man-made" categories. All the evidence we have concerns their inferences on medium-size animals (gradually and only partly extended to bugs, plants) and on manipulable tools with a direct, observable effect on objects (not houses or dams or lampposts).

            Evolutionary considerations would suggest that specificity of semantic knowledge will be found at a more specific level, corresponding to situations that carry particular fitness consequences. In evolutionary terms, one should consider not just the categories of objects that are around an organism but also the kinds of interaction likely to impinge on the organism's fitness. From that standpoint, humans certainly do not interact with "living things" in general. Living things comprise plants, bacteria, and middle-sized animals including human beings. Human beings interact very differently with predators, prey, potential foodstuffs, competitors, parasites. Nor do humans handle "artefacts" in general. Man-made objects include foodstuffs, tools and weapons, buildings, shelters, visual representations, as well as paths, dams and other modifications of the natural environment. Tools, shelters and decorative artefacts are associated with distinct activities and circumstances. So we should expect the input format and activation cues of domain-specific inference systems to reflect this fine-grained specificity.

            Indeed, this hypothesis of a set of finer-grained systems receives some support from behavioral and developmental studies and most importantly from the available neuro-functional evidence. A host of neuro-imaging studies, using both PET and fMRI scans, with either word- or image-recognition or generation, has shown that living things and artefacts trigger significantly different cortical activations (Martin et al., 1994; Perani et al., 1995; Spitzer, Kwong, Kennedy, & Rosen, 1995; Martin, Wiggs, Ungerleider, & Haxby, 1996; Spitzer et al., 1998; Moore & Price, 1999; Gerlach, Law, Gade, & Paulson, 2000). However, the results are not really straightforward or even consistent[3]. Despite many difficulties, what can be observed is that [a] activation in some areas (pre-motor in particular) is modulated by artefacts more clearly than by other stimuli, [b] there is a more diffuse involvement of temporal areas for both categories, [c] one finds distinct activation maps rather than privileged regions.

            The naming of artefacts, or even simple viewing of pictures of artefacts, seems to result in pre-motor activation. Viewing an artefact-like object automatically triggers the search for (and simulation of) motor plans that involve the object in question. Indeed, the areas activated (pre-motor cortex, anterior cingulate, orbito-frontal) are all consistent with this interpretation of a motor plan that is both activated and inhibited. This suggests that "man-made object" is probably not the right criterion here. Houses are man-made but do not afford motor plans that include handling. If motor plans are triggered, they are about tools rather than man-made objects in general (Moore & Price, 1999). A direct confirmation can be found in a study of manipulable versus non-manipulable artefacts, which finds the classical left ventral frontal (pre-motor) activation only for the former kind of stimuli (Mecklinger, Gruenewald, Besson, Magnie, & Von Cramon, 2002).

            Neuro-imaging evidence for the animal domain is less straightforward. Some PET studies found specific activation of the lingual gyrus for animals, but this is also sometimes activated by artefact naming tasks (Perani et al., 1995). Some infero-temporal areas (BA20) are found to be exclusively activated by animal pictures (Perani et al., 1995), as are some occipital areas (left medial occipital) (Martin et al., 1996). The latter activation would only suggest higher modulation of early visual processing for animals. This is consistent with the notion, widespread in discussions of domain-specific selective impairment, that identification of different animal species requires finer-grained distinctions than that of artefacts: animals of different species (cat, dog) often share a basic Bauplan (trunk, legs, head, fur) and differ in details (shape of head, limbs, etc.), while tools (e.g. screwdriver, hammer) differ in overall structure. Animal-specific activations of the posterior temporal lobe seem to vanish when the stimuli are easier to identify (Moore & Price, 1999) which would confirm this interpretation as an effect of fine-grained, relatively effortful processing[4].

            Neuro-imaging findings and developmental evidence converge in supporting the evolutionarily plausible view that inference systems are not about ontological categories like "man-made object" or "living thing" but about types of situations, such as "fast identification of potential predator-prey" or "detection of possible use of tool or weapon".

 

Advantages of mind-reading

            A central assumption of human intuitive ontology is that some objects in the world are driven by internal states, in particular by goals and other representational states such as desires and beliefs. This has received great attention in developmental models of "theory of mind". The term designates the various tacit assumptions that govern our intuitive interpretation of other agents' (and our own) behavior as the outcome of invisible states like beliefs and intentions.

            On the basis of tasks such as the familiar "false-belief" tasks, developmental psychologists suggested that the understanding of belief as representational and therefore possibly false did not emerge in normal children before the age of four (Perner, Leekam, & Wimmer, 1987), and did not develop in a normal way in autistic individuals (Baron-Cohen, Leslie, & Frith, 1985). More recently, other paradigms that avoided some difficulties of classical tasks have demonstrated a much earlier-developed appreciation of false belief or mistaken perception (Leslie & Polizzi, 1998).

            Having a rich explanatory psychological model of other agents' behavior is a clear example of a cognitive adaptation (Povinelli & Preuss, 1995). Indeed, above a certain degree of complexity, it is difficult to predict the behavior of a complex organism without taking the "intentional stance", that is, describing it in terms of unobservable entities like intentions and beliefs (Dennett, 1987). The difference in predictive power is enormous even in the simplest of situations. A judgement like "So-and-so tends to share resources" may be based on observable regularities (So-and-so sometimes leaves aside a share of her food for me to pick up). By contrast, a judgement like "So-and-so is generous" can provide a much more reliable prediction of future behavior, by interpreting past conduct in the light of intentions and beliefs and also knowing in what cases evidence counts or not towards a particular generalisation (e.g. "So-and-so did not let me have a share of her food yesterday but that's because she had not seen I was there", "She is generous only with her kin", "She is generous with friends", etc.).

            As in other cases where apparently broad domains are actually more fine-grained, we might ask whether the convenient term "theory of mind" actually refers to a single inference system or rather a collection of more specialised systems, whose combination produces typically human "mind-reading". The salience of one particular experimental paradigm (false-belief tasks) together with the existence of a specific pathology of mind-reading (autism) might suggest that "theory of mind" is a unitary capacity, in many ways akin to a scientific account of mind and behavior (Gopnik & Wellman, 1994). This also led to speculation as to which species did or did not have "theory of mind" and at what point in evolution it appeared in humans (Povinelli & Preuss, 1995).

            There are two distinct origin scenarios for our capacity to understand intentional agency, to create representations of other agents' behavior, beliefs and intentions. A widely accepted "social intelligence" scenario is that higher primates evolved more and more complex intentional psychology systems to deal with social interaction. Having larger groups, more stable interaction, and more efficient co-ordination with other agents all bring, given the right circumstances, significant adaptive benefits for the individual. But they all require finer and finer grained descriptions of other agents' behaviors. Social intelligence triggers an arms-race resulting from a higher capacity to manipulate others and a higher capacity to resist such manipulation (Whiten, 1991). It also allows the development of coalitional alliance, based on a computation of other agents' commitments to a particular purpose (hunting, warfare) (Kurzban & Leary, 2001), as well as the development of friendship as an insurance policy against variance in resources (Tooby & Cosmides, 1996).

            Another possible account is that (at least some aspects of) theory of mind evolved in the context of predator-prey interaction (Barrett, 1999, this volume). A heightened capacity to remain undetected by either predator or prey, as well as a better sense of how these other animals detect us, are of obvious adaptive significance for survival problems such as eating and avoiding being eaten. Indeed, some primatologists have speculated that detection of predators may have been the primary context for the evolution of agency concepts (Van Schaik & Van Hooff, 1983). In the archaeological record, changes towards more flexible hunting patterns in modern humans suggest a richer, more intentional representation of the hunted animal (Mithen, 1996). Hunting and predator-avoidance become much better when they are more flexible, that is, informed by contingent details about the situation at hand, so that one does not react to all predators or prey in the same way.

            These interpretations are complementary, if we remember that "theory of mind" is probably not a unitary capacity to produce mentalistic accounts of behavior, but a suite of distinct capacities. Humans throughout evolution did not interact with generic intentional agents. They interacted with predators and prey, with other animals and with conspecifics. The latter consisted of helpful parents and siblings, potentially helpful friends, helpless offspring, dangerous rivals, attractive mates. Also, successful interaction in such situations requires predictive models for general aspects of human behavior (a model of motivation and action, as it were) as well as particular features of each individual (a model of personality differences).

 

A suite of agency-focused inference engines

            These different, situation-specific models themselves orchestrate a variety of lower-level neural capacities, all of which focus on particular features of animate agents and take some form of "intentional stance", that is, describe these features in terms of stipulated beliefs and intentions.

            One of the crucial systems is geared at detecting animate motion. For some time now, cognitive psychologists have been able to describe the particular physical parameters that make motion seem animate. This system takes as its input particular patterns of motion (Michotte, 1963; Schlottmann & Anderson, 1993; Tremoulet & Feldman, 2000) and delivers as output an automatic interpretation of motion as animate. The system seems to develop early in infants (Rochat, Morgan, & Carpenter, 1997; Baldwin, Baird, Saylor, & Clark, 2001). These inferences are sensitive to category-specific information, such as the kind of object that is moving and the context (R. Gelman, Durgin, & Kaufman, 1995; Williams, 2000).

            Animates are also detected in another way, by tracking distant reactivity. If a rock rolls down a hill, the only objects that will react contingently to this event at a distance, without direct contact, are the animates that turn their gaze or their head to the object, jump in surprise, run away, etc. There is evidence that infants can detect causation at a distance (Schlottmann & Surian, 1999). This would provide them with a way of detecting as "agents" those objects that react to other objects' motion. In experimental settings, infants who have seen a shapeless blob reacting to their own behavior then follow that blob's orienting as if the (eyeless, faceless) blob was gazing in a particular direction (Johnson, Slaughter, & Carey, 1998). There is also evidence that detection of reactivity modulates particular neural activity, distinct from that involved in the interpretation of intentions and beliefs (Blakemore, Boyer, Pachot-Clouard, Meltzoff, & Decety, 2003).

            A related capacity is goal-ascription. Animates act in ways that are related to particular objects and states in a principled way (Blythe, Todd, & Miller, 1999). For instance, their trajectories make sense in terms of reaching a particular object of interest and avoiding non-relevant obstacles. Infants seem to interpret the behavior of simple objects in that way. Having seen an object take a detour in its trajectory towards a goal to avoid an obstacle, they are surprised if the object maintains the same trajectory once the obstacle is removed (Csibra, Gergely, Biro, Koos, & Brockbank, 1999), an anticipation that is also present in chimpanzees (Uller & Nichols, 2000)[5].

            A very different kind of process may be required for intention-ascription. This is the process whereby we interpret some agent's behavior as efforts towards a particular state of affairs, e.g. seeing the banging of the hammer as a way of forcing the nail through the plank. There is evidence that this capacity develops early in children. For instance, young children imitate successful rather than unsuccessful gestures in the handling of tools (Want & Harris, 2001) and can use actors' apparent emotions as a clue to whether the action was successful or not (Phillips, Wellman, & Spelke, 2002). Young children can choose which parts of an action to imitate even if they did not observe the end result of the action (Meltzoff, 1995)[6]. The capacity is particularly important for humans, given a history of tool-making that required sophisticated perspective-taking abilities (Tomasello, Kruger, & Ratner, 1993).

            The capacity to engage in joint attention is another crucial foundation for social intelligence (Baron-Cohen, 1991). Again, we find that human capacities in this respect are distinct from those of other primates, and that they have a specific developmental schedule. The most salient development occurs between 9 and 12 months and follows a specific order: first, joint engagement (playing with an object and expecting a person to cooperate); second, communicative gestures (such as pointing); third, attention following (i.e. following people's gaze) and more complex skills like gaze alternation (going back and forth between the object and the person) (Carpenter, Nagell, & Tomasello, 1998). In normal adults, following gaze and attending to other agents' focus of attention are automatic and quasi-reflexive processes (Friesen & Kingstone, 1998). The comparative evidence shows that chimpanzees take gaze as a simple clue to where objects of interest may be, as opposed to taking it as indicative of the gazer's state and intentions, as all toddlers do (Povinelli & Eddy, 1996a, 1996b).

            A capacity for relating facial cues to emotional states also develops early and seems to reach similar adult competence across human cultures (Ekman, 1999; Keltner et al., 2003). Five-month-old infants react differently to displays of different emotions on a familiar face (D'Entremont & Muir, 1997). It seems that specific neural circuitry is involved in the detection and recognition of specific emotion types (Kesler-West et al., 2001), distinct from the general processing of facial identity. These networks partly overlap with those activated by the emotions themselves. For instance, the amygdala is activated both by the processing of frightening stimuli and frightened faces (Morris et al., 1998). The detection of emotional cues presents autistic patients with a difficult challenge (Adolphs, Sears, & Piven, 2001; Nijokiktjien et al., 2001), compounded by their difficulty in understanding the possible reasons for other people's different emotions. Williams syndrome children seem to display a dissociation between preserved processing of emotion cues and impaired understanding of goals and beliefs ("theory of mind" in the narrow sense), which would suggest that these are supported by distinct structures (Tager-Flusberg & Sullivan, 2000).

            This survey is certainly not exhaustive but should indicate the variety of systems engaged in the smooth operation of higher "theory of mind" proper, that is, the process of interpreting other agents' (or one's own) behavior in terms of beliefs, intentions, memories and inferences. Rudimentary forms of such mind-reading capacities appear very early in development (Meltzoff, 1999) and develop in fairly similar forms in normal children. Although familial circumstances can boost the development of early mind-reading (Perner, Ruffman, & Leekam, 1994), this is only a subtle influence on a developmental schedule that is quite similar in many different cultures (Avis & Harris, 1991).

            These various systems are activated by very different cues, they handle different input formats and produce different types of inferences. They are also, as far as we can judge given the scarce evidence, based on distinct neural systems. Early studies identified particular areas of the medial frontal lobes as specifically engaged in "theory-of-mind" tasks (Happe et al., 1996). There is also neuro-psychological evidence that right-hemisphere damage to these regions results in selective impairment of this capacity (Happe, Brownell, & Winner, 1999). Note, however, that in both cases we are considering false-belief tasks, that is, the explicit description of another agent's mistaken beliefs. Actual mind-reading requires other associated components, many of which are associated with distinct neural systems. The detection of gaze and attentional focus jointly engages STS and parietal areas (Allison, Puce, & McCarthy, 2000; Haxby et al., 2002). The detection of various other types of socially relevant information also activates distinct parts of STS (Allison et al., 2000). The identification of agents as reactive objects depends on selective engagement of superior parietal areas (Blakemore et al., 2003). The simple discrimination between animate and inanimate motion is probably related to joint specific activation of some MT/MST structures as well as STS (Grossman & Blake, 2001).

            Different kinds of encounters with intentional agents provide contexts in which different cognitive adaptations result in increased fitness. Predator-avoidance places a particular premium on biological motion detection and the detection of reactive objects. Social interaction requires the early development of a capacity to read emotions on faces, but also the later development of a sophisticated simulation of other agents' thoughts. Dependence on hunting favours enhanced capacities for deception. The collection of neural systems that jointly supports mind-reading is the result of several distinct evolutionary paths.

 

Solid objects and bodies

            We argued that domain-specific inference systems are not so much focused on a specific kind of object (ontological category) as on a certain aspect of objects (cognitive domain). A good example of this is the set of inferential principles that help make sense of the physical properties and behavior of solid objects, what is generally called an "intuitive physics" in the psychological literature (Kaiser, Jonides, & Alexander, 1986).

            The main source of information for the contents and organisation of "intuitive physics" comes from infant studies (Spelke, 1988; Baillargeon, Kotovsky, & Needham, 1995; Spelke, 2000) that challenged the Piagetian assumption that the development of physical intuitions followed motor development (Piaget, 1930). The studies have documented the early appearance of systematic expectations about objects as units of attention (Scholl, 2001) [7], in terms of solidity (objects collide, they do not go through one another), continuity (an object has continuous, not punctate, existence in space and time) or support (unsupported objects fall) (Spelke, 1990; Baillargeon et al., 1995). Also, a distinction between the roles of agent and patient in causal events seems accessible to infants (Leslie, 1984). Action at a distance is not intuitively admitted as relevant to physical events (Spelke, 1994).

            However, the picture in terms of evolved systems may be slightly more complicated than that. The fact that many species manipulate the physical world in relatively agile and efficient ways does not necessarily entail that they do that on the basis of similar intuitive physics. In a series of ingenious experiments, Povinelli and colleagues have demonstrated systematic differences between chimpanzees and human infants (Povinelli, 2000). The chimpanzees' physical assumptions are grounded in perceptual generalisations, while those of infants seem based on assumptions about underlying, invisible qualities, such as force or centre of mass (Povinelli, 2000). Also, human beings interact with different kinds of physical objects. In our cognitive environment, we find inert objects (like rocks), objects that we make (food, tools) and living bodies (of conspecifics or other animals). Interaction with these is likely to pose different problems and result in different kinds of principles.

            The development of coherent action-plans and motor behavior is crucial in terms of brain development (the infant brain undergoes massive change in that respect, and the energy expended in motor training is enormous in the first year of life) and in evolutionary terms too. The effects of such development and the underlying systems are somewhat neglected in models of "intuitive physics". This is all the more important, as neural and behavioral evidence suggests that the development of action-oriented systems and their neural implementation may be distinct from that of intuitive physics in general. That is to say, it may well be the case that young children and adults develop, not one general intuitive physics that spans the entire ontological category of medium-sized solid objects, but two quite distinct systems: one focused on these solid objects, their statics and dynamics, and the other one focused on biological motion. An interesting possible consequence is that neural systems' representations of physical processes are somewhat redundant, as the same physical event is represented in two distinct ways, depending on the kind of object involved.

            So far, there is little direct evidence for dedicated neural systems handling representations of the physical behavior of solid objects. Many systems are involved, most of which are not exclusively activated by intuitive physical principles. There are few neuro-imaging studies of physical or mechanical violations of the type used in developmental paradigms, but the few we have find involvement of such general structures as MT/V5 (generally involved in motion processing) and parietal attentional systems (Blakemore et al., 2001).

            That biological motion is a special cognitive domain is not really controversial. In the same way as configural information is specially attended to in faces and ignored in other displays, specific processes track biological motion, that is, natural movements of animate beings (people and animals) such as walking, grasping, etc. (Johansson, 1973; Ahlstrom, Blake, & Ahlstrom, 1997; Bellefeuille & Faubert, 1998). There is now some evidence that dedicated neural structures track biological motion (see review in Decety & Grezes, 1999), with specific activation in STS, as well as medial cerebellum, on top of the regular activation of MT/MST for coherent motion (Grezes, Costes, & Decety, 1998; Grossman & Blake, 2001). These systems trigger specific inferences about the behavior of biological objects (Heptulla-Chatterjee, Freyd, & Shiffrar, 1996).

            The evidence also suggests that inferences about living bodies are grounded in motor-planning systems. Recent neuro-imaging evidence has given extensive support to the notion that perception of other agents' motion, one's own motor imagery and motor planning, as well as interpretation of goals from this motor imagery, are all tightly integrated (Blakemore & Decety, 2001). That is, perception of biological motion triggers the formation of equivalent motor plans that are subsequently blocked, probably by inhibitory influences from such structures as the orbito-frontal cortex. Now, motor plans include specific expectations about the behavior of bodies and body-parts. In this sense they may be said to include a separate domain of intuitive physics.

 

Natural numbers and natural operations

            Numerical cognition too illustrates how cognitive domains can diverge from ontological categories. Numerical processes could in principle consist in a single "numerosity perception" device. In fact, different processes are in charge of different aspects of number in different situations.

            Numerical competence is engaged in a whole variety of distinct behaviors. Children from an early age can estimate the magnitude or continuous "numerousness" of aggregates (e.g. they prefer more sugar to less); they also estimate relative quantities of countable objects (a pile of beads is seen as "bigger" than another); they count objects (applying a verbal counting routine, with number tags and recursive rules, to evaluate the numerosity of a set); they produce numerical inferences (e.g. adding two numbers); they retrieve stored numerical facts (e.g. the fact that two times six is twelve).

            This variety of behaviors is reflected in a diversity of underlying processes. Against the parsimonious but misleading vision of a unitary, integrated numerical capacity, many findings in behavioral, developmental, neuro-psychological and neuro-imaging studies converge to suggest a variety of representations of numbers and a variety of processes engaged in numerical inference (Dehaene, Spelke, Pinel, Stanescu, & Tsivkin, 1999). In particular, one must distinguish between a pre-verbal, analogue representation of numerosities on the one hand and the verbal system of number-tags and counting rules on the other (Gallistel & Gelman, 1992)[8].

            This division is confirmed by neuro-psychological and neuro-imaging studies (Dehaene et al., 1999). One system is principally modulated by exact computation, recall of mathematical facts and explicit application of rules, engaging activation of (mostly left hemisphere) inferior prefrontal cortex as well as areas typically activated in verbal tasks. The other network is engaged in approximation tasks and comparisons, activating bilateral inferior parietal cortex. The engagement of parietal networks in number estimation suggests a spatial representation of magnitudes, supported by the fact that magnitude estimation is impaired in subjects with spatial neglect, and can be disrupted by transcranial magnetic stimulation of the angular gyrus[9]. This analogue magnitude system encodes different numerosities as different points (or, less strictly, fuzzy locations) along a "number line", an analogical and incremental representation of magnitudes.

            The distinction between systems is also relevant to development of the domain. To produce numerical inferences, children need to integrate the representations delivered by two different systems. The first one is the representation of numerosity provided by magnitude estimation. The second one is the representation of object identity. Individuated objects allow inferences such as (1 - 1 = 0) or (2 - 1 ≠ 2), which are observed in infants in dishabituation studies (Wynn, 1992, 2002). The acquisition process requires a systematic mapping or correspondence between two distinct representations of the objects of a collection (R. Gelman & Meck, 1992).