Josef Rauschecker

Key Question

How would an engineer build a system that can learn any language in the world by simply listening to other individuals speaking?

What We Did

Study word representations in the brain of humans learning their native language.

People are speaking machines, and we are born with the ability to learn a language. Yet, the specific language we acquire is determined by our environment. How would an engineer build a system that can learn any language in the world by simply listening to other individuals speaking? This is the problem I try to address by studying word representations in the brain of humans learning their native language. Decoding spoken words in isolation or from continuous speech is in itself one of the great achievements of the human brain, but word production is even more challenging, as it is a highly complex form of motor control involving more than 100 muscles. Most perplexing, however, is the brain’s ability to couple heard and spoken words through an auditory-motor interface and emerge with a communication device that works like a well-oiled machine. The brain mechanisms for this linkage are poorly understood, and a breakthrough is overdue. To remedy the current lack of understanding, this project will use advanced neuroimaging and analysis techniques to study the longitudinal development of auditory and articulatory word representations in the brain of infants and toddlers and follow them through kindergarten, with young adults serving as a baseline. Most importantly, the project will investigate how internal models of speech are initially set up in the brain through sensorimotor feedback and become the cornerstone of language as we continue to learn new words throughout life. The results will be compared with data from the development of the vocalization system in rhesus monkeys, obtained with the same techniques. Through this comparison, I expect to gain fundamental insights into language evolution and into the crucial brain mechanisms that allow humans, but not monkeys, to acquire language.

The results will have a broad impact on fields as diverse as language disorders (such as aphasia and dyslexia), education, and the design of humanoid robots.

Josef Rauschecker, PhD, Professor of Neuroscience

Extended Synopsis of the Scientific Proposal 

Humans learn to speak at an early age, but they continue to add new words to their vocabulary throughout their lives. Infants acquire the ability to decode and differentiate complex sounds produced by other humans almost effortlessly during their first year of life and later learn to produce the same sounds with increasing accuracy. This project is about the brain mechanisms for speech perception and production, but mostly about the brain’s ability to couple the two systems and turn this coupling into a communication device. In terms of control theory, such systems can be characterized as internal models, which accomplish their intended function by emulating the outside world inside the brain. I will study this astonishing brain system – the programming of internal models for speech at the word level and its consequences for language learning in general – in infants, children up to the first grade of elementary school, and young adults as controls. Advanced imaging techniques will inform us in minute detail about the brain structures participating in speech perception and production and their connections. Comparing these results with those gained from nonhuman primates using similar techniques will inform us about the origin of language and broadly advance the understanding of the human mind and its complexity.

Another new technique, real-time MRI (RT-MRI), enables the visualization of articulator movements in real time while a subject is speaking (Frahm et al. 2014; Lingala et al. 2016; Echternach et al. 2016). This will revolutionize the study of speech production and its neurophysiology: Activation of brain centers for speech production can be recorded in the same subjects with BOLD imaging and co-registered on the same brains, providing precise information about the motor control of speech, including its cortical sources.

Research Contributions

I have discovered the organization of the auditory cortical system into two largely segregated pathways (a ventral and a dorsal stream) in nonhuman primates (Rauschecker 1997, 1998; Romanski et al. 1999; Rauschecker and Tian 2000; Tian et al. 2001). This finding was later confirmed for the human auditory and language system. There are similarities with the dual pathways for ‘what’ and ‘where’ in the visual cortical system (Ungerleider and Mishkin 1982; Goodale and Milner 1992), but it was not immediately obvious how the origin of speech and language may have been enabled by such an organization. Later work has proposed a scheme whereby sound objects (the content of speech and language) are coded in the ventral stream, whereas the sequential structure of language, which is essential for its production, is encoded in the dorsal stream (Hickok and Poeppel 2007; Rauschecker and Scott 2009; Rauschecker 2011, 2018). Recent studies from my lab, partly in collaboration with others, have extended the exploration of auditory processing from nonprimary auditory cortex to prefrontal areas of monkeys (Ortiz-Rios et al. 2017) and humans (Jiang et al. 2018), tackling, among other questions, the categorization of sound objects, including the words of speech and language: How can words be identified as the same even though their acoustic realization varies a great deal with the speaker’s age and gender?


The project’s primary objective is to build a conceptual and brain model that links speech perception and production. This primary objective incorporates the following three secondary objectives:

SO1 (decoding of speech sounds in the human auditory cortex) is concerned with how phonological word representations are set up in the auditory cortex. Extrapolating from nonhuman primate studies, the auditory “ventral stream” builds these representations in a hierarchically organized, feedforward architecture terminating in ventral prefrontal cortex. Selectivity for words is acquired by experience-dependent synaptic plasticity within this network (Kohonen and Hari 1999).
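
How experience-dependent plasticity could give rise to ordered feature maps in such a network (in the spirit of Kohonen and Hari 1999) can be illustrated with a minimal one-dimensional Kohonen self-organizing map. The sketch below is purely illustrative: the "acoustic" inputs are random synthetic vectors, and the map size and learning parameters are arbitrary assumptions, not part of the proposed experiments.

```python
import numpy as np

# Minimal 1-D Kohonen self-organizing map: a toy illustration of how
# experience-dependent plasticity can yield topographically ordered maps.
# The inputs are synthetic 2-D "acoustic feature" vectors (e.g. two formants).

rng = np.random.default_rng(0)

n_units = 20                      # model "cortical" units arranged on a line
dim = 2                           # toy acoustic feature space
W = rng.random((n_units, dim))    # random initial synaptic weights

def train_som(W, data, epochs=50, lr0=0.5, sigma0=4.0):
    W = W.copy()
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)                    # decaying learning rate
        sigma = max(sigma0 * (1 - t / epochs), 0.5)    # shrinking neighborhood
        for x in data:
            bmu = np.argmin(np.linalg.norm(W - x, axis=1))  # best-matching unit
            d = np.arange(len(W)) - bmu
            h = np.exp(-d**2 / (2 * sigma**2))              # neighborhood kernel
            W += lr * h[:, None] * (x - W)                  # Hebbian-style update
    return W

data = rng.random((200, dim))     # 200 synthetic input samples
W_trained = train_som(W, data)

# After training, neighboring units are tuned to similar inputs:
neighbor_dist = np.mean(np.linalg.norm(np.diff(W_trained, axis=0), axis=1))
print(f"mean neighbor distance: {neighbor_dist:.3f}")
```

After training, the mean distance between the tuning of adjacent units is much smaller than that between arbitrary pairs of units, the hallmark of an ordered map emerging purely from exposure to input statistics.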

SO2 (production of speech sounds in motor-related areas) explains how short sequences of sounds (e.g. consonant-vowel combinations) are produced by associating dorsal-stream networks of the auditory cortical system with motor-related areas: phonological planning in prefrontal and premotor cortex; integration of auditory and somatosensory feedback in parietal cortex; and output to articulators via motor cortex and brainstem.

SO3 (development of vocalization networks in humans and nonhuman primates) compares the development of vocalization networks in humans and nonhuman primates using structural and functional connectivity analysis. This will identify the key differences between humans and nonhuman animals that have enabled the evolution of speech and language in humans.

The Ground-breaking Nature of the Proposed Project 

The project takes up the longstanding issue of how babies learn to speak, or – put differently – how receptive (phonological) and productive (articulatory) representations of words develop and interact by forming internal models of speech inside the brain. 

The project is unique in combining approaches from neuroscience, linguistics, and robotics to deliver a cognitively inspired conceptual model of language acquisition.

Cutting-edge neuroimaging techniques, such as rapid-adaptation fMRI (fMRI-RA), will be used to visualize the mapping of speech sounds in the auditory cortex and beyond at sub-voxel spatial resolution. Advanced analysis techniques, such as multivariate pattern analysis (MVPA) and representational similarity analysis (RSA), will greatly enhance the knowledge that can be gleaned from fMRI data. Real-time MRI (RT-MRI) will be used to analyze, in real time, the mechanisms and structures controlling the articulators of the vocal apparatus. The development of brain connectivity (both structural and functional) in infants and children learning their native language will be measured with diffusion tensor imaging (DTI) and resting-state functional connectivity (rsfcMRI) analysis.
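
As a concrete illustration of the RSA logic, the sketch below compares the representational geometry of two hypothetical brain regions on synthetic data. The pattern matrices, voxel counts, and noise level are assumptions for demonstration only, not measured fMRI responses.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Minimal representational similarity analysis (RSA) sketch on synthetic data.
# Pattern matrices are (conditions x voxels); a real analysis would use
# measured fMRI response patterns for each spoken-word condition.

rng = np.random.default_rng(1)
n_conditions, n_voxels = 8, 100

# Two hypothetical regions: region B carries the same representational
# content as region A, plus measurement noise.
patterns_a = rng.standard_normal((n_conditions, n_voxels))
patterns_b = patterns_a + 0.3 * rng.standard_normal((n_conditions, n_voxels))

# Representational dissimilarity matrices (condensed form): pairwise
# correlation distances between the condition patterns of each region.
rdm_a = pdist(patterns_a, metric="correlation")
rdm_b = pdist(patterns_b, metric="correlation")

# Second-order comparison: rank-correlate the two RDMs.
rho, _ = spearmanr(rdm_a, rdm_b)
print(f"RDM agreement (Spearman rho): {rho:.2f}")
```

The key point is that the comparison operates on representational geometries rather than raw voxel patterns, so regions (or subjects, or species) can be compared even when their voxels do not correspond one-to-one.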

The project will also help to answer questions about the evolution of language. The PI is uniquely qualified to spearhead this effort, because he has been a leader in the analysis of brain organization in nonhuman primates for over 20 years, with particular emphasis on the organization of higher auditory pathways and their use for the processing of species-specific vocalizations. 

Groundbreaking results can also be expected in terms of a formalized understanding of sensorimotor interactions, or internal models, as part of biologically motivated design principles in artificial intelligence and humanoid robotics.

Significance & Impacts

The project is of significance for many branches of linguistics and their overlap with neuroscience, cognition, computer science, and sociology. It will have a significant impact on understanding speech perception and production and their interactions at the level of word representations in the brain.

The project will impact the study of language acquisition (Jakobson 1941; Lenneberg 1967; Kuhl 1994, 2004, 2010) by advancing our understanding of how children learn their native language and, through this understanding, providing us with clues on how second (and third) language learners may use the established scaffold and integrate subsequent languages with the native language.

Beyond linguistics, the project will make significant advances in understanding the neurophysiological and neurocognitive principles of (i) auditory and visual perception, (ii) motor control, and (iii) internal models for brain representations of behavior.

The project will have benefits for science outreach given its concern with migration and its impact on language learning and cultural integration. Furthermore, the project will have a direct impact on the field of robotics, especially the design principles of humanoid robots. Speech production is at its core a motor control problem, and the project will spawn further collaboration between cognitive neuroscience and robotics. These topics are likely to be considerable sources of media interest.

Finally, the project will provide significant advances in understanding the human mind and its complexity, because it explains for the first time, in an integrated cognitive-computational architecture, how the highest cognitive functions, such as language, are implemented in the brain. By comparison with nonhuman primate data, the project will identify how human-specific functions have developed from earlier precursors during evolution.

Relation of the Project to Robotics

Speaking is a highly complex form of motor behavior that happens to generate sounds. A number of other things had to happen during evolution for these sounds to serve the purpose of auditory communication: The sounds emitted by the “speaker” had to be decoded by the recipients’ auditory systems and evoke responses in their brains. Then these brains had to find a way to produce the same sounds with their own vocal apparatus. In order for this to happen, an internal model needs to be set up that represents the outside world inside the brain, thus coupling the auditory with the motor system. Such internal models are the essence of motor control theory, which has been used so successfully to design robots and other intelligent machines that mimic human behavior. It appears that nature has incorporated these principles of motor control into the bodies of animals, and there are many examples of this, from reaching and grasping to tool use. However, there is no better example of an internal model than speech, which has enabled humans to invent poetry and, in parallel, music.
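
The logic of an internal (forward) model that couples motor commands to their sensory consequences (Wolpert et al. 1995) can be sketched in a few lines. Everything below is an illustrative assumption, not a model of real articulation: the "vocal tract" is an arbitrary linear mapping, and random "babbling" commands generate auditory feedback whose prediction error trains an internal copy of that mapping.

```python
import numpy as np

# Toy forward model in the sense of motor control theory: the plant maps a
# motor command to an auditory outcome, and the learner acquires an internal
# copy of that mapping purely from prediction errors during "babbling".

rng = np.random.default_rng(2)

A_true = rng.standard_normal((3, 3))   # unknown articulator-to-sound mapping
def plant(u):
    """The 'vocal tract': the outside world the brain must emulate."""
    return A_true @ u

A_hat = np.zeros((3, 3))               # internal forward model, initially blank
lr = 0.05                              # learning rate for the delta rule

# Babbling loop: issue random motor commands, compare the predicted sound
# with the sound actually heard, and update the internal model accordingly.
for _ in range(2000):
    u = rng.standard_normal(3)         # random motor command
    heard = plant(u)                   # actual auditory feedback
    predicted = A_hat @ u              # what the internal model expected
    error = heard - predicted          # auditory prediction error
    A_hat += lr * np.outer(error, u)   # delta-rule weight update

model_error = np.linalg.norm(A_true - A_hat)
print(f"forward-model error after babbling: {model_error:.4f}")
```

After enough babbling, the internal copy matches the plant almost exactly, so the system can predict the auditory consequences of a command before executing it, which is precisely the capability the project proposes to trace in the developing brain.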


Budget

More than half of the budget is for personnel and includes: two post-doctoral researchers focusing respectively on human and nonhuman primate imaging; three doctoral students, each at a ¾ position, focusing on more specialized aspects; and part-time funding for the PI. The remainder is for consumables, publications, and travel. No funds are requested for equipment.


References

Barron HC, Garvert MM, Behrens TEJ. 2016. Repetition suppression: a means to index neural representations using BOLD? Philos Trans R Soc B Biol Sci. 371(1705):20150355. doi:10.1098/rstb.2015.0355.

Dehaene-Lambertz G. 2017. The human infant brain: A neural architecture able to learn language. Psychon Bull Rev. 24(1):48–55. doi:10.3758/s13423-016-1156-9.

Dehaene-Lambertz G, Spelke ES. 2015. The Infancy of the Human Brain. Neuron. 88(1):93–109. doi:10.1016/j.neuron.2015.09.026.

DeWitt I, Rauschecker JP. 2012. Phoneme and word recognition in the auditory ventral stream. Proc Natl Acad Sci. 109(8):E505-514.

Echternach M, Burdumi L, Traser B. 2016. Singing in an MRI.

Echternach M, Burk F, Burdumy M, Traser L, Richter B. 2016. Dynamic changes of vocal tract articulators in different loudness conditions. PLoS One 11:e0153792.

Frahm J, Schätz S, Untenberger M, Zhang S, Voit D, Merboldt KD, Sohns JM, Lotz J, Uecker M. 2014. On the Temporal Fidelity of Nonlinear Inverse Reconstructions for Real- Time MRI – The Motion Challenge. Open Med Imaging J. 8:1–7. doi:10.2174/1874347101408010001.

Franklin DW, Wolpert DM. 2011. Computational mechanisms of sensorimotor control. Neuron. 72(3):425–442. doi:10.1016/j.neuron.2011.10.006.

Goodale MA, Milner AD. 1992. Separate visual pathways for perception and action. Trends Neurosci. 15(1):20–25. doi:10.1016/0166-2236(92)90344-8.

Hickok G, Poeppel D. 2007. The cortical organization of speech processing. Nat Rev Neurosci. 8(5):393–402. doi:10.1038/nrn2113.

Jakobson R. 1941. Child Language, Aphasia, and Phonological Universals. The Hague: Mouton Publishers.

Jiang X, Chevillet MA, Rauschecker JP, Riesenhuber M. 2018. Training Humans to Categorize Monkey Calls: Auditory Feature- and Category-Selective Neural Tuning Changes. Neuron. 98(2):405–416.e4.

Jürgens U. 2002. Neural pathways underlying vocal control. Neurosci Biobehav Rev. 26:235–258.

Jusczyk PW, Aslin RN. 1995. Infants’ Detection of the Sound Patterns of Words in Fluent Speech. Cogn Psychol. 29(1):1–23. doi:10.1006/cogp.1995.1010.

Kohonen T, Hari R. 1999. Where the abstract feature maps of the brain might come from. Trends Neurosci. 22(3):135–139. doi:10.1016/S0166-2236(98)01342-3.

Kuhl PK. 1994. Learning and representation in speech and language. Curr Opin Neurobiol. 4(6):812–822. doi:10.1016/0959-4388(94)90128-7.

Kuhl PK. 2004. Early language acquisition: Cracking the speech code. Nat Rev Neurosci. 5(11):831–843. doi:10.1038/nrn1533.

Kuhl PK. 2010. Brain Mechanisms in Early Language Acquisition. Neuron. 67(5):713–727. doi:10.1016/j.neuron.2010.08.038.

Kumar V, Croxson PL, Simonyan K. 2016. Structural Organization of the Laryngeal Motor Cortical Network and Its Implication for Evolution of Speech Production. J Neurosci. 36:4170–4181.

Kuypers HGJM. 1958. Corticobulbar connexions to the pons and lower brain-stem in man: An anatomical study. Brain. 81:364–388.

Leaver AM, Rauschecker JP. 2010. Cortical Representation of Natural Complex Sounds: Effects of Acoustic Features and Auditory Object Category. J Neurosci. 30:7604–7612.

Lenneberg EH. 1967. Biological foundations of language. New York: Wiley.

Levelt WJM. 1989. Speaking: From Intention to Articulation. MIT Press.

Liberman AM, Mattingly IG. 1985. The motor theory of speech perception revised. Cognition. 21:1–36.

Lingala SG, Sutton BP, Miquel ME, Nayak KS. 2016. Recommendations for real-time speech MRI. J Magn Reson Imaging. 43(1):28–44. doi:10.1002/jmri.24997.

Norman-Haignere S V., Albouy P, Caclin A, McDermott JH, Kanwisher NG, Tillmann B. 2016. Pitch-Responsive Cortical Regions in Congenital Amusia. J Neurosci. 36(10):2986–2994. doi:10.1523/JNEUROSCI.2705-15.2016.

Ortiz-Rios M, Azevedo FAC, Kuśmierek P, Balla DZ, Munk MH, Keliris GA, Logothetis NK, Rauschecker JP. 2017. Widespread and Opponent fMRI Signals Represent Sound Location in Macaque Auditory Cortex. Neuron. 93(4):971–983.e4. doi:10.1016/j.neuron.2017.01.013.

Perani D, Saccuman MC, Scifo P, Anwander A, Spada D, Baldoli C, Poloniato A, Lohmann G, Friederici AD. 2011. Neural language networks at birth. Proc Natl Acad Sci. 108(38):16056–16061. doi:10.1073/pnas.1102991108.

Raschle N, Zuk J, Ortiz-Mantilla S, Sliva DD, Franceschi A, Grant PE, Benasich AA, Gaab N. 2012. Pediatric neuroimaging in early childhood and infancy: Challenges and practical guidelines. Ann N Y Acad Sci. 1252(1):43–50. doi:10.1111/j.1749-6632.2012.06457.x.

Rathelot J-A, Dum RP, Strick PL. 2017. Posterior parietal cortex contains a command apparatus for hand movements. Proc Natl Acad Sci. 114:4255–4260.

Rauschecker JP. 1997. Processing of complex sounds in the auditory cortex of cat, monkey, and man. Acta Otolaryngol. 532:34–38.

Rauschecker JP. 1998. Parallel processing in the auditory cortex of primates. Audiol Neurootol. 3:86–103.

Rauschecker JP. 2011. An expanded role for the dorsal auditory pathway in sensorimotor control and integration. Hear Res. 271:16–25.

Rauschecker JP. 2018. Where, When, and How: Are they all sensorimotor? Towards a unified view of the dorsal pathway in vision and audition. Cortex. 98:262–268. doi:10.1016/j.cortex.2017.10.020.

Rauschecker JP, Scott SK. 2009. Maps and streams in the auditory cortex: Nonhuman primates illuminate human speech processing. Nat Neurosci. 12(6):718–724.

Rauschecker JP, Tian B. 2000. Mechanisms and streams for processing of “what” and “where” in auditory cortex. Proc Natl Acad Sci. 97(22):11800–11806.

Romanski LM, Tian B, Fritz J, Mishkin M, Goldman-Rakic PS, Rauschecker JP. 1999. Dual streams of auditory afferents target multiple domains in the primate prefrontal cortex. Nat Neurosci. 2:1131–1136.

Tian B, Reser D, Durham A, Kustov A, Rauschecker JP. 2001. Functional specialization in rhesus monkey auditory cortex. Science. 292:290–293.

Ungerleider LG, Mishkin M. 1982. Two cortical visual systems. In: Analysis of Visual Behavior. p. 549–586.

Werker JF. 1995. Exploring developmental changes in cross-language speech perception. In: An Invitation to Cognitive Science: Language 1. p. 87–106.

Wernicke C. 1874. Der aphasische Symptomencomplex. Eine psychologische Studie auf anatomischer Basis. [The aphasia symptom complex. A psychological study on an anatomical basis].

Wolpert DM, Ghahramani Z, Jordan MI. 1995. An internal model for sensorimotor integration. Science. 269:1880–1882.
