How the mind sees the world

From aardvark to zyzzyva, the world we live in is rich and complex. How is this diversity of objects represented in the human mind? Through an experimental and computational tour de force, Hebart et al. show that people share a mental representation of objects based on a small number of meaningful dimensions.

The question of how concepts are represented in the human mind has driven much exciting research over the past several decades. A new study by Hebart et al.1 in this issue of Nature Human Behaviour represents a critical step forward in understanding these mental representations of concepts, with exciting implications for a number of fields. Previous work in cognitive science posited concept representations based on large numbers of binary attributes2 (for example, ‘has fur’) or high-dimensional semantic vector spaces extracted from co-occurrence patterns of related words3. Yet, these types of representations are somewhat at odds with recent theories of neural coding, which propose that the brain employs ‘sparse’ encoding schemes. In these schemes, a particular stimulus is represented by a small subset of activated neurons, each selective for a different relevant feature dimension4. Sparse codes have computational advantages in terms of generalization ability and encoding capacity. They are also well matched to the constraints of biological hardware in terms of connectivity and efficiency. Evidence for sparse representations has been found at different levels in the brain’s simple-to-complex visual processing hierarchy4–6 and also in non-visual sensory modalities (for example, in audition7). Yet, while it has been shown how sparse representations (for example, for shape4) can be computed from natural images, the inability to probe the mental space of concepts in a comprehensive and unbiased way had stymied the investigation of sparse codes of concepts.

At the core of Hebart et al.’s approach to this challenge is a clever task in which participants are presented with a triplet of images of randomly chosen objects on each trial and asked to pick the ‘odd one out’, i.e., the one object least similar to the other two. This task avoids forcing subjects to use verbal descriptions of particular dimensions, which might not apply to all object comparisons. This is especially true in the large space of 1,854 nameable concepts investigated by Hebart et al. Yet, even when just choosing one representative example per category (as in their study), exhaustively sampling all triplets would require more than a billion comparisons, thereby creating a bit of a logistical challenge. However, if one assumes that the mental space of concept representations is of a low dimensionality (much less than 1,854, say), then one could perhaps hope to estimate this space based on a fraction of those triplet similarity judgments. This is the gambit Hebart et al. tried by sampling only 0.14% of the total space of triplets, resulting in 1.46 million responses from around 5,000 online participants. These ratings were used to perform a model-based estimation of the dimensions underlying the participants’ similarity judgments. Specifically, the similarity ratings were used to train a shallow neural network that mapped individual objects to a 90-dimensional output vector. Importantly, the network training used a sparseness constraint to enforce solutions in which each object was represented by positive coefficients for a small subset of the 90 possible dimensions. After training, the authors used a cut-off to eliminate features with low weights, resulting in a final set of just 49 dimensions that the model predicted participants used to compare objects. Similar outcomes were obtained for different numbers of starting dimensions and conditions, thereby showing that the results were not artefacts of particular parameter choices. Rather strikingly, most of the 49 dimensions were found to be easily and highly consistently labelled by human participants (for example, as ‘food-related’ or ‘red’).
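The scale of the sampling problem can be checked with a few lines of arithmetic. The sketch below counts the unordered triplets that can be formed from 1,854 objects and the share covered by 1.46 million responses. It assumes, for illustration only, that every response corresponds to a distinct triplet; the variable names are not taken from the study.

```python
from math import comb

# Figures quoted in the text above.
n_concepts = 1854          # nameable object concepts
n_responses = 1_460_000    # "1.46 million responses"

# Number of unordered triplets that can be drawn from the concept set.
total_triplets = comb(n_concepts, 3)

# Share of the triplet space covered, assuming (for illustration only)
# that each response is a judgment on a distinct triplet.
fraction = n_responses / total_triplets

print(f"total triplets: {total_triplets:,}")
print(f"fraction sampled: {fraction:.2%}")
```

Under that assumption the count comes out at just over a billion triplets and the sampled share at roughly 0.14%, in line with the figures quoted above.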

To test the model’s ability to predict behavioural similarity ratings, the authors performed another experiment (with new participants) with 48 objects whose similarity matrix was now completely sampled (yielding 43,200 comparisons). Quite excitingly, the model was able to predict the complete behavioural similarity matrix with accuracy close to the inter-participant noise ceiling. Further validating their model of object similarity, Hebart et al. showed that when they asked participants to rate individual objects along the dimensions indicated by the model (specified merely by showing example images rated high and low along a particular axis), these ratings could be used to predict how similarly participants would rate pairs of these objects.
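To make the link between a sparse embedding and an odd-one-out choice concrete, here is a minimal sketch. It assumes that the similarity of two objects is the dot product of their non-negative embedding vectors and that the predicted odd one out is the object left over once the most similar pair is found. That decision rule, the function names, the three toy dimensions and all values are assumptions made for illustration and are not taken from the text above.

```python
from itertools import combinations

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def odd_one_out(embeddings, triplet):
    """Return the object of the triplet that is not in its most similar pair.

    Similarity is assumed to be the dot product of embedding vectors.
    """
    best_pair = max(
        combinations(triplet, 2),
        key=lambda pair: dot(embeddings[pair[0]], embeddings[pair[1]]),
    )
    (odd,) = set(triplet) - set(best_pair)
    return odd

# Hypothetical toy embeddings over three invented dimensions
# (animal-related, food-related, vehicle-related). Most entries are
# zero, mimicking a sparse, non-negative code.
toy_embeddings = {
    "dog": [0.9, 0.0, 0.0],
    "cat": [0.8, 0.1, 0.0],
    "car": [0.0, 0.0, 1.0],
}

print(odd_one_out(toy_embeddings, ("dog", "cat", "car")))
```

In this toy case only "dog" and "cat" share a non-zero dimension, so the rule singles out "car". A probabilistic variant would turn the three pairwise similarities into choice probabilities instead of taking a hard maximum.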

In summary, the model presented by Hebart et al. represents a breakthrough in our understanding of how humans represent objects. The research directly suggests a host of intriguing avenues for further research, such as: how do individuals’ mental concept spaces develop? Do people start with a (hard-wired?) subset of dimensions (if so, which ones?) that are then added to throughout development based on experience? Or does the nature of the representations change fundamentally during development? Given the simplicity and non-verbal nature of the odd-one-out task, it should be accessible to a wide range of populations with possibly different concept representations. Can a similar approach be applied to, say, auditory ‘objects’, to probe modality-specific and modality-invariant dimensions of concept representations?

The study should also energize the search for the neural bases of conceptual similarity judgments. Recent studies using attribute-based8 and semantic vector space-based9 approaches have shown the power of neuroimaging to probe semantic representations in the brain. Where in the brain do neural responses correlate with the particular dimensions of the mental concept space found by Hebart et al.? In particular, are there brain areas where a number of these dimensions come together? Intriguingly, Hebart et al. found that encoding each object along six to eleven object-specific dimensions was sufficient to explain 95–99% of the model’s performance in the odd-one-out task. Are there local semantic hubs (‘hublets’, as it were, which would be easier to wire up biologically than a domain-general hub that represents all dimensions10) that encode specific combinations of dimensions, for example ‘artificial’, ‘transportation’ and ‘technology’, or ‘body-part related’, ‘skin-related’ and ‘head-/face-related’? And could impairments in such low-dimensional hublets account for patterns of semantic processing deficits found in neuropsychological populations?

Finally, while the study was limited to concrete concepts, it seems conceivable that its odd-one-out similarity judgment paradigm and modelling approaches could be extended to abstract concepts (for example, presented verbally). Looking ahead, it will be highly interesting to see how the study by Hebart et al. will further advance the boundaries of our mental spaces.

Maximilian Riesenhuber 

Department of Neuroscience, Georgetown University Medical Center, Washington, DC, USA.
Published online: 12 October 2020

References
1. Hebart, M. N., Zheng, C. Y., Pereira, F. & Baker, C. I. Nat. Hum. Behav. (2020).
2. Rosch, E. & Mervis, C. B. Cognit. Psychol. 7, 573–605 (1975).
3. Landauer, T. K. & Dumais, S. T. Psychol. Rev. 104, 211–240 (1997).
4. Olshausen, B. A. & Field, D. J. Curr. Opin. Neurobiol. 14, 481–487 (2004).
5. Cox, P. H. & Riesenhuber, M. J. Neurosci. 35, 14148–14159 (2015).
6. Reddy, L. & Kanwisher, N. Curr. Opin. Neurobiol. 16, 408–414 (2006).
7. Smith, E. C. & Lewicki, M. S. Nature 439, 978–982 (2006).
8. Mitchell, T. M. et al. Science 320, 1191–1195 (2008).
9. Huth, A. G., de Heer, W. A., Griffiths, T. L., Theunissen, F. E. & Gallant, J. L. Nature 532, 453–458 (2016).
10. Chen, L., Lambon Ralph, M. A. & Rogers, T. T. Nat. Hum. Behav. 1, 0039 (2017).

Competing interests: The author declares no conflict of interest.