November 12, 2019
Written by: Greer Prettyman
Neuroscientists know that our thoughts, sensations, and experiences come about through specific activation patterns of the cells in our brains. Because humans have about 86 billion neurons, determining the specific combinations of activity in these cells that relate to particular thoughts is a daunting challenge. However, some types of “thought” are easier to decode than others. By using computational modeling techniques, researchers have gotten pretty good at interpreting neural signals related to visual experiences, allowing them to “read your mind” and infer what you are seeing.
Decoding visual experiences begins by first recording markers of human brain activity using functional magnetic resonance imaging (fMRI) while people look at pictures or videos1. These signals are then used to build models that relate what was seen with the pattern of activity in different parts of the brain. The key to interpreting the brain signals is to build a “decoder”, or model, that can infer features of an image being perceived. Then, these models can be used to guess or even recreate what a person is looking at.
In a study published in the journal Nature in 2008, researchers found that a visual decoder could identify specific images that people were viewing, just by analyzing their brain signals1. In this study, participants first viewed over 1000 images during an fMRI scan. These were natural images, meaning they showed complex scenes and objects like those people encounter in the real world, rather than the simplified shapes or patterns sometimes used as visual stimuli in experiments. Activity was measured in many voxels – three-dimensional units of measurement that each contain hundreds of thousands of neurons – in the visual cortex while people looked at these natural images. With this measured activity, researchers were able to build a model to decode what these people were seeing.
How did these researchers manage this form of mind reading? The first step in building a decoder is to develop a model for how brain activity in each voxel relates to features of the image that was seen. In the visual cortex of the brain, neurons have specific receptive fields, or types of visual input that cause them to respond. These input features include things like brightness, orientation of a line or object, and spatial position. Decoder models often involve characterizing images based on these visuals features and statistically relating the features to the patterns of brain activity. The model is trained by having it compare visual features and brain activity patterns for hundreds or even thousands of images (Figure 1, left). Through this training process, the model “learns” which patterns of activity across many voxels are related to specific underlying components of visual images.
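The training step described above can be sketched in a few lines of code. This is a minimal toy version with simulated data: the array sizes, noise level, and use of ordinary least squares are illustrative assumptions, not the published pipeline (which used far more images, voxels, and a more elaborate feature model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each of 200 training images is summarized by 10
# visual features (e.g., contrast at different orientations and positions),
# and we record responses in 50 visual-cortex voxels per image.
n_images, n_features, n_voxels = 200, 10, 50
features = rng.normal(size=(n_images, n_features))

# Simulate ground-truth tuning: each voxel responds as a weighted sum
# of the image features, plus measurement noise.
true_weights = rng.normal(size=(n_features, n_voxels))
activity = features @ true_weights + 0.1 * rng.normal(size=(n_images, n_voxels))

# "Training" the encoding model: estimate, for every voxel, the weights
# that map image features to measured activity.
weights, *_ = np.linalg.lstsq(features, activity, rcond=None)

# With enough training images, the learned weights should closely
# approximate the simulated tuning.
error = np.abs(weights - true_weights).max()
```

After this fit, each column of `weights` describes how one voxel is tuned to the visual features – the "learned" associations the decoder relies on.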
Next, the model is used to predict brain activation responses for a new set of images (Figure 1, right). For each image, the model estimates what the brain activity patterns are expected to look like based on the features in the image and the associations that were learned during the training phase. To test the model, brain activity is measured while a person looks at these new images. During the testing phase, the model isn’t told which image the person was seeing. The decoder makes a guess about which image in the set the person was looking at by matching the observed activity to the most similar set of predicted activity. In this way, what the person saw is inferred from their brain activation. In this specific study, the model correctly inferred which image one person looked at for upwards of 80% of images seen – a performance that is much better than guessing by chance.
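The identification step can be sketched the same way: predict an activity pattern for every candidate image, then pick the candidate whose prediction best matches the observed scan. Again, the sizes and noise level below are illustrative assumptions on simulated data.

```python
import numpy as np

rng = np.random.default_rng(1)

# A known set of candidate images (120, as in the Nature study), each
# summarized by 10 hypothetical visual features; plus fitted model weights.
n_candidates, n_features, n_voxels = 120, 10, 50
candidate_features = rng.normal(size=(n_candidates, n_features))
weights = rng.normal(size=(n_features, n_voxels))

# Predicted brain activity pattern for every candidate image.
predicted = candidate_features @ weights

# Observed activity while the person viewed image 42 (plus scan noise).
viewed = 42
observed = predicted[viewed] + 0.2 * rng.normal(size=n_voxels)

# Identification: choose the candidate whose predicted pattern
# correlates best with the observed pattern.
correlations = [np.corrcoef(observed, p)[0, 1] for p in predicted]
guess = int(np.argmax(correlations))
```

With low noise the decoder recovers the viewed image; real fMRI data is far noisier, which is why the study's roughly 80% accuracy is impressive.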
A study published in the journal Neuron in 2009 built upon the initial decoding research by incorporating semantic features, or language categorizations, into the model in addition to the visual features of the stimuli, improving both the performance and interpretability of decoding2. To understand how semantic information can aid decoding, think about how you might identify an image. Imagine two round, red shapes with very similar underlying visual features. By adding the semantic label “fruit”, it becomes much easier to identify that a red shape is an apple and not a balloon.
This strategy was used to improve the precision of decoding models. Participants’ brain activity was measured while they viewed 1750 natural images. Activity was measured both in early visual areas that encode the visual features of images and in higher-level visual areas that encode more complex features, like the semantic identity of visual stimuli.
While the Nature study above involved matching evoked brain activity to an image from a known set of only 120 images, this time the model decided what the image should look like based solely on the brain activity pattern. The model “guessed” a reconstruction of the viewed picture by choosing an image from a set of 6 million images in a database that was predicted to create a pattern of brain activity similar to that for the image that was seen.
Before semantic information was added to the model, the decoder’s guesses for an image that might evoke the observed patterns of brain activity often shared visual features with the target image but did not usually recreate the object or scene that was viewed in any meaningful way. In one example, when a person saw a picture of a row of buildings, the decoder reconstructed an image of a dog as its best guess. To fix this problem and make the recreated images interpretable, semantic information was incorporated during the training phase: each of the 1750 training images was labeled with one of 23 semantic categories. With these labels, the decoder could learn which features of an image were related both to semantic category and to brain activity in higher-level visual areas.
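One simple way to picture how semantic labels help is as a filter on the database search: score candidates by visual-pattern match, but only among candidates in the decoded semantic category. The sketch below makes that concrete with simulated data; the category count, database size, and the assumption that the higher-area decoder gets the category right are all illustrative, not the study's actual Bayesian model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical image database: visual features plus a semantic category
# label (one of 23, e.g., "building", "animal", ...) for each candidate.
n_candidates, n_features, n_voxels = 1000, 10, 50
candidate_features = rng.normal(size=(n_candidates, n_features))
candidate_category = rng.integers(0, 23, size=n_candidates)

# Encoding model for early visual areas, driven by visual features.
visual_weights = rng.normal(size=(n_features, n_voxels))
predicted_visual = candidate_features @ visual_weights

# Observed early-visual activity while the person viewed image 7.
viewed = 7
observed_visual = predicted_visual[viewed] + 0.2 * rng.normal(size=n_voxels)

# Assume the higher-area (semantic) decoder recovered the right category.
decoded_category = candidate_category[viewed]

# Combined decoding: visual-pattern correlation, restricted to
# candidates in the decoded semantic category.
scores = np.array([np.corrcoef(observed_visual, p)[0, 1]
                   for p in predicted_visual])
scores[candidate_category != decoded_category] = -np.inf
guess = int(np.argmax(scores))
```

Restricting the search this way is what keeps a row of buildings from being reconstructed as a dog: the dog may match visually, but it is excluded semantically.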
In the testing phase, researchers found that with both semantic and visual information included in the model, the decoder was much better at accurately reconstructing images. For example, when someone viewed a bunch of grapes, the decoder’s reconstruction depicted a bunch of berries. While this wasn’t perfectly correct, it was much closer than a reconstruction created without the semantic labels.
Accuracy of the model was evaluated in terms of both visual reconstruction and semantic category classification. Viewed and reconstructed images were compared using a matching algorithm to measure image similarity. The reconstructions had much higher visual similarity to the originals than would occur by chance, indicating that the model accurately recreated visual features. Accuracy of semantic classification was evaluated by checking whether the reconstructed image belonged to the same category as the target image the person saw. When success was defined as correctly distinguishing between just two broad categories, animate vs. inanimate, the model recreated an image matching the original’s category 90% of the time. With the 23 more specific categories, the model reconstructed an image from the correct category for only 40% of images, but this is still far better than chance. This study demonstrated that decoders can capture not just low-level visual features but the actual content of what you are looking at, an important feature of effective decoding.
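To see why 40% is a strong result, it helps to compare each reported accuracy against its chance level, since the two tasks have very different baselines. The arithmetic:

```python
# Chance levels for the two semantic-classification tasks above.
binary_chance = 1 / 2    # animate vs. inanimate: 50% by guessing
fine_chance = 1 / 23     # 23 categories: about 4.3% by guessing

# Accuracies reported in the Neuron study.
binary_accuracy = 0.90
fine_accuracy = 0.40

# How many times better than chance each result is.
binary_gain = binary_accuracy / binary_chance   # 1.8x chance
fine_gain = fine_accuracy / fine_chance         # about 9.2x chance
```

So although 40% sounds lower than 90%, the fine-grained result actually beats its chance level by a much wider margin.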
With advances in machine learning capabilities and more advanced statistical paradigms, decoding models are constantly getting more sophisticated and better at reconstructing visual images. Now models can go far beyond simple matching of predicted and observed activity patterns for images and can handle more complex challenges like recreating movies3. In some cases, decoding models can even reconstruct images from memories4.
While decoding can sound like a futuristic mind reading technique, it’s mostly a tool to better understand the relationship between your brain activity and what you are seeing or thinking. In this short video, you can see Jack Gallant, a pioneer of this method, demonstrate example reconstructions of movie clips. By reconstructing visual imagery, we are able to learn more about processes like dreaming and thinking that are difficult to study in other ways. In addition to being a cool technology, researchers hope this type of work will also lead to clinical and societal advances, such as the ability to communicate with people who cannot speak after a brain injury or illness. Decoding models provide an exciting way to understand our thought patterns and get a glimpse at the intricacies of the mind.
- Kay, K. N., Naselaris, T., Prenger, R. J., & Gallant, J. L. (2008). Identifying natural images from human brain activity. Nature, 452(7185), 352–355.
- Naselaris, T., Prenger, R. J., Kay, K. N., Oliver, M., & Gallant, J. L. (2009). Bayesian Reconstruction of Natural Images from Human Brain Activity. Neuron, 63(6), 902–915.
- Nishimoto, S., Vu, A. T., Naselaris, T., Benjamini, Y., Yu, B., & Gallant, J. L. (2011). Reconstructing Visual Experiences from Brain Activity Evoked by Natural Movies. Current Biology, 21(19), 1641–1646.
- Naselaris, T., Olman, C. A., Stansbury, D. E., Ugurbil, K., & Gallant, J. L. (2015). A voxel-wise encoding model for early visual areas decodes mental images of remembered scenes. NeuroImage, 105, 215–228.
Cover image from TheDigitalArtist via Pixabay https://pixabay.com/photos/woman-technology-binary-computer-3211957/
Figure 1 created with BioRender