How Spotify Uses What We Know About the Brain to Make Music Recommendations

January 26, 2021

Written by: Catrina Hacker

You may have encountered the term “neural network” somewhere in the last few years. If you enjoy services like Netflix and Spotify, shop on Amazon, or used Google to find this article, neural networks are already part of your daily life. They are one of the tools that allow companies like Netflix, Spotify, and Amazon to make customized recommendations about movies, songs, and items that you are likely to enjoy. Google uses neural networks to power services like Google Translate, to identify images, and to understand the meaning of web pages without needing a human to read their contents. In certain cases, neural networks can do many of the same things that humans can, sometimes even faster and more consistently. Perhaps this is because neural networks are inspired by, and named after, the structure our brains use to accomplish these tasks. 

One of the first places neural networks were successful was in identifying images. Image identification is a challenging task for machines because images that belong to a given object category can be very diverse. Humans can easily recognize that all of the images in Figure 1 depict a pug. However, the visual features of each image are actually very different from one another, making it difficult to train computers to do the same. The computer should also be able to recognize a pug no matter where it appears in the image or what else is around it, further complicating this task.

Figure 1: Images from the same category can look very different. Training models to recognize that these images belong to the same category used to be very difficult, but neural networks can be just as good as humans at recognizing these images. 

Scientists have spent decades trying to understand how our brains accomplish such a difficult task so quickly and successfully. The key lies in the hierarchical layout of the brain’s visual system. Your brain is made of many specialized cells called neurons that are capable of receiving and sending messages. One neuron alone cannot do things as complex as recognizing the face of a friend, but when many of them are connected they can produce the complex thoughts and behaviors that guide our daily lives. 

When you look at an image of a pug, one of the first kinds of neuron to become active is called a retinal ganglion cell. These cells perform a much simpler task than object identification. Each neuron becomes active when there is a spot of light in the part of the image that it responds to, referred to as its receptive field [1]. Thousands of these neurons respond to different parts of the image and are connected to another set of neurons in a brain area called V1. The neurons in this area merge the responses of different combinations of retinal ganglion cells so that they respond to bars of light rather than small circles [2]. These responses are useful because they convey information about where the edges of objects are in our visual field. By connecting neurons in one brain area to neurons in the next, the brain builds a more complex representation of the world (from dots to bars). By continuing to connect neurons in each area to neurons in the next, the brain eventually builds up a representation that is complex enough to identify an object. This capability is what allows us to interact with the world, such as when we recognize the face of a friend, decide when to cross the street, or watch a sports game. While the visual system is a great model of how neurons can start with a simple representation and build up to a more complex one, the brain uses similar mechanisms to accomplish almost any complex task, like decision-making or reading. 
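If you are comfortable reading a little code, the toy sketch below illustrates the same idea of building complexity step by step. It is a loose caricature rather than a biological model: made-up “spot detectors” (standing in for retinal ganglion cells) each respond to a small patch of a tiny image, and a “bar detector” (standing in for a V1 neuron) simply adds up a column of spot detectors.

```python
import numpy as np

# A loose caricature (not a biological model) of building a complex detector
# out of simpler ones, the way V1 neurons pool retinal ganglion cell responses.

# Toy 7x7 image containing a vertical bar of light in column 3.
image = np.zeros((7, 7))
image[:, 3] = 1.0

def spot_response(img, row, col):
    # A crude "spot detector": responds when the center pixel is brighter
    # than the average of the 3x3 patch around it.
    center = img[row, col]
    patch_mean = img[row - 1:row + 2, col - 1:col + 2].mean()
    return max(center - patch_mean, 0.0)

# Each spot detector looks at one small part of the image (its receptive field).
spots = np.array([[spot_response(image, r, c)
                   for c in range(1, 6)]
                  for r in range(1, 6)])

# A "bar detector" adds up a vertical column of spot detectors, so it responds
# strongly only where a vertical bar of light is present.
bar_responses = spots.sum(axis=0)
print(bar_responses)  # the largest value sits at the column containing the bar
```

The only point of the sketch is that stacking simple detectors produces a detector for something more complex, and that is the trick the visual system repeats, area after area.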

In the last decade, scientists have designed artificial neural networks that run on computers and mimic this hierarchical architecture of the brain, allowing them to achieve human-level identification of images. As shown in Figure 2, artificial neural networks are made of several layers composed of thousands of synthetic neurons. Synthetic neurons mimic the capabilities of real neurons but do not exist in the same way that cells in the brain do. Instead, computer scientists use mathematical equations to represent what each neuron accomplishes. Where real neurons are physically connected in the brain, artificial neural networks use the output of one equation as the input to the next to model this relationship. In the brain, each layer of interconnected neurons passes information on to the next. Similarly, in artificial neural networks each layer is linked to the next by a complex set of learned mathematical connections that mimic the complexity of connections in the brain. After information passes through all of these layers, the network can take an image as input and recognize what it depicts. For example, a neural network could receive any of the images in Figure 1 as an input and learn to assign the label “pug” as an output. 
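For readers who want to see what “the output of one equation becomes the input to the next” looks like in practice, here is a minimal sketch in Python. The number of layers and neurons is arbitrary, and the weights are random placeholders rather than learned values.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights):
    # Each synthetic neuron takes a weighted sum of its inputs and then applies
    # a simple nonlinearity (here, negative values are set to zero).
    return np.maximum(weights @ inputs, 0.0)

# Pretend this vector is a tiny 4-pixel "image".
pixels = np.array([0.2, 0.9, 0.1, 0.7])

# Three layers of synthetic neurons, each feeding the next.
w1 = rng.normal(size=(8, 4))   # 4 pixels  -> 8 neurons
w2 = rng.normal(size=(6, 8))   # 8 neurons -> 6 neurons
w3 = rng.normal(size=(2, 6))   # 6 neurons -> 2 output scores ("pug" vs "not pug")

hidden1 = layer(pixels, w1)
hidden2 = layer(hidden1, w2)
scores = w3 @ hidden2          # final layer: one score per category

print(scores)  # whichever score is larger is the network's guess
```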

Children might learn to reliably identify something like a pug through repeated exposure to images of dogs. Similarly, artificial neural networks are only capable of completing these tasks once they have been trained. Neurons tend to listen to some of their neighbors more than others, and how much a neuron listens to each of its inputs is called the strength of a connection. The strength of the connections between neurons (whether real or synthetic) determines how well a network performs on any given task. During a training period, scientists “show” the network many images along with their correct labels. The model learns the best set of connections by adjusting these connection strengths, called weights, based on how accurate its prediction is compared to the true label. After viewing thousands of images, the network learns a set of weights that allows it to understand the content of an image and make judgments about what it contains. After the network is trained, the weights no longer change, and we can give it any image from a category it was trained on to get a classification from the model. 
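The sketch below shows this training idea for a single synthetic neuron on made-up data: on every example, the weights are nudged in whichever direction shrinks the gap between the prediction and the true label, and once training ends the weights are frozen.

```python
import numpy as np

# A minimal training-loop sketch for one synthetic neuron. The data are
# invented: each "image" is a vector of 3 numbers, and the label is 1 for
# "pug" and 0 for "not pug", assigned by a made-up rule.

rng = np.random.default_rng(1)
images = rng.normal(size=(200, 3))
labels = (images[:, 0] + images[:, 1] > 0).astype(float)

weights = np.zeros(3)
learning_rate = 0.1

for epoch in range(50):
    for x, y in zip(images, labels):
        prediction = 1.0 / (1.0 + np.exp(-weights @ x))  # value between 0 and 1
        error = prediction - y                           # how wrong was the guess?
        weights -= learning_rate * error * x             # adjust the connection strengths

# After training, the weights are frozen and the neuron classifies new inputs.
new_image = np.array([0.8, 0.5, -0.2])
print(1.0 / (1.0 + np.exp(-weights @ new_image)) > 0.5)  # True means "pug"
```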

Figure 2: Neural networks process information in a manner very similar to the brain. Where the brain passes information between layered regions using neurons, artificial neural networks pass information between layers using synthetic neurons: mathematical equations that represent the response of a single neuron. 
Created with BioRender.com

Although these networks have only recently been used successfully, they have already become one of the most popular tools used by companies and researchers to accomplish complex tasks that were once only possible for humans. For example, artificial neural networks are one of several models Spotify uses to make music recommendations. Spotify uses neural networks to identify features of songs (such as tempo and tone) from their raw audio files. The complex representations possible in a neural network allow the model to “understand” the music so that Spotify can recommend songs that are most similar to those you already enjoy. 
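Spotify’s actual system is proprietary and far more sophisticated, but the hedged sketch below shows the general idea of content-based recommendation: assume a neural network has already turned each song’s raw audio into a short list of feature numbers (the names and values here are invented), then recommend the song whose features are closest to one you already like.

```python
import numpy as np

# A simplified, hypothetical sketch of content-based recommendation.
# Each song is represented by a short feature vector that we assume a
# neural network has already extracted from the audio; the numbers are made up.
song_features = {
    "song_a": np.array([0.9, 0.1, 0.4]),
    "song_b": np.array([0.8, 0.2, 0.5]),
    "song_c": np.array([0.1, 0.9, 0.3]),
}

def cosine_similarity(u, v):
    # Similarity of two feature vectors, ignoring their overall magnitude.
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def recommend(liked_song, catalog):
    # Rank every other song by how similar its features are to the liked song.
    candidates = {name: cosine_similarity(catalog[liked_song], feats)
                  for name, feats in catalog.items() if name != liked_song}
    return max(candidates, key=candidates.get)

print(recommend("song_a", song_features))  # "song_b", the closest match
```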

Another popular test of neural networks has been their ability to outperform expert humans in games. Before the advent of neural networks, several computer programs had been successful in beating human chess masters at chess [3], but none could beat an expert at the game of Go. In 2015, a neural network created and trained by a team at Google DeepMind defeated the European Go champion Fan Hui, making it the first computer program to beat a human Go expert [4]. Further modifications of the program have made it even more successful, with networks reaching the level of master players after just a few days of training [5]. Recently, the same group released a network capable of playing all 57 Atari video games at higher performance levels than the average human [6]. These and the many other accomplishments of neural networks over the last decade have computer scientists and neuroscientists alike excited about what neural networks are capable of accomplishing. 

Despite the impressive accomplishments of neural networks in the last decade, they are still far from reaching the same capacity as the human brain. One major limitation of artificial neural networks is that, unlike the human brain, they struggle to generalize beyond the specific tasks and stimuli that are used during training. For example, if you train a neural network to recognize pictures of pugs, the same network will struggle to recognize pictures of your friends. If you then train the network to recognize pictures of your friends, it will lose the ability to recognize pugs. This phenomenon is known as catastrophic forgetting: the tendency of artificial neural networks to rapidly forget what they have previously learned when they are taught a new task. This happens because the strengths of connections between neurons that are best for one task will not be best for another. When the network is re-trained, the weights change and it loses its ability to complete the first task. One solution is to simultaneously train the network to recognize both pugs and the faces of your friends, but as the complexity and number of desired tasks increases it becomes intractable to train networks this way. Instead, scientists want to design models that can find a set of weights that allow the network to work as intended while not being too specialized to one particular task or set of stimuli. 
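Here is a small, self-contained illustration of catastrophic forgetting on a toy problem, using the same kind of single-neuron classifier sketched earlier. The two “tasks” use different made-up labeling rules, so the weights that solve one task are wrong for the other.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_task(rule):
    # Invented data: 500 two-number "images" labeled by a simple rule.
    x = rng.normal(size=(500, 2))
    return x, rule(x).astype(float)

task_a = make_task(lambda x: x[:, 0] > 0)  # task A: sign of the first feature
task_b = make_task(lambda x: x[:, 1] > 0)  # task B: sign of the second feature

def train(weights, data, steps=2000, lr=0.1):
    # Nudge the weights a little on each example, cycling through the data.
    x, y = data
    for i in range(steps):
        xi, yi = x[i % len(x)], y[i % len(y)]
        pred = 1.0 / (1.0 + np.exp(-weights @ xi))
        weights = weights - lr * (pred - yi) * xi
    return weights

def accuracy(weights, data):
    x, y = data
    preds = (1.0 / (1.0 + np.exp(-x @ weights)) > 0.5).astype(float)
    return (preds == y).mean()

w = train(np.zeros(2), task_a)
print("task A after training on A:", accuracy(w, task_a))  # high

w = train(w, task_b)                                        # re-train on task B
print("task A after training on B:", accuracy(w, task_a))  # much lower, near chance
print("task B after training on B:", accuracy(w, task_b))  # high
```

Running this shows that performance on task A collapses once the same weights are re-trained on task B, even though it was nearly perfect before.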

Computer scientists and neuroscientists are working together to overcome the current limitations of neural networks and to develop better and more refined models. With these complementary contributions, the field is likely to continue making huge progress over the next decade. Whether you are invested in the details of these developments or not, you will experience their benefits through technology that is already an integral part of our lives such as content recommendations, voice recognition systems, translators, and search engines. Next time Spotify recommends a perfect new song, think about the amazing ways that these algorithms combine neuroscience and computer science to find exactly what you’re looking for.

Image Credits:

Cover Image by Gordon Johnson from Pixabay

Figure 1 Created with images from Pixabay users Free-Photos, Free-Photos, StockSnap, StockSnap, designerpoint, and SGNPhotography

Figure 2 Created with BioRender

References:

1. Barlow, H. B., Fitzhugh, R. & Kuffler, S. W. Change of organization in the receptive fields of the cat’s retina during dark adaptation. J. Physiol. 137, 338–354 (1957).

2. Hubel, D. H. & Wiesel, T. N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106–154 (1962).

3. Hsu, F.-H. IBM’s Deep Blue Chess grandmaster chips. IEEE Micro 19, 70–81 (1999).

4. Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

5. Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

6. Badia, A. P. et al. Agent57: Outperforming the Atari Human Benchmark. arXiv:2003.13350 (2020).
