Why you need spatial audio for your podcast

When was the last time you took a moment to listen mindfully to your surroundings? That is, discerning every sound around you, its origin, and other information it reveals about the place. As humans, we have a natural ability to sense sound in three dimensions. Why, then, are podcasts usually a one-dimensional representation of our reality?

Spatial audio & storytelling

«Nights can never be real and enjoyable without the croaking of frogs and the chirping of crickets.»

― Michael Bassey Johnson, Song of a Nature Lover

In literature, the author, like Bassey Johnson above, has the freedom to describe the surroundings to their reader, filling in the gaps with words to paint the full picture. In movies, the director uses both visual and auditory cues to set the scene. It could be a bird taking flight, foreshadowing a predator approaching, or a character tapping their foot as a display of annoyance. There’s a good reason for this: film and television become more engaging when the audience is tasked with piecing the story together through a combination of visuals, dialogue, sounds, and music. Show, don’t just tell. In podcasting, you're limited to audio. However, not all audio is the same. In spatial audio productions, the dialogue, sound effects, and ambience become more realistic, enabling you to “show” the listener what’s happening.

Photo: Roman Dolgikh

Imagine you’re leaning against the trunk of a tall pine tree. The sun is kissing your face, and the vast forest stretches out before you. It’s calm and peaceful. Little birds are singing in the trees. The leaves rustle gently in the wind. You hear the faint flow of a nearby creek as it winds through the bush and bramble. You take a deep breath, close your eyes, and listen attentively to the serene landscape around you. The sounds merge together into a steady soundscape, tucking you in like a soft blanket. This is usually referred to as the “ambience” or the “bed” of a spatial audio mix. The ambience can give the listener a realistic depiction of the environment in which the story takes place.

You’re dozing off in the sun when you hear a bird lift off over your left shoulder. Her calls carry through the forest as she circles the treetops before diving down to land on a nearby branch to your right. Moments later, you hear a twig snap, followed by gentle footsteps on soft pine straw. You open your eyes, ready to greet a like-minded soul enjoying this beautiful little glade.

These sounds are more pointed. You can easily locate them in your soundscape, and maybe even connect them to particular “objects” around you. They tell you that another person is arriving shortly, the rough distance between you, the direction from which they are coming, and even how fast they are moving. All long before you actually see them. Your ears and mind enable you to deduce this information, using factors like reflections from other objects in the environment, the volume of the sound, and the difference in arrival time between your two ears. Your ears are “gathering” this input, and your mind is “translating” it into a rich experience with information that will guide your actions. All of this happens without you even thinking about it. It’s all part of our nervous system, and our hearing is a very useful tool for making sense of and navigating the world around us.
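To put a rough number on the time delay between your ears, here is a minimal Python sketch. It assumes a deliberately simplified model: a distant sound source, two ears treated as points about 18 cm apart, and no head in between to bend the sound. The function name, the ear spacing, and the speed-of-sound value are our own illustrative assumptions, not measurements.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C
EAR_DISTANCE_M = 0.18       # assumed ear spacing; varies from person to person


def interaural_time_difference(azimuth_deg, ear_distance_m=EAR_DISTANCE_M):
    """Approximate arrival-time difference between the two ears, in seconds.

    azimuth_deg: direction of the source, 0 = straight ahead,
                 90 = directly to the right, -90 = directly to the left.
    A positive result means the sound reaches the right ear first.
    Simplified far-field model that ignores the head itself.
    """
    extra_path_m = ear_distance_m * math.sin(math.radians(azimuth_deg))
    return extra_path_m / SPEED_OF_SOUND_M_S


for angle in (0, 30, 90):
    delay_us = interaural_time_difference(angle) * 1_000_000
    print(f"source at {angle:>2} degrees: about {delay_us:.0f} microseconds")
```

Even for a sound directly to one side, this model gives a delay of only around half a millisecond, which gives a sense of how fine the timing cues are that your hearing works with.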

How our ears work

Human beings are pretty spoiled when it comes to high audio quality. We are born with the best hi-fi audio system ever conceived attached to our heads. With your ears, you can go into nature and experience high-resolution spatial audio whenever you want. Most of us take it for granted and don't realize how much we depend on our ears to orient ourselves in space and to identify the location of other objects or potential dangers in our environment.

It’s rooted in our primal need for survival. Just think of the people living on earth 100,000 years ago. Whenever they heard a loud rumble from the earth or the heavens, they knew to find cover. And when they heard a loud growl behind them in the forest, they knew to run and which direction to take.

Photo: Jessica Flavia

Just as they relied on their hearing, our ancestors relied on storytelling to survive. Since the dawn of time, this oral tradition has been our primary vehicle for passing on vital knowledge about the land and other communities, where to find food, and how to make tools. Not to mention entertaining each other. It’s the cultural glue that connects us. Powerful stories engage our imagination and emotions, and spatial audio is the perfect tool for immersive storytelling. If the spatial mix is believable, and you are listening on speakers or headphones capable of providing a realistic experience, you will “feel” the story more than you would with a conventional stereo production.

Consuming spatial audio

A great spatial audio production uses ambiences and audio objects to tell the story. Ambiences like wind in trees, ocean surf, traffic, or even AC noise will help the listener understand where the story takes place. The dialogue is the most essential audio object and drives the story forward. Sound effects and foley, such as footsteps or doors opening, are other examples of objects in a spatial mix that will emphasize actions in the story. Together, ambiences and objects will immerse listeners in the story, making them feel like they are there, participating. It's similar to virtual reality experiences but for audio. An essential part of this realism is adding the third dimension we’re used to in real life.

On a headset or home stereo, you really only have one dimension. Left to right. If you continue by adding speakers behind you, like on a surround sound system, you have introduced a second dimension. Front to back. This is 2D audio. However, adding the third dimension, up and down, enables 3D audio. But there are a couple of challenges connected to this. One is having access to a space where you can consume 3D audio, and, as we touched on earlier, making it sound realistic is another challenge.

Photo: Raouf Nouari

Let’s talk about 3D audio on loudspeakers. The more speakers you have, the higher the resolution and realism. The problem is that this will quickly become an expensive endeavor. In addition, you would also need adequate space to install the speakers, the know-how to get the setup right, and compatible software to play back 3D audio. Ultimately, this method is not very accessible for the average Joe. In recent years, soundbars and smart speakers have become more advanced, and they make spatial audio more accessible for living rooms. But there is an easier way to get an immersive, true-to-life audio experience.

Enter headphones. Most people have a pair, and we can easily play content from our personal devices, like laptops, tablets, or smartphones. As humans, we only have two ears, yet we can still experience 3D audio. Even though headphones only have two speakers or audio channels, they can provide a realistic experience in three dimensions. As you might've guessed, the process to achieve this is quite complex.

Simply put, you have to process the audio stream through a filter that mimics how our ears pick up sound in real life. We call this filter a head-related transfer function, or HRTF. It captures the effect of the shape of your ears and ear canals, your upper body, your head size, the distance between your ears, and so on. Together, your body and ears act as a filter that influences how you experience sound. Your brain learns to account for this filter when it translates what you hear into information, which is what enables you to estimate the position and distance of sound sources in your environment.
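To make the filtering idea a little more concrete, here is a minimal Python sketch of how a mono sound could be rendered for two ears. It relies on one piece of background: in the time domain, an HRTF corresponds to a pair of short impulse responses, one per ear, and applying the filter means convolving the sound with each of them. The two impulse responses below are toy values we made up for illustration (a source to the listener's right, so the left ear hears it later and quieter). They are not measured HRTF data, and the function names are our own.

```python
def convolve(signal, impulse_response):
    """Plain time-domain convolution of two lists of samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal into a (left, right) pair for headphones."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (assumed, not measured): the right ear gets the
# sound immediately at full level, the left ear two samples later at half level.
TOY_HRIR_RIGHT = [1.0, 0.0, 0.0]
TOY_HRIR_LEFT = [0.0, 0.0, 0.5]

click = [1.0, 0.0]  # a single click as the mono source
left, right = render_binaural(click, TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
print("left ear: ", left)
print("right ear:", right)
```

Real HRTF data contains far longer impulse responses, with a different pair for every direction around the head, but the principle is the same: one source goes in, and two differently filtered signals come out, one for each ear.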

Creating spatial audio

We come in different sizes and shapes, so a personalized HRTF will naturally provide the best experience. Today, there are still no simple ways for consumers to create accurate personal HRTFs, but some good generic HRTFs are available. In spite of these challenges, headphones combined with an HRTF are probably how most people will experience or produce 3D audio in the future. And the future might not be too far away.

Big brands like Apple, Dolby, and Sonos are paving the way for spatial audio to be more accessible to listeners. For example, with just an iPhone and the AirPods Pro, you can scan your head and ears with the phone's camera to enjoy personalized spatial audio on the go. Traditionally, the production of spatial content has been a bottleneck. But that’s where Nomono comes in. With the Nomono Sound Capsule, we’re dedicated to simplifying and automating the production of spatial audio. That way, we’re enabling content creators to record high-quality audio without needing university degrees in signal processing or audio production. Ultimately, this will enable all listeners to hear the full story.

Photo: The Bunch

Summary

So far in this post, we’ve used three different terms to describe this kind of audio format: 3D, immersive, and spatial. They all mean the same thing: an audio experience that sounds like it’s happening around you in real life. And we are rapidly closing in on achieving that. At CES 2024 in Las Vegas, we got a demo of MPEG-I, one of four new spatial audio formats released there. They played different music over speakers in a small hotel room while we walked around, noting how the music sounded from each position. Then they gave us a headset with head-tracking and played the same audio over it while we moved around the room. That is the most convincing 3D audio we have heard. We honestly couldn't tell what was real and had to remove the headset to double-check our senses.

With that technology, you could go beyond realistic. Imagine coupling it with the new developments in VR; it will elevate the entertainment value of your favorite movies and significantly improve news and educational content. Spatial audio will make listening more authentic and immersive, but it will also make voices clearer and the content easier to understand.



Written by:

Ruben Åeng

Lead Audio Engineer at Nomono

Viktor Rydal

Chief Design Officer at Nomono


In collaboration with:

Martin Rieger

Immersive 3D Spatial Audio Expert at VRTONUNG

