Ambisonics in Wwise: Overview
The key feature of the ambisonics pipeline is the ability to set the channel configuration of busses to 1st, 2nd, 3rd, 4th, or 5th order ambisonics. From there, any non-ambisonic signal routed to such a bus is automatically encoded to ambisonics, and any ambisonic signal routed to a non-ambisonic bus is automatically decoded, according to the chosen positioning settings. Finally, ambisonic signals routed to ambisonic busses are either passed unchanged (Direct Assignment) or rotated and contracted according to the relative positions and orientations of the game object and listener (3D Spatialization). In addition, you can:
- Import and play back B-format assets in FuMa format (up to 3rd order) or AmbiX format (up to 5th order);
- Use Effect plug-ins to customize decoding;
- Use Effect plug-ins such as Google Resonance or Auro Headphone to convert ambisonics to binaural;
- Pass an ambisonic bed to ambisonics-capable audio devices;
- Record an ambisonic bus to disk and re-import it using the Wwise Recorder;
- Use your favorite Wwise plug-ins* to process ambisonics as they would process other formats.
*Except Stereo Delay and Matrix Reverb.
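The automatic encoding described above can be sketched conceptually. The following is a minimal Python illustration of first-order encoding in the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization), not Wwise's actual implementation; the function name and angle conventions are ours:

```python
import math

def encode_fo_ambix(sample: float, azimuth: float, elevation: float):
    """Encode a mono sample into a first-order AmbiX frame
    (ACN order: W, Y, Z, X; SN3D normalization).
    Angles in radians: azimuth counterclockwise from the front,
    elevation up from the horizon."""
    w = sample                                        # omnidirectional
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    return [w, y, z, x]

# A source straight ahead contributes only to W and X:
print(encode_fo_ambix(1.0, 0.0, 0.0))  # [1.0, 0.0, 0.0, 1.0]
```

Higher orders add further spherical-harmonic channels following the same principle, sharpening the directional image.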
Intermediate spatial representations for the purpose of 3D audio
The blog Ambisonics as an Intermediate Spatial Representation (for VR) explains in detail the concept of intermediate spatial (or 3D) representations. In summary, mixing busses at the higher levels of your bus hierarchy should be configured such that they convey 3D information, in order to be properly binauralized to headphones, or “downmixed” (for lack of a better term) to speakers. This is especially true when your game seeks to leverage 3D audio technology embedded in some platforms.
Wwise supports three types of intermediate 3D representations of audio submixes:
- Audio Objects: Before the Object-Based Pipeline released with Wwise 2021.1, this meant no mix at all. The Object-Based representation lies at one end of the spectrum, where the individual sounds of a mix are not mixed per se, but are instead gathered so that their individual positioning information is preserved until they are consumed by a binauralizer (or, more generally, a “3D renderer”). This lets the 3D renderer work optimally, because no 3D information is lost. However, because sounds are not downmixed, an Effect that could once be applied to a mixing bus’s handful of channels must now be applied to each sound independently, and there could be hundreds of them.
- Fixed Objects: In other words, a channel configuration whose speaker positions are known. A typical choice is 7.1.4, because it has “height” speakers (above the ears) and can thus represent sound coming from above. On the other hand, 7.1.4 cannot properly represent sounds coming from below. Also, when a sound is not directly aligned with a speaker, its directionality is suboptimally conveyed by the three neighboring speakers.
- Ambisonics: Ambisonics is similar to Fixed Objects, in that the number of channels is constant. However, the spatial representation is not better in some directions than others like with Fixed Objects: it is invariant to rotation, and thus uniformly blurry. The representation’s sharpness is proportional to the ambisonic order.
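The constant channel count mentioned above follows directly from the order. A quick sketch of the relationship (the function name is ours):

```python
def ambisonic_channel_count(order: int) -> int:
    """Number of channels in a full-sphere ambisonic configuration
    of the given order: (order + 1) squared."""
    return (order + 1) ** 2

# Orders 1 through 5, the configurations available on Wwise busses:
print([ambisonic_channel_count(n) for n in range(1, 6)])  # [4, 9, 16, 25, 36]
```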
Setting up a binauralizer Effect on an ambisonic bus
Create a bus under the Master Audio Bus, where we will put our ambisonics binauralizer Effect, such as Google Resonance or Auro Headphone. All audio requiring binaural processing should be routed to this bus.
Set this bus to one of the ambisonic configurations. When the bus serves as the intermediate spatial representation, the higher the ambisonic order, the better. Although the bus’s configuration will be ambisonics, its Out Config will be made stereo by the binauralizer Effect.
Ambisonics versus Audio Objects
Ambisonics is a suitable format to be leveraged by 3D renderers, including binauralizers. As opposed to Audio Objects, it guarantees a fixed number of channels to be processed by submix Effects and the 3D renderer itself, at the expense of some precision loss in terms of directionality. The precision obtained increases with the number of channels you choose (from 4 to 36).
On the other hand, the Object-based pipeline makes no concessions when it comes to directional accuracy. It will thus preserve the Objects’ precious 3D information at all costs. It is then up to designers to find other means (for example, voice limiting) to keep the number of Audio Objects, and thus channels, under control.
Ambisonics with Audio Objects
The Object-based pipeline in Wwise supports Audio Objects with multiple channels, including ambisonics. This means that the two representations are not mutually exclusive. For example, you can use Audio Objects in order to preserve the sounds that would benefit the most from the optimal rendering of directionality provided by Audio Objects, all while using ambisonics for the rest. In Audio Object parlance, every sound that is not preserved as an Audio Object goes to the bed. And ambisonics is ideal for representing a bed, for the same reasons that were mentioned above.
Setting up an ambisonic bed in the context of Audio Objects
The “top-level” spatial representation should use Audio Objects, but we create a child ambisonic bus for the bed, thus forcing an ambisonic downmix of the sounds routed to it. Sounds routed directly to the parent bus will be treated as objects. The ambisonic bed will also be treated as a single, multichannel object.
At the moment, no software Object binauralizer Effect ships with Wwise, but some platforms support object binauralization at the Audio Device level (for example, Windows Sonic). In that case, the Master Audio Bus would inherit the Audio Object configuration of the Audio Device, and the Ambisonic Bed can be made its direct child.
Refer to Audio Objects - From the System Audio Device to the Endpoint for more details about the role of the Audio Device in the Object-based pipeline.
How do ambisonic representations rotate with respect to the head-mounted device?
Normally, the head-mounted device (HMD) head-tracking data should be continuously passed to Wwise by the game engine as the listener orientation, via the SetListenerPosition() API. Sounds whose positioning is set to 3D are attached to game objects. When they are mixed into an ambisonic bus, the encoding angles depend on the game object position relative to the listener’s orientation. Thus, the ambisonic downmix is made of sound sources that are already rotated with respect to the HMD.
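The dependence of the encoding angle on the game object's position relative to the listener's orientation can be sketched in the horizontal plane. This is a simplified 2-D illustration, not Wwise's internal math; the function name and coordinate conventions are ours:

```python
import math

def azimuth_relative_to_listener(source_xy, listener_xy, listener_yaw):
    """Horizontal encoding angle of a source as heard by the listener.
    listener_yaw is the listener's facing direction in radians (world
    frame, as head tracking would supply it); the result is the azimuth
    in the listener's frame, wrapped to (-pi, pi]."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    world_angle = math.atan2(dy, dx)  # direction of the source in the world
    rel = world_angle - listener_yaw
    return math.atan2(math.sin(rel), math.cos(rel))

# A source directly ahead of a listener facing it encodes at azimuth 0:
print(azimuth_relative_to_listener((1.0, 0.0), (0.0, 0.0), 0.0))  # 0.0
```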
You may also use 3D Spatialization with B-format sources. In such a case, they are considered as sound fields and are rotated according to the relative orientation of the game object and listener (see Ambisonics as sound fields, below).
Ambisonics can be rotated with minimal CPU and memory usage by computing rotation matrices in the ambisonics domain. This makes it an ideal format for exchanging audio for VR. For example, one may build a complete auditory scene with sources coming from any direction, encode them to an ambisonic signal, and store it to disk using an ambisonics-capable DAW (such as Wwise). At the moment of playback in the VR device, the playback engine only has to read the head tracking coordinates, rotate the ambisonic signal in the opposite direction, and then decode/virtualize it to a binaural signal for headphones.
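The rotation step can be sketched for the first-order case, where a yaw rotation only mixes the X and Y channels through a 2-D rotation matrix (W and Z are invariant). This is an illustrative sketch, not Wwise's implementation; higher orders use larger per-order rotation matrices:

```python
import math

def rotate_fo_yaw(ambix_frame, yaw):
    """Rotate a first-order AmbiX frame [W, Y, Z, X] about the vertical
    axis by `yaw` radians. To compensate head tracking, pass the
    negative of the head's yaw angle."""
    w, y, z, x = ambix_frame
    c, s = math.cos(yaw), math.sin(yaw)
    # 2-D rotation applied to the (Y, X) pair; W and Z are unchanged.
    return [w, c * y + s * x, z, -s * y + c * x]

# A source encoded straight ahead ([W, Y, Z, X] = [1, 0, 0, 1]), after a
# 90-degree yaw, ends up to the side (energy moves from X into Y):
rotated = rotate_fo_yaw([1.0, 0.0, 0.0, 1.0], math.pi / 2)
```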
In the previous section, rotation was achieved interactively. You could instead use Wwise to produce non-interactive (cinematic) content for VR, apart from rotation due to head tracking. To do so, you simply need to replace the aforementioned binaural virtualizer Effect with a Wwise Recorder Effect on the ambisonic bus. The recorded file will be a compatible FuMa or AmbiX file, with the same order as that of the bus. You can then embed this file into your 360 video and let the player rotate the sound field using the process described above.
More or less the same considerations apply to Auxiliary Busses used for environmental Effects. Currently, the RoomVerb and the Convolution Reverb support ambisonics natively; the Matrix Reverb does not. You may use the Matrix Reverb on a stereo or 4.0 (or larger) bus, depending on whether you want the front-back delays, and route that bus to ambisonic busses. Its output will be encoded to ambisonics following the same rules that apply to 2D sounds. While you can do the same with the RoomVerb, you may also use it directly on ambisonic busses. The directional channels will then consist of decorrelated signals, as with standard multichannel configurations. Higher orders result in more decorrelated channels and will, therefore, require more processing. We encourage you to experiment to find the desired balance between quality and performance.
Ambisonics Panning versus VBAP
Encoding of mono sources into an ambisonics bed may be used for aesthetic reasons. The default panning algorithm of 3D sounds implemented in Wwise is based on the ubiquitous VBAP algorithm, which maximizes sound accuracy with constant overall energy, at the expense of variability of the energy spread. That is, the energy spread is minimal when the virtual source is aligned with a loudspeaker and is maximal when it is exactly in the middle of a loudspeaker arc (for a 2D config such as 7.1) or triangle (for a 3D config such as 7.1.4). Ambisonics, in contrast, has a constant energy spread regardless of the source direction and loudspeaker layout. Its spread is inversely proportional to the order. First order ambisonics is therefore very blurry. Since ambisonics submixes are decoded automatically to the standard configuration of their parent bus (such as 5.1 or 7.1.4), using the all-round ambisonics decoding technique, all-round ambisonics panning may be implemented in Wwise by routing 3D sounds to an ambisonic bus and then routing this bus to a parent bus that has a standard configuration.
Ambisonics as sound fields
Ambisonic sounds, either recorded or synthetic, are great for implementing ambient sounds. Although higher order ambisonics are typically used in the context of your game’s intermediate spatial representation, low orders are usually sufficient for ambient sounds. The blog Using Ambisonics for Dynamic Ambiences covers the subject in depth.
Sound field contraction
When used as a sound field with 3D Spatialization, ambisonics, like other multichannel files, will collapse into a mono point source if its Spread is left at 0, which is the default. In order to be fully surrounded by the sound field, you need to make its Spread equal to 100% by adding an appropriate Attenuation ShareSet to the ambisonic sound.
As explained in Using Ambisonics for Dynamic Ambiences, with 100% Spread and Position + Orientation 3D Spatialization, the ambisonic sound field will be rotated based on the relative orientation of the emitter and listener game objects, effectively giving the illusion that the sound field is tied to the world.
Ambisonics represents wavefronts coming towards the listener, so the sources that constitute the sound field are always located away from the listener. It is thus difficult to use this representation to translate within the sound field. However, Spread can help approximate the effect of going in and out of a sound field. Indeed, a Spread of less than 100% has the effect of contracting the sound field in the direction of the position of the emitter game object, as explained in Effect of Spread. This is exactly how Wwise Spatial Audio Rooms handle room tones and reverberation (see Room Tones):
- When the listener is inside the room, the Spread is close to 100%, and the room tone and reverb bus are tied to the room’s orientation by rotation of the sound field.
- When the listener is located outside, the Room’s game object is placed on the closest portal, with Spread less than 50% and dependent on the portal’s aperture. The sound field is thus contracted towards the location of the portal, all while being rotated based on the orientation of the room. As the listener walks away, the sound field is further contracted into a point source at the location of the portal.