The blog post “Ambisonics as an Intermediate Spatial Representation (for VR)” explains the concept of intermediate spatial (or 3D) representations in detail. In summary, busses at the higher levels of your bus hierarchy should be configured to preserve 3D information, so that it can later be used to binauralize for headphones or to “downmix” (for lack of a better term) for speakers. This is especially important when your game seeks to leverage the 3D audio technology embedded in some platforms.
Wwise supports three types of intermediate 3D representations of audio submixes:
Audio Objects: The object-based representation lies at one end of the spectrum. The sounds passing through a bus are not mixed, but are instead just gathered so that their individual positioning information is preserved until they are consumed by a binauralizer (or, more generally, a “3D renderer”). This lets the 3D renderer work optimally, because no 3D information is lost. However, because the sounds are not mixed, an Effect cannot be applied once to a mixing bus’s handful of channels. It must instead be applied to each sound independently, and there could be hundreds of them.
Fixed Objects: A channel configuration whose speaker positions are known. A typical choice is 7.1.4, because it has height speakers (above the ears) and can thus represent sound coming from above. On the other hand, 7.1.4 has no speakers below ear level, so it cannot properly represent sounds coming from below. Also, when a sound is not directly aligned with a speaker, it has to be shared among the three neighboring speakers, which conveys its direction less precisely.
Ambisonics: Ambisonics is similar to fixed objects in that the number of channels is constant. However, unlike fixed objects, the spatial representation is invariant to rotation, and thus has uniform precision in all directions. The representation’s sharpness increases with the ambisonic order, as does the channel count: a full-sphere representation of order N uses (N + 1)² channels.
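To make the ambisonics trade-off concrete, here is a minimal Python sketch. It is not Wwise code and uses no Wwise API: the function names are hypothetical, and the first-order encoding assumes the ACN channel order with SN3D normalization, chosen only for illustration. It shows how the channel count grows with the order, and that the encoded energy of a source is the same whatever its direction, including below the listener, where a 7.1.4 layout has no speakers.

```python
import math


def ambisonic_channel_count(order):
    """Channels in a full-sphere ambisonic signal of the given order: (order + 1)^2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def encode_first_order(azimuth_deg, elevation_deg):
    """First-order encoding gains (W, Y, Z, X).

    Assumption: ACN channel order with SN3D normalization.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = 1.0                          # omnidirectional component
    y = math.sin(az) * math.cos(el)  # left/right
    z = math.sin(el)                 # up/down
    x = math.cos(az) * math.cos(el)  # front/back
    return (w, y, z, x)


def encoding_energy(gains):
    """Sum of squared gains: total energy of a unit-amplitude source."""
    return sum(g * g for g in gains)


if __name__ == "__main__":
    for order in (1, 2, 3):
        print("order", order, "->", ambisonic_channel_count(order), "channels")

    # Front, left, below-and-to-the-side, straight up: the energy is identical.
    for direction in [(0, 0), (90, 0), (45, -60), (0, 90)]:
        energy = encoding_energy(encode_first_order(*direction))
        print(direction, "->", round(energy, 6))
```

For comparison, a 7.1.4 bus always carries 12 channels, a third-order ambisonic bus carries 16, and an Audio Object bus carries as many streams as there are sounds passing through it.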