(Continuation of: The commercial future of Ambisonics, 15/5/2013)


Dear colleagues,

Following the recent standardization of 3D audio by MPEG (ISO/IEC 23008-3) and related activities, I have come to the conclusion that the (older) B format up to 3rd order might need some updates.

However, I also came to the conclusion that FOA (first-order Ambisonics) could easily be included in all current distribution models for audio on the Internet, which are (to "99.98%") stereo-based. We were nearly "there" in the thread cited above! ("The commercial future of Ambisonics")

I will start with this part, because you can see it as a format of its own, which might be the perfect bridge or transition format for future surround/3D audio (3DA) formats...


I. UHJ (surround/3D audio) as extension of stereo based files (distribution via Internet, on discs and streaming, including YouTube, Spotify etc.)

a) As Richard Elen (and I) have suggested, you could distribute surround sound and 3D audio as a (relatively simple) extension of (UHJ encoded) stereo files. You would have to add to a stereo file (.aac file, for example) a 3rd audio channel, OR two audio channels, as an extension audio stream. The restriction was that these extensions would have to fit into the current distribution models, say downloadable AAC files via iTunes.

Contrary to my/our first impressions, this is firstly possible (this has already been pretty clear), and secondly feasible < without any serious drawbacks >. Which will be shown...

b) Technically speaking, you would have to distribute the ("downsampled") stereo file of FOA, which contains some surround information, and the one or two audio extension streams.

This is of course UHJ, brought into some AAC extension scheme.

http://en.wikipedia.org/wiki/Ambisonic_UHJ_format

Although UHJ permits the use of up to four channels (carrying full-sphere with-height surround), only the 2-channel variant is in current use (as it is compatible with currently-available 2-channel media). In Ambisonics, UHJ is also known as "C-Format".


(Small potential problem:

"UHJ was developed by the Ambisonic team, incorporating work done by the BBC (on their quadraphonic system, Matrix H) and Duane Cooper (on Nippon Columbia's UD-4/UMX quadraphonic system) and others, and building on the then-current version of Ambisonics, System 45J. The initials indicate some of sources incorporated into the system: U from Universal (UD-4); H from Matrix H; and J from System 45J."

This means you < might > think about an update of UHJ, to achieve more consistency between B format and the UHJ scheme. Or you might leave things how they are defined, for historical reasons. In any case, you have to be aware of this...
)


Although a hierarchically extended version of UHJ stereo has been tested in the area of FM broadcasting, nobody has tried to distribute (hierarchically) extended UHJ stereo files via the Internet. Which is just a head-banging fact... Or maybe there are some deeper reasons?!

If a third channel (T) is available, this can be used to give improved localisation accuracy to the planar surround effect when decoded via a 3-channel UHJ decoder. The third channel does not have to have full audio bandwidth for this purpose, leading to the possibility of so-called "2½-channel" systems, where the third channel is bandwidth-limited to 5 kHz. The third channel can be broadcast via FM radio, for example, by means of phase-quadrature modulation. This configuration was tested by the Independent Broadcasting Authority (IBA) in the United Kingdom as a method of broadcasting surround recordings. 2½ or 3-channel UHJ delivers the same accuracy as 3-channel (WXY) B-Format


Adding a fourth channel (Q) to the UHJ system allows the encoding of full surround sound with height, known as Periphony, with a level of accuracy identical to 4-channel B-Format.
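
The L/R/T/Q hierarchy described above can be sketched in a few lines of Python. This is only a rough illustration: the coefficients are the ones I recall from the Wikipedia UHJ article linked above and should be checked against it before being relied on, the function names are made up for this sketch, and the 90º phase shift (the "j" in the published equations) is done block-wise with an FFT, whereas a real encoder would use a proper wide-band phase-shift filter pair.

```python
import numpy as np

def phase_shift_90(x):
    """Advance every frequency component of x by 90 degrees (block-wise, via FFT)."""
    spectrum = np.fft.rfft(x)
    spectrum[0] = 0.0              # DC has no meaningful phase shift, drop it
    if len(x) % 2 == 0:
        spectrum[-1] = 0.0         # same for the Nyquist bin
    spectrum[1:] *= 1j             # +90 degrees on all remaining bins
    return np.fft.irfft(spectrum, n=len(x))

def uhj_encode(w, x, y, z):
    """Encode first-order B-format (W, X, Y, Z) into UHJ channels L, R, T, Q."""
    j = phase_shift_90
    s = 0.9396926 * w + 0.1855740 * x                        # sum signal
    d = j(-0.3420201 * w + 0.5098604 * x) + 0.6554516 * y    # difference signal
    left = (s + d) / 2.0
    right = (s - d) / 2.0
    t = j(-0.1432 * w + 0.6511746 * x) - 0.7071068 * y       # 3rd channel (planar accuracy)
    q = 0.9772 * z                                           # 4th channel (height)
    return left, right, t, q

# Example: a 440 Hz tone panned 30 degrees to the left, no height
n = np.arange(4800)
src = np.sin(2 * np.pi * 440 * n / 48000)
az = np.radians(30)
w, x, y, z = src / np.sqrt(2), src * np.cos(az), src * np.sin(az), np.zeros_like(src)
left, right, t, q = uhj_encode(w, x, y, z)
```

L and R alone are the stereo-compatible pair; a decoder that also receives T (and Q) can recover the planar (and height) information.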



c) UHJ extended AAC files

AAC allows up to 16 audio channels, and can include 16 data channels. (I believe that .aac as a < file format > is just .m4a, or .mp4.)

To offer a backward-compatible extension of a < UHJ extended > AAC stereo file, you would have to include the T and Q audio channels as 3rd or 4th audio stream, somewhere. (Probably you could "label" such a file as stereo, the first 2 channels being L and R. Include some tags/flags in the header that there are one or two further < extension > audio channels, which would have to be decoded by a UHJ decoder. The decoder could be an app running on a smartphone, and the output could be a binaural version of the surround or actually LRTQ 3D audio recording.)

If this "audio channels" approach doesn't work, use the "data" extensions of .mp4. (T and Q are not direct audio channels, so this might actually be the formally correct approach... Because T and Q go into some decoder, as extension < data >.)


d) Bitrate limits

Whereas Apple uses 256 kbps ("VBR") as its current standard (they used 128 kbps before), the usual limit for AAC stereo is 320 kbps, for both CBR and VBR. (Anything above would not necessarily be "undefined", but would probably break most existing hardware/software.)

With four channels (L, R, T, Q) sharing 320 kbps equally, this means that you have 80 kbps per channel, or 160 kbps for the (backward-compatible) L/R stereo file. (IMHO, this is a sufficient value.)
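
The arithmetic can be checked with a small sketch. It assumes four channels (L, R, T, Q) under a 320 kbps ceiling; the function name and the alternative 192 kbps stereo split are only illustrative, not part of any AAC tool.

```python
def split_bitrate(total_kbps, stereo_kbps=None, extension_channels=2):
    """Return (kbps for the L/R pair, kbps per extension channel)."""
    if extension_channels < 1:
        raise ValueError("need at least one extension channel")
    if stereo_kbps is None:
        # Equal share for every channel, stereo pair included
        per_channel = total_kbps / (2 + extension_channels)
        return 2 * per_channel, per_channel
    remaining = total_kbps - stereo_kbps
    if remaining < 0:
        raise ValueError("stereo bitrate exceeds the total budget")
    # Fixed stereo rate, the rest is shared by the extension channels
    return float(stereo_kbps), remaining / extension_channels

equal = split_bitrate(320)                              # (160.0, 80.0)
favour_stereo = split_bitrate(320, stereo_kbps=192)     # (192.0, 64.0)
lrt_only = split_bitrate(320, stereo_kbps=192, extension_channels=1)  # (192.0, 128.0)
```

Giving the stereo pair more than an equal share simply leaves less for T and Q, which is where bandwidth-limiting the extension channels becomes interesting.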

IF people think that you should distribute AAC stereo files with higher bitrates, there are several solutions.

- The UHJ article already mentions that the T channel could be bandwidth-limited.

The third channel does not have to have full audio bandwidth for this purpose, leading to the possibility of so-called "2½-channel" systems, where the third channel is bandwidth-limited to 5 kHz.


(I suggest you could do the same with the Q channel.)


- Within AAC, you could compress (full) channels with different bitrates. (I was tempted to patent this :-D , but I don't think this is even necessary... )

- You could probably use specific properties of the .MP4 format to get rid of any bitrate limitations. (This is an assumption by a person (= me) who has worked on backward-compatible stuff and standards... Any interested company or interested party is free to mail me. If the asking company isn't suing me at the moment, I will usually answer in a friendly and more or less competent way!)


- You can offer a bundle within a typical container format, which means you have an .aac file and an associated audio/data file. Related to the last point, but not necessarily the same.



e) Is there enough surround content?

I think yes!

- Original UHJ/FOA/soundfield recordings

- 5.1 surround recordings

You would have to transcode 5.1 into some WXY representation (lossy), and bring this into some LRT (UHJ) form. (Looks like a lossless process)

Whereas this is not without drawbacks, you could distribute your 5.1 surround recordings in some (stereo) backward-compatible form. Which means you fit into the current AAC/stereo based file/streaming environment, offering something new/better/additional as extension to stereo files or streams.

Transcoding from 5.1 to WXY is a solved problem, I think Xiph.org has applied this idea in some form.. (Whereas I didn't see a big deal in transcoding 5.1 film audio tracks into 3-channel Ambisonics, it is another thing to translate some 5.1 audio-only surround recording to some form which is backward-compatible to AAC stereo. And which fits into the current forms of audio distribution. This is an important issue.)



f) Consumer reception

Obviously a lot of music is nowadays listened to on mobile devices, via headphones. Decoder programs/apps for LRT(Q) files which decode Ambisonics to a (binaural) headphone version can easily be distributed. Without such a "special" playback option, people can still listen to a ("downfolded" but definitely viable) stereo version of the same surround recording.

If Ambisonics-based surround/3D audio were introduced in about this way, I fully believe that people would consider this to be something "cool". And if you are too old to think in cool-app terms, maybe even then you would be tempted to experiment with these files, because how does 3D audio actually sound? Can you hear the difference? O:-)


g) Other audio compression formats, "non-AAC"

Everything which has been said about AAC extension streams ("audio" or "data") would apply to (Ogg) FLAC, (Ogg) Vorbis and to the new "Opus" audio codec (official Internet standard) in a very similar way.

I would not use .MP3 for extended audio streams. (Compression efficiency does matter; AAC is more modern, and far more extensible as a file format.)

(If anyone plans to distribute CD/DVD bundles or DualDiscs offering LRT(Q) based surround/3DA versions of recordings, I still have some connections in this area and could give some practical advice. If anyone doesn't need advice and just does it, even better...)


h) Extensions of UHJ

It seems possible to extend the UHJ scheme to higher orders. However, this would be some idea for the future, as implementations of undefined formats are not that trivial. And I doubt there is some real value in this idea. (In practical terms, not as an academic idea... :-D )

Speaking about higher order and < the future > leads us to the second proposed format, an extended .AMB format....



II. A new ".AMB+" format going up to 4th order, or maybe higher

a) HOA is already "accepted" as an input format for the (future) encoder of (MPEG) 3D audio.

Evidence:

The "Higher Order Ambisonics Test Material" (12 items) is mostly presented in 4th order. (MPEG-H CfP, Table-Set 2)

(H_01 is HOA order 6, H_02 - H_11 is 4th order, H_12 is 3rd order.)


As I have said here and elsewhere, Ambisonics and HOA are existing and completely "usable" formats for 3D audio. In fact, Ambisonics has always been about 3D audio... Because a lot of people seem to work currently at 4th order (see also the eigenmike, which goes up to 4th order), it seems necessary to extend the .AMB format to (at least) 4th order. Or should we replace .AMB with some really new (Ambisonics) format, but which should allow mixed orders?

http://en.wikipedia.org/wiki/Ambisonics

(Table Higher Order B format channels)

You would have to add the combinations

Horizontal order    Height order    Number of channels

4                   0               9
4                   1               10
4                   2               13
4                   3               18
4                   4               25
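
The channel counts in this table follow a simple rule: a full-sphere set up to height order P has (P+1)^2 channels, and every horizontal order above P adds two more (horizontal-only) channels. A small sketch, with a function name made up for this purpose, reproduces the rows:

```python
def mixed_order_channels(horizontal_order, height_order):
    """Channel count of a mixed-order set (horizontal order H, height order P)."""
    if not 0 <= height_order <= horizontal_order:
        raise ValueError("height order must lie between 0 and the horizontal order")
    full_sphere = (height_order + 1) ** 2                     # all harmonics up to order P
    horizontal_only = 2 * (horizontal_order - height_order)   # two per extra horizontal order
    return full_sphere + horizontal_only

# Rows for horizontal order 4, height order 0..4
rows = [(4, p, mixed_order_channels(4, p)) for p in range(5)]
for h, p, channels in rows:
    print(h, p, channels)
```

Note that mixed_order_channels(2, 2) and mixed_order_channels(4, 0) both give 9, so the channel count alone does not identify the variant.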


A "collision" between the 9 channels of (full-sphere) 2nd order and the 9 channels of horizontal-only 4th order should not matter, IMO. (I also don't need the concept of "metadata" to distinguish between these cases. Every format or streaming format I know has some header - or several/repeated headers - with some descriptive information, which in this case should include the values for < horizontal order >, < height order > etc. Not just the number of channels, which should also be included. Many formats I know carry so many data fields that they actually describe the same properties in different but repeated ways. Redundancy is no problem in this case; missing information actually is. Therefore, it is certainly no problem to have two mixed-order variants with 9 channels within the same scheme, if you also include some additional information like "horizontal order" etc. Call this metadata, descriptive data fields, or whatever..)

4th order looks like a good fit to the 22.2 surround system, which serves as a kind of upper "reference" in the MPEG-H 3D audio framework. (For example, you have 10 fairly evenly distributed loudspeakers in the 0º plane, which is the minimum number of speakers you need to fully represent horizontal 4th order.)

The 4h3p mixed-order system seems to fit quite well to the complete "3D" 22.2 layout, BTW. You have the horizontal plane, a middle plane at about 35º but in any case near 45º, and the 90º overhead speaker. (18 channels coded via AAC at 64 kbit/s each come to 1152 kbit/s, which is less than 1.2 Mbit/s and still less than "DTS on CD". AAC has been proven to be a good codec to compress Ambisonic/HOA audio channels.)


A future surround format should supersede 5.1/6.1 surround, which is the undisputed current film standard and will survive for a long time. You would have to include height or (full-sphere) 3D audio into any next-gen surround format. If FOA is not really a good format for cinema audio, you would probably not go to 2nd order, but probably to about 3rd and 4th orders, to obtain some real improvements. (Is there any reason that you could not apply advanced Ambisonics decoder technologies for 1st order - say Harpex - also to 3rd and 4th order? If so, I would try this < before > I go to 7th, 10th or xxth orders...)


b) Ambisonics is not included in current standardization efforts for 3D cinema audio (but HOA is referred to as "encoder input format"); nevertheless it is an existing and real alternative to other developing standards:

- Ambisonics/HOA is a 3D audio format, by definition. (Sound fields are not restricted to some "horizontal plane".)

- The Ambisonics theory seems to be quite developed. As long as we don't talk about "any order" approaches (too much freedom often just means "lack of definition"), you can develop practical applications. (We talk about some format up to 4th order, not abstract "nth" order.)

- It is a format which fits different loudspeaker layouts. (With some restrictions, but other formats might have similar problems, including object-based codecs.)

- It is full-sphere (and not restricted by some hemispherical layout). Binaural audio presented via headphones is compatible with full-sphere formats!



c) Company support: This is far better than some people might think. Obviously the people at Orange/FT are quite "positive" about HOA. BBC might also show some renewed interest. Qualcomm (now a very big company, because of its expertise in wireless technologies and the fact that they are a major manufacturer of ARM based mobile processors) has shown some significant interest in the development of 3D audio, including HOA.

For example:

http://jobs.cellular-news.com/index.php?post_id=34345

Huawei is interested in HOA, among other companies in Asia.

In the end, Ambisonics and HOA might not need as much company support as other technologies, because there is some established theory behind, solutions are really getting more and more available, and Ambisonics (FOA/HOA) is truly an open system.


d) Extension of the .AMB scheme to 4th order

"Furse-Malham higher-order format (FMH-Format) is a set of coefficients that can be applied to the first 16 B-format channels. The FMH set of coefficients applies weightings to the channels such that all the spherical harmonic coefficients have a maximum value of unity. Whilst this approach is not rigorously "correct" in mathematical terms, it has significant engineering advantages in that it restricts the maximum levels a panned mono source will generate in some of the higher-order channels.[21]

The Furse-Malham set of weighting factors is part of the ".amb" specification for downloadable B-Format files."

(Wikipedia article about Ambisonics)


So, if some people seem to work at exactly 4th order: Can you extend the (mixed-order) .AMB system to 4th order? What would be the weight factors?

If this is not possible: What should a modernized mixed-order file format up to 4th order look like? (I am aware that you could just "mix" .AMB and a 4th order format proposal, at least in this case. This would be a so-called "hack", albeit a workable hack... Messieurs Malham, Furse, Adriaensen, Greene: Please present some workable and consistent solution within Western scientific-mathematical traditions, and if possible do this soon. :-) Cos even the technical terms "hack" and "hacker" might lead to some serious and joint investigation by the Holy Inquisition and its modern form called NSA, which means we should aim for a mathematical, beautiful and eternal solution, not for some geek < hacks > ... Aaaargh, I have used the wrong word, yet again... :-)



III. The proposals (I) and (II) are presented in some joint form, because I believe they are complementary.

You would need a "lightweight" surround/3DA format as presented in (I), which is backward-compatible to stereo distribution but is a true format for surround sound and "even" 3D audio ("even" used for justified PR purposes!), via a 3rd and 4th audio extension channel. This is a bit like the older G format. However, I believe it makes far more sense to extend stereo files to surround sound than surround files (5.1) to surround sound. ;-)


II (".AMB+" up to 4th order) is a proposal for a powerful surround/3DA format which is still within the limits of a true CE format (note that AAC allows effective compression of up to 25 channels), can be decoded to a binaural representation (mobile listening via headphones), and might be good enough to be used as a "wide area" format, including for applications like live concerts, cinema audio, spatial audio for theatre/concerts/museums, etc.


You would start with (I), which could be implemented even with the information and links provided within this posting. (Yet again, the only "problem" might be that the bitrate of the LR stereo downfold would be lower than 256 kbps. If people think that you can't code HQ stereo within a 160 kbps or say 192 kbps AAC file - 192 kbps was part of the patent I didn't apply for :'( - then I believe with some good reasons that there are several ways to get around the 320 kbps restriction for 3/4 channels. Which means: I think you could code the 3 or 4 channels in say 384 kbps or 512 kbps. In any case, I believe the customers won't hear any difference between the 160 kbps, 192 kbps and 256 kbps AAC stereo rates, as long as you don't tell them about these differences. And as long as they didn't read about "low" AAC bitrates in unnamed audiophile magazines. Note that FLAC-based solutions could be developed for some audiophile customers, but the point was to apply the LRTQ/UHJ hierarchical system within the current distribution of AAC encoded stereo files/streams.)


IV. Improved binaural representation via headphones

Note that headphones with HT "chips" and motion-corrected binaural playback of surround sound (including 3D audio) could easily be realized, with available and actually quite affordable chips.

Oculus Rift is the direct example for this, as this is a full (and certainly more complex) VR and gaming device.

http://worthplaying.com/event/E3_2013/PostE3_2013/89888/

In its current state, the Oculus Rift is an amazing piece of work, and after decades of dealing with VR technology, it seems that we may finally see a VR unit that is going to get it right.


Wikipedia writes about the Oculus Rift motion-tracking:

"Initial prototypes used a Hillcrest 3DoF head tracker that is normally 120 Hz, with a special firmware that John Carmack requested which makes it run at 250 Hz, tracker latency being vital due to the dependency of virtual reality's realism on response time. The latest version includes Oculus' new 1000 Hz Adjacent Reality Tracker that will allow for much lower latency tracking than almost any other tracker. It uses a combination of 3-axis gyros, accelerometers, and magnetometers, which make it capable of absolute (relative to earth) head orientation tracking without drift.[20][25]"


Now, apply the same or similar HT silicon (which is already very affordable) to HT/motion-tracking headphones... (I could give some detailed recommendations on how to do this, but this is also one of the next steps... Nice to see that at least the video and gaming people have kept some sense for cool technology and seemingly "weird" ideas, so to speak. How many motion updates per second would a "fluent" head-tracking binaural decoder/decoding program actually require? Ye Ambisonics experts, what do you think or, better, know?! You would have to decode some UHJ/".AMB+" file and shift the soundfield relative to the head position, I guess. The head position needs regular and frequent updates, which we easily get by now. You could also track the < absolute > movements of persons within some area. Say: Your decoder program tracks the movements of the visitors in some museum or building, and plays the associated audio/explanations fitting the current position. "This is the dining hall of the castle, which was quite cold during winter, but warm or even hot during summer." Ok, this was a truly dull example... :-) )
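
For first order, the "shift the soundfield relative to the head position" step is just a rotation of the X and Y channels. A minimal sketch, assuming the usual B-format axes (X front, Y left, Z up), a head yaw measured counter-clockwise in radians, and ignoring pitch and roll; the function name is made up here:

```python
import numpy as np

def rotate_foa_for_head_yaw(w, x, y, z, head_yaw):
    """Counter-rotate a first-order soundfield so that sources stay fixed in the room."""
    c, s = np.cos(head_yaw), np.sin(head_yaw)
    x_rot = c * x + s * y      # new front component
    y_rot = c * y - s * x      # new left component
    return w, x_rot, y_rot, z  # W and Z are unaffected by a pure yaw rotation

# Example: a source straight ahead, listener turns the head 90 degrees to the left
src = np.ones(4)
w, x, y, z = src / np.sqrt(2), src, np.zeros(4), np.zeros(4)
w2, x2, y2, z2 = rotate_foa_for_head_yaw(w, x, y, z, np.pi / 2)
# The source now sits on the listener's right: x2 is about 0, y2 is about -1
```

In a head-tracked player this would run once per audio block with the latest tracker value, before the binaural decoding stage. How often the yaw has to be refreshed (and whether it needs smoothing between blocks) is exactly the open question above.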


Best regards

Stefan Schreiber                                          Lisbon


