[Sursound] Two new approaches for the distribution of surround sound/3D audio

Stefan Schreiber Sun, 28 Jul 2013 19:58:09 -0700

(Continuation of: The commercial future of Ambisonics, 15/5/2013)



Dear colleagues,

Following the recent standardization of 3D audio by MPEG (ISO/IEC 23008-3) and related activities, I have come to the conclusion that the (older) B format up to 3rd order might need some updates.

However, I also came to the conclusion that FOA (first-order Ambisonics) could easily be included in all current distribution models for audio on the Internet, which are (to "99.98%") stereo-based. We were nearly "there" in the thread cited above ("The commercial future of Ambisonics")!

I will start with this part, because you can see it as a format in its own right, one which might be the perfect bridge or transition format for future surround/3D audio (3DA) formats...

I. UHJ (surround/3D audio) as an extension of stereo-based files (distribution via Internet, on discs and streaming, including YouTube, Spotify etc.)

a) As Richard Elen (and I) have suggested, you could distribute surround sound and 3D audio as a (relatively simple) extension of (UHJ-encoded) stereo files. You would have to add to a stereo file (an .aac file, for example) a 3rd audio channel, OR two audio channels, as an extension audio stream. The restriction was that these extensions would have to fit into the current distribution models, say downloadable AAC files via iTunes.

Contrary to my/our first impressions, this is firstly possible (this has already been pretty clear), and secondly feasible < without any serious drawbacks >. Which will be shown...

b) Technically speaking, you would have to distribute the ("downsampled") stereo file of FOA, which contains some surround information, and the one or two audio extension streams.


This is of course UHJ, brought into some AAC extension scheme.

http://en.wikipedia.org/wiki/Ambisonic_UHJ_format

Although UHJ permits the use of up to four channels (carrying full-sphere with-height surround), only the 2-channel variant is in current use (as it is compatible with currently-available 2-channel media). In Ambisonics, UHJ is also known as "C-Format".



(Small potential problem:

"UHJ was developed by the Ambisonic team, incorporating work done by the BBC (on their quadraphonic system, Matrix H) and Duane Cooper (on Nippon Columbia's UD-4/UMX quadraphonic system) and others, and building on the then-current version of Ambisonics, System 45J. The initials indicate some of the sources incorporated into the system: U from Universal (UD-4); H from Matrix H; and J from System 45J."

This means you < might > think about an update of UHJ, to achieve more consistency between B format and the UHJ scheme. Or you might leave things as they are defined, for historical reasons. In any case, you have to be aware of this...)

Although a hierarchically extended version of UHJ stereo has been tested in the area of FM broadcasting, nobody has tried to distribute (hierarchically) extended UHJ stereo files via the Internet. Which is just a head-banging fact... Or maybe there are some deeper reasons?!

If a third channel (T) is available, this can be used to give improved localisation accuracy to the planar surround effect when decoded via a 3-channel UHJ decoder. The third channel does not have to have full audio bandwidth for this purpose, leading to the possibility of so-called "2½-channel" systems, where the third channel is bandwidth-limited to 5 kHz. The third channel can be broadcast via FM radio, for example, by means of phase-quadrature modulation. This configuration was tested by the Independent Broadcasting Authority (IBA) in the United Kingdom as a method of broadcasting surround recordings. 2½ or 3-channel UHJ delivers the same accuracy as 3-channel (WXY) B-Format.

Adding a fourth channel (Q) to the UHJ system allows the encoding of full surround sound with height, known as Periphony, with a level of accuracy identical to 4-channel B-Format.




c) UHJ extended AAC files

AAC allows up to 16 audio channels, and can include 16 data channels. (I believe that .aac as a < file format > is just .m4a, or .mp4.)

To offer a backward-compatible extension of a < UHJ extended > AAC stereo file, you would have to include the T and Q audio channels as a 3rd or 4th audio stream, somewhere. (Probably you could "label" such a file as stereo, the first 2 channels being L and R. Include some tags/flags in the header stating that there are one or two further < extension > audio channels, which would have to be decoded by a UHJ decoder. The decoder could be an app running on a smartphone, and the output could be a binaural version of the surround or actually LRTQ 3D audio recording.)

If this "audio channels" approach doesn't work, use the "data" extensions of .mp4. (T and Q are not direct audio channels, so this might actually be the formally correct approach... Because T and Q go into some decoder, as extension < data >.)



d) Bitrate limits

Whereas Apple uses 256 kbps ("VBR") as its current standard (they used 128 kbps before), the usual limit for AAC stereo is 320 kbps, for both CBR and VBR. (Anything above would not necessarily be "undefined", but would probably break most existing hardware/software.)

This means that, if the 320 kbps are split evenly across the four channels L, R, T and Q, you have 80 kbps per channel, or 160 kbps for the L/R (backward-compatible) stereo file. (IMHO, this is a sufficient value.)
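The arithmetic behind these figures can be sketched as follows. This is only an illustration, assuming the 320 kbps ceiling is shared evenly between L, R, T and Q; the names `TOTAL_KBPS`, `CHANNELS` and `even_split` are made up for this example and are not part of any AAC tool.

```python
# Assumption: the 320 kbps ceiling is split evenly across L, R, T and Q.
TOTAL_KBPS = 320
CHANNELS = ("L", "R", "T", "Q")


def even_split(total_kbps, channels):
    """Return a dict mapping each channel name to its share of the bitrate."""
    per_channel = total_kbps / len(channels)
    return {name: per_channel for name in channels}


budget = even_split(TOTAL_KBPS, CHANNELS)
# Bitrate left for the backward-compatible stereo pair.
stereo_kbps = budget["L"] + budget["R"]

print(budget)
print("stereo pair:", stereo_kbps, "kbps")
```

With only a T channel (three channels in total), the same even split would leave roughly 107 kbps per channel.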

IF people think that you should distribute AAC stereo files with higher bitrates, there are several solutions.

- The UHJ article already mentions that the T channel could be bandwidth-limited.

The third channel does not have to have full audio bandwidth for this purpose, leading to the possibility of so-called "2½-channel" systems, where the third channel is bandwidth-limited to 5 kHz.



(I suggest you could do the same with the Q channel.)

- Within AAC, you could compress (full) channels with different bitrates. (I was tempted to patent this :-D , but I don't think this is even necessary... )

- You could probably use specific properties of the .MP4 format to get rid of any bitrate limitations. This is an assumption by a person (= me) who has worked on backward-compatible stuff and standards... (Any interested company or other party is free to mail me. If the asking company isn't suing me at the moment, I will usually answer in a friendly and more or less competent way! )

- You can offer a bundle within a typical container format, which means you have an .aac file and an associated audio/data file. This is related to the last point, but not necessarily the same.




e) Is there enough surround content?

I think yes!

- Original UHJ/FOA/soundfield recordings

- 5.1 surround recordings

You would have to transcode 5.1 into some WXY representation (lossy), and bring this into some LRT (UHJ) form. (Looks like a lossless process.)

Whereas this is not without drawbacks, you could distribute your 5.1 surround recordings in some (stereo) backward-compatible form. Which means you fit into the current AAC/stereo-based file/streaming environment, offering something new/better/additional as an extension to stereo files or streams.

Transcoding from 5.1 to WXY is a solved problem; I think Xiph.org has applied this idea in some form... (Whereas I didn't see a big deal in transcoding 5.1 film audio tracks into 3-channel Ambisonics, it is another thing to translate some 5.1 audio-only surround recording into a form which is backward-compatible with AAC stereo and which fits into the current forms of audio distribution. This is an important issue.)
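As a rough illustration of the 5.1-to-WXY step, one common approach (not necessarily the one Xiph.org or anybody else uses) is to treat each loudspeaker feed as a virtual source at its nominal azimuth and pan it into horizontal B format. The sketch below assumes the ITU-R BS.775 angles (±30º front, ±110º surround), the traditional 1/sqrt(2) gain on W, and simply drops the LFE channel; the names `SPEAKER_AZIMUTHS` and `encode_51_to_wxy` are made up for this example.

```python
import math

# Assumed ITU-R BS.775 loudspeaker azimuths in degrees
# (0 = front, positive = counter-clockwise, i.e. to the left).
# The LFE channel is dropped in this sketch.
SPEAKER_AZIMUTHS = {"L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0}


def encode_51_to_wxy(frame):
    """Pan one sample frame {speaker name: value} into horizontal B format.

    Returns the tuple (W, X, Y). Missing speakers count as silence.
    """
    w = x = y = 0.0
    for name, azimuth_deg in SPEAKER_AZIMUTHS.items():
        s = frame.get(name, 0.0)
        a = math.radians(azimuth_deg)
        w += s / math.sqrt(2.0)  # traditional B-format W gain
        x += s * math.cos(a)     # front-back component
        y += s * math.sin(a)     # left-right component
    return w, x, y


centre_only = encode_51_to_wxy({"C": 1.0})
front_pair = encode_51_to_wxy({"L": 1.0, "R": 1.0})
print(centre_only)
print(front_pair)
```

The step is lossy because five directional feeds are folded into three first-order components; the subsequent WXY-to-LRT matrixing is not shown here.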




e) Consumer reception

Obviously a lot of music is nowadays listened to on mobile devices, via headphones. Decoder programs/apps for LRT(Q) files, which might decode Ambisonics to a (binaural) headphone version, can easily be distributed. Without such a "special" playback option, people can still listen to a ("downfolded" but definitely viable) stereo version of the same surround recording.

If Ambisonics-based surround/3D audio were introduced in about this way, I fully believe that people would consider this to be something "cool". And if you are too old to think in cool-app terms, maybe even then you would be tempted to experiment with these files, because how does 3D audio actually sound? Can you hear the difference? O:-)



f) Other audio compression formats, "non-AAC"

Everything which has been said about AAC extension streams ("audio" or "data") would apply to (Ogg) FLAC, (Ogg) Vorbis and to the new "Opus" audio codec (official Internet standard) in a very similar way.

I would not use .MP3 for extended audio streams. (Compression efficiency does matter; AAC is more modern, and far more extensible as a file format.)

(If anyone plans to distribute CD/DVD bundles or DualDiscs offering LRT(Q)-based surround/3DA versions of recordings, I still have some connections in this area and could give some practical advice. If anyone doesn't need advice and just does it, even better...)



g) Extensions of UHJ

It seems possible to extend the UHJ scheme to higher orders. However, this would be an idea for the future, as implementations of undefined formats are not that trivial. And I doubt there is real value in this idea. (In practical terms, not as an academic idea... :-D )

Speaking about higher order and < the future > leads us to the second proposed format, an extended .AMB format....




II. A new ".AMB+" format going up to 4th order, or maybe higher

a) HOA is already "accepted" as an input format for the (future) encoder of (MPEG) 3D audio.


Evidence:

The "Higher Order Ambisonics Test Material" (12 items) is mostly presented in 4th order. (MPEG-H CfP, Table-Set 2)


(H_01 is HOA order 6, H_02 - H_11 is 4th order, H_12 is 3rd order.)

As I have said here and elsewhere, Ambisonics and HOA are existing and completely "usable" formats for 3D audio. In fact, Ambisonics has always been about 3D audio... Because a lot of people currently seem to work at 4th order (see also the Eigenmike, which goes up to 4th order), it seems necessary to extend the .AMB format to (at least) 4th order. Or should we replace .AMB with some really new (Ambisonics) format, one that should still allow mixed orders?


http://en.wikipedia.org/wiki/Ambisonics

(Table Higher Order B format channels)

You would have to add the combinations

Horizontal order   Height order   Number of channels

4                  0              9
4                  1              10
4                  2              13
4                  3              18
4                  4              25
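The channel counts for these combinations follow from the usual #H#P mixed-order bookkeeping: a full-sphere set up to the height order P contributes (P+1)^2 channels, and every horizontal order above P adds two more. A small sketch (the function name `mixed_order_channels` is made up for this example):

```python
def mixed_order_channels(horizontal_order, height_order):
    """Number of channels of a #H#P mixed-order Ambisonics signal set."""
    if not 0 <= height_order <= horizontal_order:
        raise ValueError("need 0 <= height order <= horizontal order")
    # All spherical harmonics up to the height order.
    full_sphere = (height_order + 1) ** 2
    # Two extra horizontal-only components per additional order.
    extra_horizontal = 2 * (horizontal_order - height_order)
    return full_sphere + extra_horizontal


for height in range(5):
    print(4, height, mixed_order_channels(4, height))
```

For horizontal order 4 this gives 9, 10, 13, 18 and 25 channels for height orders 0 to 4. It also shows the 9-channel "collision": both (2, 2) and (4, 0) come out as 9.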

A "collision" between the 9 channels of (full-sphere) 2nd order and the 9 channels of horizontal-only 4th order should not matter, IMO. (I also don't need the concept of "metadata" to distinguish between these cases. Every format or streaming format I know has some header - or several/repeated headers - with some descriptive information, which in this case should include the values for < horizontal order >, < height order > etc. Not just the number of channels, which also should be included. Many formats I know bear so many data fields that they actually describe the same properties in different but repeated ways. Redundancy is in this case no problem, missing information actually is. Therefore, it is certainly no problem to have two mixed-order variants with 9 channels within the same scheme, if you also include some additional information like "horizontal order" etc. Call this metadata, descriptive data fields, or whatever..)

4th order looks like a good fit to the 22.2 surround system, which serves as a kind of upper "reference" in the MPEG-H 3D audio framework. (For example, you have 10 fairly evenly distributed loudspeakers in the 0º plane, which is the minimum number of speakers you need to fully represent horizontal 4th order.)

The 4h3p mixed-order system seems to fit quite well to the complete "3D" 22.2 layout, BTW. You have the horizontal plane, the middle plane at about 35º but in any case near 45º, and the 90º overhead speaker. 18 channels coded via AAC at 64 kbit/s each come to 1152 kbit/s, i.e. less than 1.2 Mbit/s, which is still less than "DTS on CD". AAC has been proven to be a good codec to compress Ambisonic/HOA audio channels.

A future surround format should supersede 5.1/6.1 surround, which is the undisputed current film standard and will survive for a long time. You would have to include height or (full-sphere) 3D audio in any next-gen surround format. If FOA is not really a good format for cinema audio, you would probably not go to 2nd order, but probably to about 3rd and 4th orders, to obtain some real improvements. (Is there any reason that you could not apply advanced Ambisonics decoder technologies for 1st order - say Harpex - also to 3rd and 4th order? If so, I would try this < before > I go to 7th, 10th or xxth orders...)

b) Ambisonics is not included in current standardization efforts for 3D cinema audio (but HOA is referred to as an "encoder input format"); nevertheless it is an existing and real alternative to other developing standards:

- Ambisonics/HOA is a 3D audio format, by definition. (Sound fields are not restricted to some "horizontal plane".)

- The Ambisonics theory seems to be quite developed. As long as we don't talk about "any order" approaches (too much freedom often just means "lack of definition"), you can develop practical applications. (We talk about some format up to 4th order, not abstract "nth" order.)

- It is a format which fits different loudspeaker layouts. (With some restrictions, but other formats might have similar problems, including object-based codecs.)

- It is full-sphere (and not restricted by some hemispherical layout). Binaural audio presented via headphones is compatible with full-sphere formats!

c) Company support: This is far better than some people might think. Obviously the people at Orange/FT are quite "positive" about HOA. BBC might also show some renewed interest. Qualcomm (now a very big company, because of its expertise in wireless technologies and the fact that they are a major manufacturer of ARM-based mobile processors) has shown some significant interest in the development of 3D audio, including HOA.


For example:

http://jobs.cellular-news.com/index.php?post_id=34345

Huawei is interested in HOA, among other companies in Asia.

In the end, Ambisonics and HOA might not need as much company support as other technologies, because there is some established theory behind it, solutions are really becoming more and more available, and Ambisonics (FOA/HOA) is truly an open system.



d) Extension of the .AMB scheme to 4th order

"Furse-Malham higher-order format (FMH-Format) is a set of coefficients that can be applied to the first 16 B-format channels. The FMH set of coefficients applies weightings to the channels such that all the spherical harmonic coefficients have a maximum value of unity. Whilst this approach is not rigorously "correct" in mathematical terms, it has significant engineering advantages in that it restricts the maximum levels a panned mono source will generate in some of the higher-order channels.[21]

The Furse-Malham set of weighting factors is part of the ".amb" specification for downloadable B-Format files."


(Wikipedia article about Ambisonics)

So, if some people seem to work at exactly 4th order: Can you extend the (mixed-order) .AMB system to 4th order? What would be the weight factors?

If this is not possible: What should a modernized mixed-order file format up to 4th order look like? (I am aware that you could just "mix" .AMB and a 4th order format proposal, at least in this case. This would be a so-called "hack", albeit a workable hack... Messieurs Malham, Furse, Adriaensen, Greene: Please present some workable and consistent solution within Western scientific-mathematical traditions, and if possible do this soon. :-) Cos even the technical terms "hack" and "hacker" might lead to some serious and joint investigation by the Holy Inquisition and its modern form called NSA, which means we should aim for a mathematical, beautiful and eternal solution, not for some geek < hacks > ... Aaaargh, I have used the wrong word, yet again... :-)

III. The proposals (I) and (II) are presented in some joint form, because I believe they are complementary.

You would need a "lightweight" surround/3DA format as presented in (I), which is backward-compatible with stereo distribution but is a true format for surround sound and "even" 3D audio ("even" used for justified PR purposes!), via a 3rd and 4th audio extension channel. This is a bit like the older G format. However, I believe it makes far more sense to extend stereo files to surround sound than surround files (5.1) to surround sound. ;-)

(II) (".AMB+" up to 4th order) is a proposal for a powerful surround/3DA format, which is however still within the limits of a true CE format (note that AAC allows effective compression of up to 25 channels), can be decoded to a binaural representation (mobile listening via headphones), and might be good enough to be used as a "wide area" format, including for applications like live concerts, cinema audio, spatial audio for theatre/concerts/museums, etc.

You would start with (I), which could be implemented even with the information and links provided within this posting. (Yet again, the only "problem" might be that the bitrate of the LR stereo downfold would be lower than 256 kbps. If people think that you can't code HQ stereo within a 160 kbps or say 192 kbps AAC file - 192 kbps was part of the patent I didn't apply for :'( - then I believe with some good reasons that there are several ways to get around the 320 kbps restriction for 3/4 channels. Which means: I think you could code the 3 or 4 channels in say 384 kbps or 512 kbps. In any case, I believe the customers won't hear any difference between the 160 kbps, 192 kbps and 256 kbps AAC stereo rates, as long as you don't tell them about these differences, and as long as they didn't read about "low" AAC bitrates in unnamed audiophile magazines. Note that FLAC-based solutions could be developed for some audiophile customers, but the point was to apply the LRTQ/UHJ hierarchical system within the current distribution of AAC-encoded stereo files/streams.)



IV. Improved binaural representation via headphones

Note that headphones with HT (head-tracking) "chips" and motion-corrected binaural playback of surround sound (including 3D audio) could easily be realized, with available and actually quite affordable chips.

Oculus Rift is the direct example for this, as it is a full (and certainly more complex) VR and gaming device.


http://worthplaying.com/event/E3_2013/PostE3_2013/89888/

In its current state, the Oculus Rift is an amazing piece of work, and after decades of dealing with VR technology, it seems that we may finally see a VR unit that is going to get it right.



Wikipedia writes about the Oculus Rift motion-tracking:

"Initial prototypes used a Hillcrest 3DoF head tracker that is normally 120 Hz, with a special firmware that John Carmack requested which makes it run at 250 Hz, tracker latency being vital due to the dependency of virtual reality's realism on response time. The latest version includes Oculus' new 1000 Hz Adjacent Reality Tracker that will allow for much lower latency tracking than almost any other tracker. It uses a combination of 3-axis gyros, accelerometers, and magnetometers, which make it capable of absolute (relative to earth) head orientation tracking without drift.[20][25]"

Now, apply the same or similar HT silicon (which is already very affordable) to HT/motion-tracking headphones... (I could give some detailed recommendations on how to do this, but this is also one of the next steps... Nice to see that at least the video and gaming people have kept some sense for cool technology and seemingly "weird" ideas, so to speak. How many motion updates per second would a "fluent" head-tracking binaural decoder/decoding program actually require? Ye Ambisonics experts, what do you think or better know?! You would have to decode some UHJ/".AMB+" file and shift the soundfield relative to the head position, I guess. The head position needs some regular and frequent updates, which we easily get by now. You could also track the < absolute > movements of persons within some area. Say: Your decoder program tracks the movements of the visitors in some museum or building, and plays the associated audio/explanations fitting to the current position. "This is the dining hall of the castle, which was quite cold during winter, but warm or even hot during summer." Ok, this was a truly dull example... :-) )
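To make the "shift the soundfield relative to the head position" step a little more concrete, here is a minimal sketch for first-order B format and head yaw only. It assumes the usual convention that X points to the front and Y to the left, with angles counted counter-clockwise; the function names `rotate_yaw` and `compensate_head_yaw` are made up for this example. Pitch and roll would need a full 3D rotation, and higher orders need larger rotation matrices; neither is shown.

```python
import math


def rotate_yaw(w, x, y, z, angle_rad):
    """Rotate a first-order B-format sample about the vertical axis.

    A positive angle moves sources counter-clockwise (towards the left).
    W and Z are unaffected by a rotation about the vertical axis.
    """
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return w, x * c - y * s, x * s + y * c, z


def compensate_head_yaw(w, x, y, z, head_yaw_rad):
    """Counter-rotate the soundfield so sources stay fixed in the room.

    If the head turns left by head_yaw_rad, the field is rotated right
    by the same amount before binaural decoding.
    """
    return rotate_yaw(w, x, y, z, -head_yaw_rad)


# A source straight ahead (X = 1, Y = 0); the listener turns 90 degrees left.
front_source = (1 / math.sqrt(2.0), 1.0, 0.0, 0.0)
after_turn = compensate_head_yaw(*front_source, math.pi / 2)
print(after_turn)
```

After the head turn, the source ends up on the negative Y side, i.e. to the listener's right, which is what you would expect. In a real decoder this rotation would be applied per block of samples, with the yaw angle refreshed at the tracker's update rate.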



Best regards

Stefan Schreiber                                          Lisbon



_______________________________________________
Sursound mailing list
[email protected]
https://mail.music.vt.edu/mailman/listinfo/sursound
