Re: [Sursound] A proposal for an Ambisonics based 3D audio codec, MPEG/ITU style...

Stefan Schreiber Sat, 19 Jan 2013 20:11:40 -0800

Reading back, and evaluating...

http://steinhardt.nyu.edu/scmsAdmin/uploads/005/384/Miles_Fulwider_Thesis.pdf

I believe B+ "classic" could serve as a convincing "1st" proposal for a sound-field based 3D cinema audio system. (And therefore, as a general surround format).


The very obvious changes "could"/would be:

- You have at least 3 front channels behind the screen (up to 5 in Dolby Atmos), not 2

- Which Ambisonics order would you need for cinema use? Might even 1st order suffice (provided there is a "direct"/precise front)?

- LFE channel(s) should (probably) be treated in a simple channel-loudspeaker "configuration", such as the front channels. (You also could code an "LFE soundfield", B format style. Or you don't code LFE at all - but we are talking about cinema use, in which case LFE seems to matter.)

- If using a sound field approach, you would not want to mix the soundfield with audio objects at rendering time. (Rather, use audio objects in the studio. Mixing stage, not decoding stage...)

- The "object audio" proposals are all driven by the need to cope with many different loudspeaker layouts. This issue is no problem for Ambisonics/soundfields... (Could say much more about this point, but it is too late, and why should I... :-) )



Best,

Stefan Schreiber






Richard Furse wrote:

Very interesting post & following discussion.

I've actually been added very recently to the IST/37 committee, which
apparently is a close relative to the MPEG one. However, I've not talked to
any other members so far and I'm not sure how all this stuff works just yet!

In other news, I spent a bit of time last year putting together a C API for
object streaming etc (including Ambisonics). Hopefully it roughly captures
the suggestions/requirements below. This project is now in a state where
there's a fairly short API that seems to work and a basic SDK which provides
some basic reference tools like a simple stereo renderer, lossless file
format and network streaming. However, this isn't part of the API/Spec
itself - the intent is that the C API should be independent of actual
rendering/stream/persistence formats (although a reference is provided), so
would hopefully play nice with Atmos/MDA. That said, I've not seen a
*technical* spec for either of these yet, so there's a fair bit of guesswork
happening. Certainly what's there now seems to work well for me, so far. :-/
The provisional spec has been bounced off a few folk but I've not heard much
back (though I also had some email problems at a similar time). I'm
wondering about releasing the API and SDK using some kind of open source
license. Anyway - if folk are interested in more detail, please get in touch
off-list!

Best wishes,

--Richard


-----Original Message-----
From: sursound-boun...@music.vt.edu [mailto:sursound-boun...@music.vt.edu]
On Behalf Of Stefan Schreiber
Sent: 06 January 2013 02:00
To: Surround Sound discussion group
Subject: [Sursound] A proposal for an Ambisonics based 3D audio codec,
MPEG/ITU style...

Dear colleagues...
I would like to remind everybody interested or already involved that ITU/MPEG plan to define and issue some 3D audio standard (better: 3D audio standard framework) during this year. The 3D audio codec is meant to be part of the (wider) MPEG-H standard.
This all makes a lot of sense, 'cos ;-) there is already some competition around:
1. Hamasaki 22.2, well known as (audio) part of former UHDTV (Super Hi-Vision) proposals.
2. http://www.auro-3d.com/system/listening-formats

(Note:

a)
The Auro-3D® Engine comprises:
Auro Codec: The revolutionary codec that delivers native, discrete Auro-3D® content.
Auro-Matic: The groundbreaking up-mixing algorithm that converts legacy content into the Auro-3D® format.
Auro-3D® Headphone: Like other audio configurations, similar results can be achieved with headphones that use binaural technology.
b)
Film, Broadcast, Gaming, Mobile, Automotive and Multimedia industries are all searching for a next generation sound format. With 3D Stereoscopic imagery becoming commonplace, the time is right for an audio experience that matches this increased level of fidelity. Sound in 3D is clearly the next step.)
3. http://www.dolby.com/us/en/consumer/technology/movie/dolby-atmos.html
(IMHO, Dolby won't participate in the MPEG standardization process. And even if they did, Dolby Atmos seems to be finished.)
The current situation at MPEG:

http://www.itu.int/en/ITU-T/studygroups/com16/video/Pages/jctvc.aspx

Next meetings:
   * Geneva, Switzerland, October 2013 (tentative)
   * Vienna, Austria, 27 July - 2 August 2013 (tentative)
   * Incheon, Korea, 20-26 April 2013 (tentative)
   * Geneva, Switzerland, 14-23 January 2013 (tentative)
During the next conference (January, Geneva), the important HEVC codec should be technically finished. (Status: FDIS, for "Final Draft International Standard")
A final call for a 3D audio codec will also be issued:
At the 102nd MPEG meeting MPEG has issued a Draft Call for Proposals (CfP) on 3D Audio Coding.
(This was the last meeting, Shanghai, October 2012)
MPEG-H 3D Audio is envisaged to provide a highly immersive audio experience to accompany the highly immersive experience provided by MPEG-H HEVC. Such an immersive listening experience will be realized by the rendering of a realistic and compelling 3D audio scene either by using a large number of loudspeakers, such as for 22.2 channel audio programs, or by using headphones supporting binauralization. Key issues to be addressed are a compact and bit-efficient representation of multi-channel audio programs and the ability to flexibly render an audio program to an arbitrary number of loudspeakers with arbitrary configurations. 3D Audio support via headphones is also a key capability in order to deliver an immersive experience for users of mobile devices. A final CfP will be issued at the 103rd meeting in January 2012,
(they mean January 2013, of course...)
with selection of technology from amongst the responses received at the 105th meeting in July 2013. This technology will form the basis for MPEG-H 3D Audio, the Audio part (Part 3) of the MPEG-H (ISO/IEC 23008) suite of technologies.
Taken together, the final deadline for any proposal seems to be around April 2013. (Incheon, Korea meeting, April 2013)
If some Ambisonics based audio-codec is proposed (it has been done, but as an official proposal??), I would like to add some observations.
Cinema audio and UHD TV (and this is where the push comes from) include some "discrete" elements, and everybody has to be aware of this. Firstly, there are one or two (Hamasaki 22.2) separate LFE channels. (LFE channels make sense for movies and in the cinema, even if some people will always dispute this...we are not talking about most music you will listen to at home, but about cinema sound with special effects.)
Secondly, a lot of sound is tied to the screen. The narrow-spaced front speakers might represent a problem for Ambisonics, at least for low-order Ambisonics. (Dolby Atmos actually defines up to 5 "screen" loudspeakers, this means three or five. Note that the front C channel is often used as the voice/conversation channel.)
A possible solution would be to offer some kind of B"+" option, the "plus" part being the front and LFE channels. 2D/3D surround for all the remaining sound field would be offered via the B format (order?) sound field, or HOA sound field. (To mix such a hybrid sound format is rather trivial, I would say. Just leave out the front and the LFE parts in the surround/3D field... )
So maybe define some "purist" solution (say B format 3rd order, or horizontal 4th order mixed with vertical 1st/2nd order, or whatever), and also some "B+" option. (The original B+ proposal was FOA + 2 stereo channels. Note that a direct consequence of the "hybrid" Ambisonics option would be that a 2nd or 3rd order soundfield should be enough for the representation of the surround and height channels. In fact, you can decode to 5.1, 7.1, Hamasaki 22.2, Auro-3D and Dolby Atmos surround layouts. The B format "resolution" should be more than enough for any of these layouts - maybe even at 2nd order, certainly at 3rd. The narrow-spaced front wouldn't be any problem, by definition. LFE channels are discrete in any case, as stated before.)
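To put rough numbers on the "purist" and "B+" options, here is a small sketch of the channel counts involved. It assumes the usual (N+1)^2 count for a full-sphere set of order N, and the common mixed-order convention in which each extra horizontal order above the full-sphere order adds two components. The function names and the example configurations other than the original 6-channel B+ are illustrative assumptions, not part of any proposal.

```python
def full_sphere_channels(order: int) -> int:
    """Number of signals in a full-sphere Ambisonics set of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def mixed_order_channels(horizontal_order: int, vertical_order: int) -> int:
    """Signals in a mixed-order set (assumed convention): full sphere up to
    vertical_order, plus two horizontal-only components per additional
    horizontal order."""
    if vertical_order > horizontal_order:
        raise ValueError("vertical order must not exceed horizontal order")
    extra_horizontal = 2 * (horizontal_order - vertical_order)
    return full_sphere_channels(vertical_order) + extra_horizontal


def b_plus_channels(ambisonic: int, discrete_front: int, lfe: int) -> int:
    """Total channels of a hybrid 'B+' stream: soundfield + discrete front + LFE."""
    return ambisonic + discrete_front + lfe


if __name__ == "__main__":
    # Original B+ as described in the post: FOA plus 2 discrete front channels.
    print("B+ classic:", b_plus_channels(full_sphere_channels(1), 2, 0))
    # Hypothetical cinema variant: 3rd order + 5 screen channels + 2 LFE.
    print("B+ cinema (assumed):", b_plus_channels(full_sphere_channels(3), 5, 2))
    # 'Purist' mixed-order example: horizontal 4th, vertical 1st.
    print("Mixed 4h/1v:", mixed_order_channels(4, 1))
```

Even the hypothetical 3rd-order cinema variant stays well below the 64 channels mentioned for Dolby Atmos.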
I wouldn't be afraid to offer some hybrid option, anyway. (Dolby Atmos defines up to 64 channels, and also audio objects for different loudspeaker layouts. Therefore, Dolby Atmos is itself a hybrid system - based on discrete channels and audio objects.)
I just wanted to give a small hint ;-) how anybody might set up a valid proposal. The B+ could and < should > be included as an option. The basic idea behind this is that cinema audio has some specific properties, which have to be covered by any system. (The < front > is extremely important, because voice and many sounds are tied to events on the screen; LFE channels are discrete; the C channel is mostly used in a discrete way, being used as the voice channel.)
Note also that the clock is already ticking, and I absolutely mean this. The MPEG can choose from some valid proposals, (Hamasaki) 22.2 and Auro-3D among these.
Ambisonics has defined a 3D audio field since the 70s, so it would seem logical to include Ambisonics in any 3D audio standard. There are also some clear advantages, which are getting more and more important. (Different cinemas won't offer the same loudspeaker layouts everywhere, pretty safe bet)
Because the MPEG will basically choose from existing proposals, somebody has to define some valid Ambisonics based proposal.
I apologize to the experts already involved for having written at a pretty basic, or say introductory, level. But nobody has done this here before, and I think not everybody is sufficiently informed about these issues - maybe even some very competent people.
However/but:
The next two or three MPEG conferences are not just like the next spatial audio or Linux audio conference ;-) , we are talking (also) about the probable next real-world standard for 3D audio. Once MPEG-H 3rd part (audio) and Dolby Atmos exist, every future endeavour would face some (extremely difficult) uphill battle.
If Ambisonics is not included for reasons of laziness, infighting tribes or whatever else, I would say: Game over for Ambisonics in the real world... (I don't mean this in a rude way. The thing is just that the MPEG won't wait for even the most beautiful HOA standard which will be presented in the year 2015 or 2020...)
The advantages of Ambisonics are clear: It is by definition a 3D audio theory/codec, and you can decode to different loudspeaker layouts and headphones. (This is of course very basic, but you have to tell this to people when presenting a proposal.)
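As a very small illustration of the "decode to different layouts" point, here is a sketch of a horizontal first-order encode and a basic projection decode. It assumes traditional B-format weighting (W scaled by 1/sqrt(2)), azimuths measured counter-clockwise from the front, and regular loudspeaker rings; the function names are made up for this example, and a real cinema decoder would need considerably more care (irregular layouts, frequency-dependent decoding).

```python
import math


def encode_first_order(azimuth_deg: float, gain: float = 1.0):
    """Encode a source direction into horizontal B-format (W, X, Y).
    Assumes traditional weighting: W carries the signal scaled by 1/sqrt(2)."""
    a = math.radians(azimuth_deg)
    return (gain / math.sqrt(2.0), gain * math.cos(a), gain * math.sin(a))


def decode_first_order(wxy, speaker_azimuths_deg):
    """Basic projection decode to loudspeakers on a regular horizontal ring.
    The same (W, X, Y) signals are used, whatever the number of speakers."""
    w, x, y = wxy
    n = len(speaker_azimuths_deg)
    feeds = []
    for az in speaker_azimuths_deg:
        p = math.radians(az)
        directional = x * math.cos(p) + y * math.sin(p)
        feeds.append((math.sqrt(2.0) * w + 2.0 * directional) / n)
    return feeds


if __name__ == "__main__":
    source = encode_first_order(45.0)
    square = [45.0, 135.0, 225.0, 315.0]
    hexagon = [0.0, 60.0, 120.0, 180.0, 240.0, 300.0]
    # One encoded soundfield, two different loudspeaker layouts.
    print("square :", [round(f, 3) for f in decode_first_order(source, square)])
    print("hexagon:", [round(f, 3) for f in decode_first_order(source, hexagon)])
```

The encoded signals never change; only the decode step knows about the loudspeaker layout, which is the property the object-audio proposals try to achieve by other means.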
Best regards,

Stefan Schreiber                                     Lisbon
P.S.: I personally would/will work with any 3D audio standard. Because MPEG-H 3rd part (audio) will be a selection of several codecs/approaches, Ambisonics should be included. If so, I would define two options: some "purist" approach, but also some "B+" approach, which maybe fits more to cinema-audio in the real world.
Now Thomas Chen (still lurking on this list?) would probably agree, because the original (6-channel) B+ proposal is from him. Unfortunately he works at Dolby... :-X
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound

