(Continuation of: The commercial future of Ambisonics, 15/5/2013)
Dear colleagues,
following the recent standardization of 3D audio by MPEG (ISO/IEC
23008-3) and related activities, I have come to the conclusion that the
(older) B format up to 3rd order might need some updates.
However, I also came to the conclusion that FOA (first order
Ambisonics) could easily be included in all current distribution
models for audio on the Internet, which are (to "99.98%") stereo-based.
We were nearly "there" in the thread cited above! ("The
commercial future of Ambisonics")
I will start with this part, because you can see it as a format in its
own right, one which might be the perfect bridge or transition format
for future surround/3D audio (3DA) formats...
I. UHJ (surround/3D audio) as extension of stereo based files
(distribution via Internet, on discs and streaming, including YouTube,
Spotify etc.)
a) As Richard Elen (and I) have suggested, you could distribute
surround sound and 3D audio as a (relatively simple) extension of (UHJ
encoded) stereo files. You would have to add to a stereo file (an .aac
file, for example) a 3rd audio channel, OR two audio channels, as an
extension audio stream. The restriction was that these extensions would
have to fit into the current distribution models, say downloadable AAC
files via iTunes.
Contrary to my/our first impressions, this is firstly possible (which
was already pretty clear), and secondly feasible < without any
serious drawbacks >, as will be shown...
b) Technically speaking, you would have to distribute the
("downfolded") stereo file of FOA, which contains some surround
information, and the one or two audio extension streams.
This is of course UHJ, brought into some AAC extension scheme.
http://en.wikipedia.org/wiki/Ambisonic_UHJ_format
Although UHJ permits the use of up to four channels (carrying
full-sphere with-height surround), only the 2-channel variant is in
current use (as it is compatible with currently-available 2-channel
media). In Ambisonics, UHJ is also known as "C-Format".
(Small potential problem:
"UHJ was developed by the Ambisonic team, incorporating work done by the
BBC (on their quadraphonic system, Matrix H) and Duane Cooper (on Nippon
Columbia's UD-4/UMX quadraphonic system) and others, and building on the
then-current version of Ambisonics, System 45J. The initials indicate
some of sources incorporated into the system: U from Universal (UD-4); H
from Matrix H; and J from System 45J."
This means you < might > think about an update of UHJ, to achieve more
consistency between B format and the UHJ scheme. Or you might leave
things how they are defined, for historical reasons. In any case, you
have to be aware of this...
)
Although a hierarchically extended version of UHJ stereo has been
tested in the area of FM broadcasting, nobody has tried to distribute
(hierarchically) extended UHJ stereo files via the Internet. Which is
just a baffling fact... Or maybe there are some deeper reasons?!
If a third channel (T) is available, this can be used to give improved
localisation accuracy to the planar surround effect when decoded via a
3-channel UHJ decoder. The third channel does not have to have full
audio bandwidth for this purpose, leading to the possibility of
so-called "2½-channel" systems, where the third channel is
bandwidth-limited to 5 kHz. The third channel can be broadcast via FM
radio, for example, by means of phase-quadrature modulation. This
configuration was tested by the Independent Broadcasting Authority
(IBA) in the United Kingdom as a method of broadcasting surround
recordings. 2½ or 3-channel UHJ delivers the same accuracy as
3-channel (WXY) B-Format.
Adding a fourth channel (Q) to the UHJ system allows the encoding of
full surround sound with height, known as Periphony, with a level of
accuracy identical to 4-channel B-Format.
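To make the L/R/T/Q hierarchy a bit more concrete, here is a minimal
sketch of a UHJ encoder. Assumptions: the coefficients are the ones
usually quoted for UHJ encoding (for instance in the Wikipedia article
linked above) and should be checked against a primary source; signals
are represented as complex phasors of a single sinusoid, so that
multiplying by 1j is an exact 90-degree phase shift (a real encoder
needs a wideband phase-shift network); the function names are mine.

```python
import math


def uhj_encode(w, x, y, z=0j):
    """Encode B-format phasors (W, X, Y, Z) into UHJ (L, R, T, Q).

    Coefficients as commonly quoted for UHJ; treat them as an assumption.
    Multiplying a phasor by 1j stands in for the 90-degree phase shifter.
    """
    s = 0.9396926 * w + 0.1855740 * x
    d = 1j * (-0.3420201 * w + 0.5098604 * x) + 0.6554516 * y
    left = (s + d) / 2.0
    right = (s - d) / 2.0
    t = 1j * (-0.1432 * w + 0.6511746 * x) - 0.7071068 * y
    q = 0.9772 * z
    return left, right, t, q


def encode_source(azimuth_deg, elevation_deg=0.0):
    """B-format phasors of a unit source (azimuth counter-clockwise from front)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = complex(1.0 / math.sqrt(2.0))  # conventional W gain
    x = complex(math.cos(az) * math.cos(el))
    y = complex(math.sin(az) * math.cos(el))
    z = complex(math.sin(el))
    return w, x, y, z


if __name__ == "__main__":
    # Print the L, R and T magnitudes for four horizontal directions.
    for az in (0, 90, 180, -90):
        left, right, t, q = uhj_encode(*encode_source(az))
        print(az, round(abs(left), 3), round(abs(right), 3), round(abs(t), 3))
```

A stereo-only player would use just the first two outputs; T and Q are
what the extension streams discussed here would carry.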
c) UHJ extended AAC files
AAC allows up to 48 full-bandwidth audio channels, and can include up to
16 data streams. (I believe that .aac as a < file format > is just .m4a,
or .mp4.)
To offer a backward-compatible extension of a < UHJ extended > AAC
stereo file, you would have to include the T and Q audio channels as a
3rd and 4th audio stream, somewhere. (Probably you could "label" such a
file as stereo, the first 2 channels being L and R. Include some
tags/flags in the header indicating that there are one or two further
< extension > audio channels, which would have to be decoded by a UHJ
decoder. The decoder could be an app running on a smartphone, and the
output could be a binaural version of the surround or actually LRTQ 3D
audio recording.)
If this "audio channels" approach doesn't work, use the "data"
extensions of .mp4. (T and Q are not direct audio channels, so this
might actually be the formally correct approach... Because T and Q go
into some decoder, as extension < data >.)
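A small sketch of how a player could react to such tags/flags. Everything
here is hypothetical: the field names has_t / has_q and the mode strings
are invented for illustration and are not part of any AAC or MP4
specification. The point is only the fallback logic: a legacy player
ignores the extensions and plays L/R.

```python
from dataclasses import dataclass


@dataclass
class UhjStreamInfo:
    """Hypothetical header flags describing the extension channels."""
    has_t: bool = False  # third (T) channel present
    has_q: bool = False  # fourth (Q) channel present


def select_decode_mode(info, uhj_capable):
    """Pick the richest decode the player and the file both support."""
    if not uhj_capable:
        # Legacy player: plays the L/R pair as ordinary stereo.
        return "stereo"
    if info.has_t and info.has_q:
        return "4-channel UHJ (with height)"
    if info.has_t:
        return "3-channel UHJ (horizontal)"
    # Q is only useful on top of T in the hierarchy, so fall back to 2 channels.
    return "2-channel UHJ"


if __name__ == "__main__":
    info = UhjStreamInfo(has_t=True, has_q=True)
    print(select_decode_mode(info, uhj_capable=False))
    print(select_decode_mode(info, uhj_capable=True))
```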
d) Bitrate limits
Whereas Apple uses 256 kbps ("VBR") as its current standard (they used
128 kbps before), the usual limit for AAC stereo is 320 kbps, for both
CBR and VBR. (Anything above would not necessarily be "undefined", but
would probably break most existing hardware/software.)
This means that with four channels (L, R, T, Q) you have 80 kbps per
channel, or 160 kbps for the L/R (backward-compatible) stereo file.
(IMHO, this is a sufficient value.)
IF people think that you should distribute AAC stereo files with higher
bitrates, there are several solutions.
- The UHJ article already mentions that the T channel could be
bandwidth-limited:
"The third channel does not have to have full audio bandwidth for this
purpose, leading to the possibility of so-called "2½-channel" systems,
where the third channel is bandwidth-limited to 5 kHz."
(I suggest you could do the same with the Q channel.)
- Within AAC, you could compress (full) channels with different
bitrates. (I was tempted to patent this :-D , but I don't think this
is even necessary... )
- You could probably use specific properties of the .MP4 format to get
rid of any bitrate limitations. This is an assumption by a person (= me)
who has worked on backward-compatible stuff and standards...
(Any interested company or interested party is free to mail me. If
the asking company isn't suing me at the moment, I will usually answer
in a friendly and more or less competent way! )
- You can offer a bundle within a typical container format, which means
you have an .aac file and an associated audio/data file. Related to the
last point, but not necessarily the same.
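The bitrate arithmetic above can be sketched as follows. The 320 kbps
total comes from the text; the weights in the second call are only an
illustrative assumption for "T and Q coded at a lower rate", not a
recommendation, and split_budget is a name I made up.

```python
def split_budget(total_kbps, weights):
    """Split a total bitrate across channels in proportion to the weights."""
    weight_sum = sum(weights.values())
    return {name: total_kbps * w / weight_sum for name, w in weights.items()}


# Even split: four channels inside the usual 320 kbps limit.
even = split_budget(320, {"L": 1, "R": 1, "T": 1, "Q": 1})

# Uneven split: T and Q get half the weight of L and R (illustrative only).
reduced = split_budget(320, {"L": 1, "R": 1, "T": 0.5, "Q": 0.5})

if __name__ == "__main__":
    print("even:   ", even, "L+R =", even["L"] + even["R"])
    print("reduced:", reduced, "L+R =", reduced["L"] + reduced["R"])
```

With reduced weights for T and Q, the stereo pair gets more than
160 kbps while the total stays at 320 kbps.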
e) Is there enough surround content?
I think yes!
- Original UHJ/FOA/soundfield recordings
- 5.1 surround recordings
You would have to transcode 5.1 into some WXY representation (lossy),
and bring this into some LRT (UHJ) form. (The second step looks like a
lossless process.)
Although this is not without drawbacks, you could distribute your 5.1
surround recordings in some (stereo) backward-compatible form. Which
means you fit into the current AAC/stereo based file/streaming
environment, offering something new/better/additional as extension to
stereo files or streams.
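One simple way to do the 5.1 to WXY step is to treat each 5.1 channel as
a virtual source at its nominal loudspeaker direction and to encode it
into horizontal B-format. This is a minimal sketch under assumptions: the
azimuths are the nominal ITU-R BS.775 positions, W uses the conventional
1/sqrt(2) gain, the LFE is dropped by default, and surround_to_wxy is a
hypothetical name. Five channels go into three signals, which is why the
step is lossy.

```python
import math

# Nominal loudspeaker azimuths in degrees, counter-clockwise from front
# (assumption: ITU-R BS.775 layout with surrounds at +/-110 degrees).
SPEAKER_AZIMUTHS = {"L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0}


def surround_to_wxy(frame, lfe_gain=0.0):
    """Encode one 5.1 sample frame (dict: channel name -> value) to (W, X, Y)."""
    w = x = y = 0.0
    for name, az_deg in SPEAKER_AZIMUTHS.items():
        s = frame.get(name, 0.0)
        az = math.radians(az_deg)
        w += s / math.sqrt(2.0)
        x += s * math.cos(az)
        y += s * math.sin(az)
    # The LFE has no direction; optionally fold it into W only.
    w += lfe_gain * frame.get("LFE", 0.0) / math.sqrt(2.0)
    return w, x, y


if __name__ == "__main__":
    print(surround_to_wxy({"C": 1.0}))
    print(surround_to_wxy({"Ls": 1.0}))
```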
Transcoding from 5.1 to WXY is a solved problem, I think Xiph.org has
applied this idea in some form.. (Whereas I didn't see a big deal in
transcoding 5.1 film audio tracks into 3-channel Ambisonics, it is
another thing to translate some 5.1 audio-only surround recording to
some form which is backward-compatible to AAC stereo. And which fits
into the current forms of audio distribution. This is an important issue.)
e) Consumer reception
Obviously a lot of music is nowadays listened to on mobile devices, via
headphones. Decoder programs/apps for LRT(Q) files which might decode
Ambisonics to a (binaural) headphone version can easily be distributed.
Without such a "special" playback option, people still can listen to a
("downfolded" but definitely viable) stereo version of the same
surround recording.
If Ambisonics based surround/3D audio were introduced in about this
way, I fully believe that people would consider this to be something
"cool". And if you are too old to think in cool-app terms, maybe even
then you would be tempted to experiment with these files, because how
does 3D audio actually sound? Can you hear the difference? O:-)
f) Other audio compression formats, "non-AAC"
Everything which has been said about AAC extension streams ("audio" or
"data") would apply to (Ogg) FLAC, (Ogg) Vorbis and to the new "Opus"
audio codec (official Internet standard) in a very similar way.
I would not use .MP3 for extended audio streams. (Compression efficiency
does matter; AAC is more modern, and far more extensible as a file format.)
(If anyone plans to distribute CD/DVD bundles or DualDiscs offering
LRT(Q) based surround/3DA versions of recordings, I still have some
connections in this area and could give some practical advice. If
anyone doesn't need advice and just does it, even better...)
g) Extensions of UHJ
It seems possible to extend the UHJ scheme to higher orders. However,
this would be some idea for the future, as implementations of undefined
formats are not that trivial. And I doubt there is some real value in
this idea. (In practical terms, not as an academic idea... :-D )
Speaking about higher order and < the future > leads us to the second
proposed format, an extended .AMB format....
II. A new ".AMB+" format going up to 4th order, or maybe higher
a) HOA is already "accepted" as an input format for the (future) encoder
of (MPEG) 3D audio.
Evidence:
The "Higher Order Ambisonics Test Material" (12 items) is mostly
presented in 4th order. (MPEG-H CfP, Table-Set 2)
(H_01 is HOA order 6, H_02 - H_11 is 4th order, H_12 is 3rd order.)
As I have said here and elsewhere, Ambisonics and HOA are existing and
completely "usable" formats for 3D audio. In fact, Ambisonics has always
been about 3D audio...
Because a lot of people currently seem to work at 4th order (see also
the Eigenmike, which goes up to 4th order), it seems necessary to extend
the .AMB format to (at least) 4th order. Or should we replace .AMB with
some really new (Ambisonics) format, which should still allow mixed
orders?
http://en.wikipedia.org/wiki/Ambisonics
(Table Higher Order B format channels)
You would have to add the combinations
Horizontal order   Height order   Number of channels
4                  0              9
4                  1              10
4                  2              13
4                  3              18
4                  4              25
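The channel counts in this table follow a simple rule: a full-sphere set
up to the height order P contributes (P+1)^2 channels, and every
horizontal order above P adds two more. A short sketch (the function
name is mine):

```python
def mixed_order_channels(horizontal_order, height_order):
    """Number of B-format channels for a mixed-order (#H#P) scheme."""
    if not 0 <= height_order <= horizontal_order:
        raise ValueError("height order must be between 0 and the horizontal order")
    full_sphere = (height_order + 1) ** 2
    horizontal_only = 2 * (horizontal_order - height_order)
    return full_sphere + horizontal_only


if __name__ == "__main__":
    # Reproduce the 4th-order rows of the table.
    for p in range(5):
        print(4, p, mixed_order_channels(4, p))
```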
A "collision" between the 9 channels of (full-sphere) 2nd order and the
9 channels of horizontal-only 4th order should not matter, IMO. (I also
don't need the concept
of "metadata" to distinguish between these cases. Every format or
streaming format I know has some header - or several/repeated headers -
with some descriptive information, which in this case should include the
values for < horizontal order >, < height order > etc. Not just the
number of channels, which also should be included. Many formats I know
bear so many data fields that they actually describe the same properties
in different but repeated ways. Redundancy is in this case no problem,
missing information actually is. Therefore, it is certainly no problem
to have two mixed order variants with 9 channels within the same scheme,
if you also include some additional information like "horizontal order"
etc. Call this metadata, descriptive data fields, or whatever..)
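As an illustration of such descriptive header fields, here is a sketch
of a tiny fixed-size header. The layout (magic bytes "AMB+", two order
bytes, a 16-bit channel count) is purely hypothetical and not part of
the .amb specification; it only shows that the two 9-channel variants
stay distinguishable and that the redundant channel count can be used
as a consistency check.

```python
import struct

# Hypothetical layout: magic, horizontal order, height order, channel count.
HEADER_FORMAT = "<4sBBH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def pack_header(horizontal_order, height_order, channels):
    """Serialize the descriptive fields into a fixed-size header."""
    return struct.pack(HEADER_FORMAT, b"AMB+", horizontal_order, height_order, channels)


def unpack_header(data):
    """Parse the header and cross-check the redundant fields."""
    magic, h, p, channels = struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
    if magic != b"AMB+":
        raise ValueError("not a (hypothetical) AMB+ header")
    expected = (p + 1) ** 2 + 2 * (h - p)
    if p > h or channels != expected:
        raise ValueError("orders and channel count disagree")
    return h, p, channels


if __name__ == "__main__":
    second_order = pack_header(2, 2, 9)      # full-sphere 2nd order
    fourth_horizontal = pack_header(4, 0, 9)  # horizontal-only 4th order
    print(unpack_header(second_order), unpack_header(fourth_horizontal))
```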
4th order looks like a good fit to the 22.2 surround system, which
serves as a kind of upper "reference" in the MPEG-H 3D audio framework.
(For example, you have 10 fairly evenly distributed loudspeakers in the
0º plane, which is the minimum number of speakers you need to fully
represent horizontal 4th order.)
The 4h3p mixed order system seems to fit quite well to the complete "3D"
22.2 layout, BTW. You have the horizontal plane, middle plane at about
35º but in any case near 45º, and the 90º over-head speaker. 18 channels
coded via AAC at 64 kbit/s each come to less than 1.2 Mbit/s, which is
still less than "DTS on CD". AAC has been proven to be a good codec to
compress Ambisonic/HOA audio channels.
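The arithmetic, as a quick check. The 64 kbit/s per channel figure is
from the text; as a reference I use the raw CD audio rate (44.1 kHz,
16 bit, 2 channels), which is the container any stream on a CD has to
fit into. That comparison value is my own choice.

```python
# Raw CD audio payload in kbit/s: 44.1 kHz * 16 bit * 2 channels.
CD_PCM_KBPS = 44_100 * 16 * 2 / 1000


def total_kbps(channels, per_channel_kbps=64):
    """Total bitrate when every channel is coded at the same rate."""
    return channels * per_channel_kbps


if __name__ == "__main__":
    for channels in (18, 25):  # 4h3p and full 4th order
        rate = total_kbps(channels)
        print(channels, "channels:", rate, "kbit/s,",
              "fits" if rate < CD_PCM_KBPS else "does not fit",
              "into the CD audio rate of", CD_PCM_KBPS, "kbit/s")
```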
A future surround format should supersede 5.1/6.1 surround, which is the
undisputed current film standard and will survive for a long time. You
would have to include height or (full-sphere) 3D audio into any next-gen
surround format.
If FOA is not really a good format for cinema audio, you would probably
not go to 2nd order, but probably to about 3rd and 4th orders, to obtain
some real improvements. (Is there any reason that you could not apply
advanced Ambisonics decoder technologies for 1st order - say Harpex -
also to 3rd and 4th order? If so, I would try this < before > I go to
7th, 10th or xxth orders...)
b) Ambisonics is not included in current standardization efforts for
3D cinema audio (but HOA is referred to as "encoder input format"),
nevertheless it is an existing and real alternative to other developing
standards:
- Ambisonics/HOA is a 3D audio format, by definition. (Sound fields are
not restricted to some "horizontal plane".)
- The Ambisonics theory seems to be quite developed. As long as we
don't talk about "any order" approaches (too much freedom often means
just "lack of definition"), you can develop practical applications. (We
talk about some format up to 4th order, not abstract "nth" order.)
- It is a format which fits different loudspeaker layouts. (With
some restrictions, but other formats might have similar problems,
including object-based codecs.)
- It is full-sphere (and not restricted by some hemispherical layout).
Binaural audio presented via headphones is compatible with full-sphere
formats!
c) Company support: This is far better than some people might think.
Obviously the people at Orange/FT are quite "positive" about HOA. BBC
might also show some renewed interest.
Qualcomm (now a very big company, because of its expertise in wireless
technologies and the fact that they are a major manufacturer of ARM
based mobile processors) has shown some significant interest in the
development of 3D audio, including HOA.
For example:
http://jobs.cellular-news.com/index.php?post_id=34345
Huawei is interested in HOA, among other companies in Asia.
In the end, Ambisonics and HOA might not need as much company support as
other technologies, because there is some established theory behind
them, solutions are really becoming more and more available, and
Ambisonics (FOA/HOA) is truly an open system.
d) Extension of the .AMB scheme to 4th order
"Furse-Malham higher-order format (FMH-Format) is a set of coefficients
that can be applied to the first 16 B-format channels. The FMH set of
coefficients applies weightings to the channels such that all the
spherical harmonic coefficients have a maximum value of unity. Whilst
this approach is not rigorously "correct" in mathematical terms, it has
significant engineering advantages in that it restricts the maximum
levels a panned mono source will generate in some of the higher-order
channels.[21]
The Furse-Malham set of weighting factors is part of the ".amb"
specification for downloadable B-Format files."
(Wikipedia article about Ambisonics)
So, if some people seem to work at exactly 4th order: Can you extend the
(mixed-order) .AMB system to 4th order? What would be the weight factors?
If this is not possible: What should a modernized mixed-order file format
up to 4th order look like?
(I am aware that you could just "mix" .AMB and a 4th order format
proposal, at least in this case. This would be a so-called "hack",
albeit a workable hack... Messieurs Malham, Furse, Adriaensen, Greene:
Please present some workable and consistent solution within Western
scientific-mathematical traditions, and if possible do this soon. :-)
Cos even the technical terms "hack" and "hacker" might lead to some
serious and joint investigation by the Holy Inquisition and its modern
form called NSA, which means we should aim for a mathematical, beautiful
and eternal solution, not for some geek < hacks > ... Aaaargh, I have
used the wrong word, yet again... :-)
III. The proposals (I) and (II) are presented in some joint form,
because I believe they are complementary.
You would need a "lightweight" surround/3DA format as presented in (I),
which is backward-compatible with stereo distribution but is a true format
for surround sound and "even" 3D audio ("even" used for justified PR
purposes!), via a 3rd and 4th audio extension channel. This is a bit
like the older G format. However, I believe it makes far more sense to
extend stereo files to surround sound than surround files (5.1) to
surround sound. ;-)
II (".AMB+" up to 4th order) is a proposal for a powerful surround/3DA
format which still stays within the limits of a true CE format (note
that AAC allows effective compression of up to 25 channels), can be
decoded to a binaural representation (mobile listening via headphones),
and might be good enough to be used as a "wide area" format, including
for applications like live concerts, cinema audio, spatial audio for
theatre/concerts/museums, etc.
You would start with (I), which could be implemented even with the
information and links provided within this posting. (Yet again, the only
"problem" might be that the bitrate of the LR stereo downfold would be
lower than 256kbps. If people think that you can't code HQ stereo within
a 160kbps or say 192 kbps AAC file - 192 kbps was part of the patent I
didn't apply for :'( - then I believe with some good reasons that
there are several ways to get around the 320kbps restriction for 3/4
channels.. (Which means: I think you could code the 3 or 4 channels in
say 384kbps or 512kbps. In any case, I believe the customers won't hear
any difference between the 160kbps,192kbps and 256kbps AAC stereo rates,
as long as you don't tell them about these differences. And as long as
they didn't read about "low" AAC bitrates in unnamed audiophile
magazines.) Note that FLAC based solutions could be developed for some
audiophile customers, but the point was to apply the LRTQ/UHJ
hierarchical system within the current distribution of AAC encoded
stereo files/streams.)
IV. Improved binaural representation via headphones
Note that headphones with head-tracking (HT) "chips" and
motion-corrected binaural playback of surround sound (including 3D
audio) could easily be realized, with available and actually quite
affordable chips.
Oculus Rift is the direct example for this, as this is a full (and
certainly more complex) VR and gaming device.
http://worthplaying.com/event/E3_2013/PostE3_2013/89888/
In its current state, the Oculus Rift is an amazing piece of work, and
after decades of dealing with VR technology, it seems that we may
finally see a VR unit that is going to get it right.
Wikipedia writes about the Oculus Rift motion-tracking:
"Initial prototypes used a Hillcrest 3DoF head tracker that is normally
120 Hz, with a special firmware that John Carmack requested which makes
it run at 250 Hz, tracker latency being vital due to the dependency of
virtual reality's realism on response time. The latest version includes
Oculus' new 1000 Hz Adjacent Reality Tracker that will allow for much
lower latency tracking than almost any other tracker. It uses a
combination of 3-axis gyros, accelerometers, and magnetometers, which
make it capable of absolute (relative to earth) head orientation
tracking without drift.[20][25]"
Now, apply the same or similar HT silicon (which is already very
affordable) to HT/motion-tracking headphones... (I could give some
detailed recommendations on how to do this, but this is also one of the
next steps... Nice to see that at least the video and gaming people have
kept some sense for cool technology and seemingly "weird" ideas, so to
speak. How many motion updates per second would a "fluent" head-tracking
binaural decoder/decoding program actually require? Ye Ambisonics
experts, what do you think or better know?!
You would have to decode some UHJ/".AMB+" file and rotate the soundfield
relative to the head orientation, I guess. The head orientation needs
regular and frequent updates, which we can easily get by now. You could
also track the < absolute > movements of persons within some area. Say:
Your
decoder program tracks the movements of the visitors in some museum or
building, and plays the associated audio/explanations fitting to the
current position. "This is the dining hall of the castle, which was
quite cold during winter, but warm or even hot during summer." Ok, this
was a truly dull example... :-) )
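For first order, compensating a head turn is a plain rotation of the X/Y
pair about the vertical axis, applied before the binaural decode. A
minimal sketch, assuming the usual B-format axes (X front, Y left, Z up),
yaw measured counter-clockwise in degrees, and yaw only (pitch and roll
would need a full 3D rotation); rotate_yaw is a hypothetical name.

```python
import math


def rotate_yaw(w, x, y, z, head_yaw_deg):
    """Counter-rotate a first-order B-format sample by the head yaw.

    If the head turns left by head_yaw_deg, a source at azimuth a should
    be heard at a - head_yaw_deg relative to the head. W and Z are
    unaffected by a rotation about the vertical axis.
    """
    a = math.radians(head_yaw_deg)
    x_rot = x * math.cos(a) + y * math.sin(a)
    y_rot = -x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z


if __name__ == "__main__":
    # A frontal source (X=1, Y=0) with the head turned 90 degrees to the left
    # ends up on the listener's right (negative Y).
    print(rotate_yaw(1 / math.sqrt(2), 1.0, 0.0, 0.0, 90.0))
```

The decoder would call this once per head-tracker update (or interpolate
between updates) with the latest yaw value.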
Best regards
Stefan Schreiber Lisbon