Dear audio experts...

I would like to inform you about the availability of an introductory article about MPEG-H 3D Audio, written in a competent and very readable style by some of its creators:

http://www2.iis.fraunhofer.de/mpeghaa/papers/AES137_MPEG-H_v14_final.pdf
(Source 1; presented in October 2014)

------------------------------------
MPEG 3DA is part of a larger family of standards, which makes it a sibling of the most recent and most powerful MPEG video compression standard, HEVC (3D Audio is Part 3 and HEVC is Part 2 of ISO/IEC 23008):

ISO/IEC 23008 - High efficiency coding and media delivery in heterogeneous environments

http://en.wikipedia.org/wiki/MPEG-H

--------------------------------------

MPEG-H 3D Audio supports and integrates channel-based, object-based and sound-field/HOA-based audio formats and technologies.

"Within MPEG-H 3D Audio, flexible rendering to different speaker layouts is implemented by a format converter that adapts the content format to the actual real-world speaker setup available on the playback side to provide an optimum user experience under the given user conditions. For well-defined formats, specific downmix metadata can be set on the encoder to ensure downmix quality, e.g. when playing back 9.1 content on a 5.1 or stereo playback system."
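For readers who have not met one before: at its simplest, a static downmix is just a matrix applied to every sample frame. The sketch below uses the well-known ITU-R BS.775 5.1-to-stereo coefficients (centre and surrounds at -3 dB, LFE dropped) as an assumption for illustration; the channel order and all names are hypothetical, and this is certainly not the MPEG-H format converter, which (with its downmix metadata) is considerably more elaborate.

```python
import math

# Hypothetical channel order for the 5.1 input in this sketch.
CHANNELS_51 = ["L", "R", "C", "LFE", "Ls", "Rs"]

G = 1.0 / math.sqrt(2.0)  # -3 dB gain for centre and surrounds (ITU-R BS.775)

# Rows = output channels (Lo, Ro); columns = input channels in CHANNELS_51 order.
# The LFE column is zero: the LFE is simply dropped in this basic downmix.
DOWNMIX_51_TO_STEREO = [
    [1.0, 0.0, G, 0.0, G, 0.0],  # Lo = L + g*C + g*Ls
    [0.0, 1.0, G, 0.0, 0.0, G],  # Ro = R + g*C + g*Rs
]


def downmix(frames, matrix):
    """Apply a static downmix matrix to a list of multichannel sample frames."""
    out = []
    for frame in frames:
        if len(frame) != len(matrix[0]):
            raise ValueError("frame has the wrong number of channels")
        # Each output sample is a weighted sum of all input channels.
        out.append([sum(m * x for m, x in zip(row, frame)) for row in matrix])
    return out


if __name__ == "__main__":
    # One frame with signal only in the centre channel: it appears
    # at -3 dB in both stereo outputs (a phantom centre).
    print(downmix([[0.0, 0.0, 1.0, 0.0, 0.0, 0.0]], DOWNMIX_51_TO_STEREO))
```

"Downmix metadata" in the quoted sense would, roughly speaking, let the encoder choose such gains per programme instead of relying on one fixed matrix.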

You could say that 3DA is highly flexible both on the format input side (encoding basically any known < practically used > format) and at the rendering/output stage.


Fig. 1 in the cited standard description gives very strong evidence that 3D audio is a worthwhile improvement over horizontal surround sound. "5.1 + 4H" means 5.1 plus 4 "height" speakers, BTW. (So I believe "5.1 + 4H" refers to an Auro-3D configuration, which seems quite obvious.)

(I know of some studies claiming that 3D audio is not "worth it". First, I don't believe that any of these studies was set up very carefully; second, they seem to contradict quite simple observations. IF we are able to hear in three dimensions, at least to a significant degree, THEN any technology that claims to be perceptually < complete > has to reproduce sound in 3D. Seen this way, testing what sounds "good" or "better" doesn't even matter. So perhaps the first relevant test in any scientific study of basic acoustical and perceptual questions should be whether you can hear and localize, at least "to some degree", elevated and lowered sound sources. The answer is a profound "Yes", as everybody knows...)

Now a quotation from another Fraunhofer IIS paper:

"Thus, Silzle et al. undertook a study(7) to determine the relative overall perceived sound quality of several speaker configurations, to determine if a practical compromise was possible between the sound quality provided by a 22.2 system and today's 5.1 and 2.0 formats. As shown in Figure 4, the perceived quality improvement from 5.1 surround to 22.2 is greater than that from stereo to 5.1. However, ignoring the LFE channels, an upgrade from stereo to 5.1 requires 3 new speakers, while an upgrade from 5.1 to 22.2 requires 17 new speakers. Our tests using an active downmix method show that most of the perceived improvement of upgrading 5.1 to a 22.2 system can be obtained with four additional height speakers."

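The speaker arithmetic in this quote is easy to check. A tiny sketch, counting only full-range speakers (LFE channels ignored, as the quote does) and assuming the usual speaker counts for each layout:

```python
# Full-range (non-LFE) speaker counts per layout; LFE channels are ignored.
FULL_RANGE = {
    "2.0": 2,
    "5.1": 5,
    "5.1+4H": 9,  # 5.1 plus four height speakers
    "22.2": 22,
}


def new_speakers(src, dst):
    """Additional full-range speakers needed to upgrade from layout src to dst."""
    return FULL_RANGE[dst] - FULL_RANGE[src]


if __name__ == "__main__":
    for src, dst in [("2.0", "5.1"), ("5.1", "22.2"), ("5.1", "5.1+4H")]:
        print(f"{src} -> {dst}: {new_speakers(src, dst)} new speakers")
```

So the claim is that 4 extra speakers buy "most" of what 17 extra speakers would.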
Source:
http://www.iis.fraunhofer.de/content/dam/iis/en/dokumente/forschungsfelder/AMM/Conference-Paper/BleidtR_SMPTE2014_Object-Based_Audio.pdf

And on the more practical side:

"Adding immersive audio not only involves adding new signals, but also extending the panning and mixing functions of the live console or post-production digital audio workstation to handle perhaps 5.1 + 4H or 7.1 + 4H or third or fourth-order HOA signals. There are several operational strategies for producing these signals: [...]"
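To put "third or fourth-order HOA" next to 5.1 + 4H (9 full-range channels plus LFE): a full-sphere HOA signal of order N has (N+1)^2 channels, so third order means 16 channels and fourth order 25. The sketch below also shows the simplest possible HOA "panner", encoding a mono source at a given direction into first-order Ambisonics. It assumes the AmbiX conventions (ACN channel order W, Y, Z, X with SN3D normalization); it is a generic textbook encoder, not a description of any particular console, DAW plug-in or of MPEG-H.

```python
import math


def hoa_channel_count(order):
    """Number of channels of a full-sphere HOA signal of the given order."""
    return (order + 1) ** 2


def encode_foa_ambix(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order Ambisonics (ACN order, SN3D).

    Azimuth is measured counter-clockwise from straight ahead (positive = left),
    elevation upwards from the horizontal plane. Returns [W, Y, Z, X].
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return [
        sample,                                # W: omnidirectional, SN3D gain 1
        sample * math.sin(az) * math.cos(el),  # Y: left-right
        sample * math.sin(el),                 # Z: up-down
        sample * math.cos(az) * math.cos(el),  # X: front-back
    ]


if __name__ == "__main__":
    for n in (1, 3, 4):
        print(f"order {n}: {hoa_channel_count(n)} channels")
    # A source straight overhead excites only W and Z (up to rounding).
    print(encode_foa_ambix(1.0, 0.0, 90.0))
```

Higher orders add more spherical-harmonic components per source, which is exactly why console panning and mixing functions need extending.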



Returning to Source 1 (Introductory Standard Description/3DA):

"It is foreseeable that media consumption is moving further towards mobile devices with headphones being the primary way to play back audio. Therefore, a binaural rendering component was included in the MPEG-H 3D audio decoder for dedicated rendering on headphones with the aim of conveying the spatial impression of immersive audio production also on headphones."
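The basic idea behind binaural rendering of a speaker layout ("virtual loudspeakers") can be sketched in a few lines: each channel is convolved with a left-ear and a right-ear head-related impulse response (HRIR) for that speaker's direction, and the results are summed per ear. This is only a minimal illustration under stated assumptions: the HRIRs below are hypothetical toy values, and a real renderer (MPEG-H's included) uses measured responses and far more efficient processing than direct convolution.

```python
def convolve(x, h):
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def render_binaural(channels, hrirs):
    """Render speaker channels to a two-ear (headphone) signal.

    channels: dict speaker_name -> list of samples
    hrirs:    dict speaker_name -> (left_ear_hrir, right_ear_hrir)
    Returns (left, right) sample lists.
    """
    n = max(len(sig) + max(len(hrirs[name][0]), len(hrirs[name][1])) - 1
            for name, sig in channels.items())
    left, right = [0.0] * n, [0.0] * n
    for name, sig in channels.items():
        h_left, h_right = hrirs[name]
        # Each virtual speaker contributes to both ears through its own HRIR pair.
        for k, v in enumerate(convolve(sig, h_left)):
            left[k] += v
        for k, v in enumerate(convolve(sig, h_right)):
            right[k] += v
    return left, right


if __name__ == "__main__":
    # Hypothetical toy HRIRs: the far ear hears the speaker one sample later
    # and 6 dB quieter (a crude interaural time and level difference).
    hrirs = {"L": ([1.0, 0.0], [0.0, 0.5]), "R": ([0.0, 0.5], [1.0, 0.0])}
    channels = {"L": [1.0, 0.0], "R": [0.0, 0.0]}
    print(render_binaural(channels, hrirs))
```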



It seems to me (as an outsider who has followed the standardization process since at least 2011) that the aims of the standard have all been met, if not exceeded. So, congratulations!

It must be said that the development of such a standard has only been possible because of many years of basic and applied research in this area. The main contributors have therefore invested time, personnel and money in this project over a significant period. It is simply encouraging to see that certain institutions and companies involved in audio research and (audio) consumer electronics still have some long-term strategies and views. (Now compare this with the situation in the so-called music industry, but this is getting "off-topic", and I don't want to get too angry on a Sunday evening...)


Unlike some < publicly funded > university research that I won't name, which remains "unavailable for anybody" in spite of the many < papers > presented about it :-D , < this > standard will be licensable like any other MPEG standard, and it already seems set to be adopted as the (3D) audio standard for future (HD/UHD) TV standards, as currently being defined by ATSC and EBU.

Best regards,

Stefan Schreiber

P.S.:

There are definitely some more 3D audio standardization efforts around...

http://www.aes.org/events/137/papers/?ID=4048

P1-7 ECMA-407: New Approaches to 3D Audio Content Data Rate Reduction with RVC-CAL

Inverse problems have only been known in spatial audio for a very short time; their only solution, called "inverse coding" in literature, is essentially based on time-level modeling. Inverse problems, however, unlike parametric coding, require only an initial transmission of spatial side information, and thus can achieve much lower bitrates than could be achieved with parametric coding.

The technology has been specified as the world's first international 3D audio standard ECMA-407 and may be further extended with static models in frequency domain.

Publicly available source:
http://ecma-international.org/publications/standards/Ecma-407.htm

(ECMA standards are freely available, which is good.)

A new way to perceptually eliminate redundant information makes use of invariant theory inside the encoder. Invariants with Gaussian processes were unknown until 2010 and have represented one major problem in non-applied mathematics for more than a century: David Hilbert's proof that these coefficient functions form a field then insinuated that their existence in random processes was very likely. As will be shown, when applied to spatial audio coding, invariants represent a numerically efficient and perceptually powerful algebraic tool.


Now, I am very probably too stupid to understand ALL O:-) of this, but hopefully this is some nice entertainment for all the lurking mathematicians on our list...

P.S. 2: And what about our Ambisonics and "open audio" application and standardization efforts? The question has to be asked, especially because most of these efforts seem to end up in the dustbin as unfinished projects, after people have lost interest. I hope I will be proved wrong...

We have had plenty of time to achieve some more "visible" results. (I am sorry to have to write this.)

Outside the academic world, progress is being made. The new MPEG 3DA standard seems to be a milestone in audio standardization: very general, but still open to future extensions.

_______________________________________________
Sursound mailing list
[email protected]
https://mail.music.vt.edu/mailman/listinfo/sursound - unsubscribe here, edit 
account or options, view archives and so on.
