Hi Archontis,
sorry for the relatively late response. I was travelling and had some
problems posting anything to sursound during my trip. (I finally know
what went wrong...)
Anyway, many thanks for the (as always) clear and well-informed answer
you gave to my posting.
It is quite remarkable that a "pinna-less" (but multi-perspective)
binaural format seems to work well for head-tracked VR applications.
This is further proof that some perceptual cues can be omitted (here:
pinna cues) as long as other cues (ILD, ITD) are more or less intact.
However, there seem to be a couple of limitations to the (proprietary)
MTB recording format.
- I would expect some problems distinguishing between front and back.
(Head movements will fix these, but what if you want to keep your head
in some "listening position"?)
- I would not expect any significant height cues to be captured. (?)
More important:
- It seems to be very difficult, if not impossible, to bring MTB
recordings into any loudspeaker format. (Even into classical stereo...)
"Application case": Imagine you would like to present some VR/360º movie
in a ("plain old") cinema version, or simply to broadcast it on TV. You
would need a 2.0 or 5.1 (or Auro-3D/Dolby Atmos etc.) audio version to
do so. How would you derive this from a binaural recording in any
rational way?
(Unless they were to interpret the 8-mic RondoMic sphere recording as
an HOA source. Which brings us back to my 1st mail...)
Last, but not least:
Good MTB recordings require many capsules assembled in an SA microphone,
in my eyes no fewer than for HOA.
Sources:
http://dysonics.com/rondo360/
http://dysonics.com/wp-content/uploads/2014/05/dysonics_immersive_spatial_sound_for_mobile.pdf
"In practice, we find that this procedure produces high-quality results
using 8 microphones for speech and 16 microphones for music."
"Although MTB produces highly-realistic, well externalized spatial
sound, the signals produced by
this method only approximate the exact experience, and critical
listening tests have revealed various
audible defects [7]. We have developed methods to correct for these
problems, if corrections are required,
and refer the interested reader to [7] for an extended discussion of
this topic."
Chapter 4.1:
"For the numerical values a = 0.0875 m, c = 343 m/s and fmax = 2.5 kHz,
these formulas call
for 55 microphones for omnidirectional and 16 microphones for panoramic
sampling."
55 microphones is quite a lot, especially if you are restricted to
binaural applications.
Best regards
Stefan
-----------------------
Politis Archontis wrote:
Hi Stefan,
On 07 Jun 2016, at 04:35, Stefan Schreiber <st...@mail.telepac.pt> wrote:
Politis Archontis wrote:
But instead of combining all microphones to generate the binaural directivities
(as in ambisonics), it interpolates only between the two adjacent microphones
that should be closest to the listener’s ears. Otherwise, it does not capture
pinna cues or cues from a non-spherical/asymmetrical head.
Any source for this explanation?
I actually dare to question your view... How will you receive any binaural cues
via interpolation between two relatively closely spaced omni mikes (fixed on a
sphere)?
As you even write, this doesn't seem to give any head and pinna cues. (It's
called MTB. So I guess they would aim to provide several binaural perspectives,
including head and pinna cues?)
The source is the AES paper describing the method:
Algazi, R. V., Duda, R. O., & Thompson, D. M. (2004). Motion-Tracked Binaural
Sound. In 116th AES Convention. Berlin, Germany.
It does give head-related cues: those of a spherical head without pinnae. If you
put an omni on a rigid sphere, it is not an omni anymore; it has a
frequency-dependent directionality. If you put two of them at opposite sides,
they have opposite directionalities and introduce inter-channel level
differences. Depending on the size of the sphere, the two signals have a
direction-dependent phase difference too. If the size of the sphere is
approximately the size of a head, then you can assume that the level and time
differences are close to the binaural ones. This is the infamous spherical head
model, and its ITDs and ILDs are known analytically. It captures the cues for
lateralization, but not those of a pinna (which it doesn’t have) or of head
asymmetries.
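[Editor's note: one common closed form for the spherical-head ITD mentioned above is Woodworth's formula; a minimal sketch, using the same head radius a = 0.0875 m and c = 343 m/s that appear later in this thread:]

```python
import math

def woodworth_itd(azimuth_deg, a=0.0875, c=343.0):
    """Woodworth ITD (seconds) for a rigid spherical head of radius a (m).

    Azimuth is measured from the median plane; the formula assumes a
    distant source and is a high-frequency approximation.
    """
    theta = math.radians(azimuth_deg)
    # Path difference: a*theta around the sphere plus a*sin(theta) in free air.
    return (a / c) * (theta + math.sin(theta))

# ITD is 0 for a frontal source and maximal (~0.66 ms) for a lateral one.
```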
If instead of two omnis, you put many of them on the horizontal plane, then you
can track the listener’s head yaw rotation and use the two omnis that are
closer to their ears - or interpolate for a smoother transition. That’s what
Algazi and Duda are doing in their paper, and they compare various
interpolation schemes.
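[Editor's note: the capsule-selection step described above can be sketched as follows. This is an illustrative linear-crossfade version with assumed parameters (8 equally spaced capsules on the horizontal ring); it is not the exact interpolation scheme of the Algazi/Duda paper, which compares several:]

```python
import math

def mtb_pair_and_gains(yaw_deg, n_mics=8):
    """Given head yaw (degrees), pick the two ring microphones adjacent to
    each ear position, with linear crossfade gains between them.

    Assumes n_mics omnis equally spaced on the sphere's horizontal ring,
    mic i at azimuth i * (360 / n_mics).
    """
    spacing = 360.0 / n_mics
    out = {}
    for ear, offset in (("left", -90.0), ("right", +90.0)):
        angle = (yaw_deg + offset) % 360.0   # ear position on the ring
        idx = angle / spacing
        lo = int(math.floor(idx)) % n_mics   # adjacent capsule indices
        hi = (lo + 1) % n_mics
        frac = idx - math.floor(idx)         # fractional position between them
        out[ear] = ((lo, 1.0 - frac), (hi, frac))
    return out

# With yaw=0 and 8 mics, the left ear sits exactly on mic 6 (270 deg)
# and the right ear on mic 2 (90 deg), so no crossfade is needed.
```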
Regards,
Archontis
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound - unsubscribe here, edit
account or options, view archives and so on.