Hi Archontis,

sorry for the relatively late response. I was travelling and had some problems posting anything to sursound during my trip. (I finally know what went wrong...)

Anyway, many thanks for the (as always) clear and well-informed answer you gave to my posting.

It is quite remarkable that some "pinnae-less" (but multi-perspective) binaural format seems to work well for HT (head-tracked) VR applications. This is further evidence that some perceptual cues (here: pinna cues) can be omitted if other cues (ILD, ITD) are more or less intact.

However, there seem to be a couple of limitations of the (proprietary) MTB recording format:

- I would expect some problems distinguishing front from back. (Head movements will fix these, but what if you want to keep your head in some "listening position"?)
- I would not expect that any, or at least any significant, height cues are captured. (?)

More important:

- It seems to be very difficult, if not impossible, to bring MTB recordings into some loudspeaker format. (Even to classical stereo...) "Application case": Imagine you would like to present some VR/360º movie in a ("plain old") cinema version, or just to broadcast it on TV. You would need a 2.0 or 5.1 (or Auro-3D/Dolby Atmos etc.) audio version to do so. How would you derive this from any binaural recording, in some rational way? (Unless they would interpret the 8-mic RondoMic sphere recording as some HOA source. Which brings us back to my first mail...)

Last, but not least:

Good MTB recordings require many capsules assembled in an SA (spherical array) microphone; in my eyes, no fewer than for HOA.

Sources:

http://dysonics.com/rondo360/

http://dysonics.com/wp-content/uploads/2014/05/dysonics_immersive_spatial_sound_for_mobile.pdf

"In practice, we find that this procedure produces high-quality results using 8 microphones for speech and 16 microphones for music."

"Although MTB produces highly-realistic, well externalized spatial sound, the signals produced by this method only approximate the exact experience, and critical listening tests have revealed various audible defects [7]. We have developed methods to correct for these problem, if corrections are required, and refer the interested reader to [7] for an extended discussion of this topic."

Chapter 4.1:
"For the numerical values a = 0.0875 m, c = 343 m/s and fmax = 2.5 kHz, these formulas call for 55 microphones for omnidirectional and 16 microphones for panoramic sampling."

55 microphones is quite a lot, especially if you are restricted to binaural applications.
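For what it's worth, the quoted panoramic count of 16 is consistent with spacing the capsules roughly a quarter wavelength apart around a head-sized ring. (This reading of the formula is my own assumption; the exact expressions are in the cited chapter.)

```python
import math

# Numerical values from the quote above
a = 0.0875      # sphere radius in m
c = 343.0       # speed of sound in m/s
fmax = 2500.0   # upper frequency limit in Hz

lam_min = c / fmax               # shortest wavelength of interest (~13.7 cm)
circumference = 2 * math.pi * a  # head-sized ring (~55 cm)

# Assumed reading: panoramic sampling places capsules ~lambda/4 apart
# on the ring, which reproduces the quoted count of 16 microphones.
n_panoramic = circumference / (lam_min / 4)
print(round(n_panoramic))  # -> 16
```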

Best regards,

Stefan

-----------------------

Politis Archontis wrote:

Hi Stefan,


On 07 Jun 2016, at 04:35, Stefan Schreiber 
<st...@mail.telepac.pt> wrote:

Politis Archontis wrote:

But instead of combining all microphones to generate the binaural directivities 
(as in ambisonics), it interpolates only between the two adjacent microphones 
that should be closest to the listener’s ears. Otherwise, it does not capture 
pinna cues or cues from a non-spherical/asymmetrical head.
Any source  for this explanation?

I actually dare to question your view... How will you receive any binaural cues 
via interpolation between two relatively closely spaced omni mikes (fixed on a 
sphere)?

As you even write, this doesn't seem to give any head and pinna cues. (It's 
called MTB. So I guess they would aim to provide several binaural perspectives, 
including head and pinna cues?)

The source is the AES paper describing the method:

Algazi, R. V., Duda, R. O., & Thompson, D. M. (2004). Motion-Tracked Binaural 
Sound. In 116th AES Convention. Berlin, Germany.

It does give head-related cues: those of a spherical head without pinnae. If you 
put an omni on a rigid sphere, it is no longer an omni; it has a 
frequency-dependent directionality. If you put two of them at opposite sides, 
they have opposite directionalities and introduce inter-channel level 
differences. Depending on the size of the sphere, the two signals have a 
direction-dependent phase difference too. If the sphere is approximately the 
size of a head, then you can assume that the level and time differences are 
close to the binaural ones. This is the well-known spherical head model, and 
its ITDs and ILDs are known analytically. It captures the cues for 
lateralization, but not those of a pinna (which it doesn't have) or of head 
asymmetries.
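As a small illustration of the analytic spherical-head cues: Woodworth's classic high-frequency approximation gives the ITD of a rigid sphere in closed form (this is one standard formula for the spherical head model, not necessarily the one used in the MTB paper):

```python
import math

def spherical_head_itd(theta, a=0.0875, c=343.0):
    """Woodworth ITD approximation for a rigid spherical head.

    theta: source azimuth from the median plane, in radians.
    a: sphere (head) radius in m; c: speed of sound in m/s.
    ITD = (a / c) * (theta + sin(theta)).
    """
    return (a / c) * (theta + math.sin(theta))

# A fully lateral source (theta = 90 degrees) gives roughly 0.66 ms,
# in line with typical human ITD maxima.
print(spherical_head_itd(math.pi / 2))
```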

If, instead of two omnis, you put many of them on the horizontal plane, then 
you can track the listener's head yaw rotation and use the two omnis that are 
closest to their ears, or interpolate between them for a smoother transition. 
That's what Algazi and Duda are doing in their paper, and they compare various 
interpolation schemes.
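A minimal sketch of that selection-plus-crossfade step, assuming omnis equally spaced on a horizontal ring (the simple linear crossfade here is only the most basic of the interpolation schemes the paper compares):

```python
import math

def ear_signal(mic_signals, ear_azimuth_rad):
    """Interpolate between the two ring microphones adjacent to one ear.

    mic_signals: list of N equal-length sample buffers from omnis equally
    spaced on a horizontal ring; microphone i sits at azimuth 2*pi*i/N.
    ear_azimuth_rad: tracked azimuth of the ear on the ring.
    Returns a linearly crossfaded signal from the two nearest capsules.
    """
    n = len(mic_signals)
    pos = (ear_azimuth_rad % (2 * math.pi)) / (2 * math.pi) * n
    i = int(pos) % n       # nearest microphone "behind" the ear
    j = (i + 1) % n        # next microphone "ahead" of the ear
    w = pos - int(pos)     # fractional distance -> crossfade weight
    return [(1 - w) * a + w * b
            for a, b in zip(mic_signals[i], mic_signals[j])]
```

In use, the left and right ear azimuths would be derived from the tracked head yaw (roughly yaw + 90° and yaw - 90°), and each ear gets its own crossfaded pair.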

Regards,
Archontis

_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound - unsubscribe here, edit 
account or options, view archives and so on.
