Hi Archontis,
sorry for the relatively late response. I was travelling and had some
problems posting anything to sursound during my trip. (I finally know
what went wrong...)
Anyway, many thanks for the (as always) clear and well-informed answer
you gave to my posting.
It is quite remarkable that a "pinna-less" (but multi-perspective)
binaural format seems to work well for head-tracked VR applications.
This is further proof that some perceptual cues can be omitted (here:
pinna cues) as long as other cues (ILD, ITD) are more or less intact.
However, there seem to be a couple of limitations to the (proprietary)
MTB recording format.
- I would expect some problems distinguishing between front and back.
(Head movements will fix these, but what if you want to keep your head
in some "listening position"?)
- I would not expect any significant height cues to be captured. (?)
More important:
- It seems to be very difficult, if not impossible, to bring MTB
recordings into any loudspeaker format. (Even into classical stereo...)
"Application case": Imagine you would like to present some VR/360º movie
in a ("plain old") cinema version, or simply to broadcast it on TV. You
would need a 2.0 or 5.1 (or Auro-3D/Dolby Atmos etc.) audio version to
do so. How would you derive this from a binaural recording in any
rational way?
(Unless they were to interpret the 8-mic RondoMic sphere recording as
an HOA source. Which brings us back to my 1st mail...)
Last, but not least:
Good MTB recordings require many capsules assembled in an SA microphone,
in my eyes no fewer than for HOA.
Sources:
http://dysonics.com/rondo360/
http://dysonics.com/wp-content/uploads/2014/05/dysonics_immersive_spatial_sound_for_mobile.pdf
"In practice, we find that this procedure produces high-quality results
using 8 microphones for speech and 16 microphones for music."
"Although MTB produces highly-realistic, well externalized spatial
sound, the signals produced by
this method only approximate the exact experience, and critical
listening tests have revealed various
audible defects [7]. We have developed methods to correct for these
problems, if corrections are required,
and refer the interested reader to [7] for an extended discussion of
this topic."
Chapter 4.1:
"For the numerical values a = 0.0875 m, c = 343 m/s and fmax = 2.5 kHz,
these formulas call
for 55 microphones for omnidirectional and 16 microphones for panoramic
sampling."
55 microphones is quite a lot, especially if you are restricted to
binaural applications.
Best regards
Stefan
-----------------------
Politis Archontis wrote:
Hi Stefan,
On 07 Jun 2016, at 04:35, Stefan Schreiber <st...@mail.telepac.pt> wrote:
Politis Archontis wrote:
But instead of combining all microphones to generate the binaural directivities
(as in ambisonics), it interpolates only between the two adjacent microphones
that should be closest to the listener’s ears. Otherwise, it does not capture
pinna cues or cues from a non-spherical/asymmetrical head.
Any source for this explanation?
I actually dare to question your view... How will you receive any binaural cues
via interpolation between two relatively closely spaced omni mikes (fixed on a
sphere)?
As you even write, this doesn't seem to give any head and pinna cues. (It's
called MTB. So I guess they would aim to provide several binaural perspectives,
including head and pinna cues?)
The source is the AES paper describing the method:
Algazi, R. V., Duda, R. O., & Thompson, D. M. (2004). Motion-Tracked Binaural
Sound. In 116th AES Convention. Berlin, Germany.
It does give head-related cues: those of a spherical head without pinnae. If you
put an omni on a rigid sphere, it is not an omni anymore; it has a
frequency-dependent directionality. If you put two of them at opposite sides,
they have opposite directionalities and introduce inter-channel level
differences. Depending on the size of the sphere, the two signals have a
direction-dependent phase difference too. If the size of the sphere is
approximately the size of a head, then you can assume that the level and time
differences are close to the binaural ones. This is the infamous spherical head
model, and its ITDs and ILDs are known analytically. It captures the cues for
lateralization, but not those of a pinna (which it doesn’t have) or of head
asymmetries.
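[Editor's note: one common closed form for the spherical-head ITD mentioned above is Woodworth's formula; a minimal sketch, using the same head radius a = 0.0875 m and c = 343 m/s that appear later in this thread:]

```python
import math

def woodworth_itd(azimuth_deg, a=0.0875, c=343.0):
    """Woodworth ITD (seconds) for a rigid spherical head of radius a (m).

    Azimuth is measured from the median plane; the formula assumes a
    distant source and is a high-frequency approximation.
    """
    theta = math.radians(azimuth_deg)
    # Path difference: a*theta around the sphere plus a*sin(theta) in free air.
    return (a / c) * (theta + math.sin(theta))

# ITD is 0 for a frontal source and maximal (~0.66 ms) for a lateral one.
```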
If instead of two omnis, you put many of them on the horizontal plane, then you
can track the listener’s head yaw rotation and use the two omnis that are
closer to their ears - or interpolate for a smoother transition. That’s what
Algazi and Duda are doing in their paper, and they compare various
interpolation schemes.
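[Editor's note: the capsule-selection step described above can be sketched as follows. This is an illustrative linear-crossfade version with assumed parameters (8 equally spaced capsules on the horizontal ring); it is not the exact interpolation scheme of the Algazi/Duda paper, which compares several:]

```python
import math

def mtb_pair_and_gains(yaw_deg, n_mics=8):
    """Given head yaw (degrees), pick the two ring microphones adjacent to
    each ear position, with linear crossfade gains between them.

    Assumes n_mics omnis equally spaced on the sphere's horizontal ring,
    mic i at azimuth i * (360 / n_mics).
    """
    spacing = 360.0 / n_mics
    out = {}
    for ear, offset in (("left", -90.0), ("right", +90.0)):
        angle = (yaw_deg + offset) % 360.0   # ear position on the ring
        idx = angle / spacing
        lo = int(math.floor(idx)) % n_mics   # adjacent capsule indices
        hi = (lo + 1) % n_mics
        frac = idx - math.floor(idx)         # fractional position between them
        out[ear] = ((lo, 1.0 - frac), (hi, frac))
    return out

# With yaw=0 and 8 mics, the left ear sits exactly on mic 6 (270 deg)
# and the right ear on mic 2 (90 deg), so no crossfade is needed.
```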
Regards,
Archontis
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound - unsubscribe here, edit
account or options, view archives and so on.