Albert Leusink wrote:
> Good evening,
>
> It's been very informative reading this list and learning from all of
> you experts. I'm an experienced audio engineer who suddenly discovered
> Ambisonics thanks to the whole VR/360 explosion.
VR needs surround sound, but you could also say that surround sound (at
least its wide application) might be "saved" by current VR developments.
(Surround sound has arguably been ignored by the music industry - but
also, quite strangely, by the audio equipment manufacturers, including
certain "headphone companies". The IT industry didn't help much either,
IMO.)
> (Although I made some recordings with a Calrec MK4 in the mid-nineties,
> we would just mix them down to stereo, not knowing what to do with those
> "B-format" outputs, thinking they were used by the "B"BC only...
> shameful, I now realize... we were young... :-)
> As I'm very new to this, many questions remain unanswered even after
> thoroughly reading this list and other resources, and hopefully some of
> you can take the time to answer them. I'll try to put them in separate
> threads so we can tackle the issues one by one; if you'd prefer
> otherwise, let me know.
> Question 1:
> I understand that a big variable for localization in Ambisonic-to-binaural
> decoding is picking the right HRTF.
True. There has been a lot of research, and about three professional
products, but nothing practical for the music listener.
> Now, is there a method whereby we could use test tones or pink/white
> noise to approximate the subject's HRTF, and then use the closest
> measured HRTF from e.g. the IRCAM or CIPIC databases?
> For example, let's say we use 100 Hz, 1 kHz and 10 kHz tones, and the
> listener has to press a button on their device when they hear each tone
> exactly in the middle, exactly at -180°, or elsewhere. Or use regular
> and phase-reversed tones, and have the subject calibrate by indicating
> when they are loudest or softest?
I believe "picking of HRTF data sets" could/will currently be achieved
via measurement of (simple) biometric data and/or matching to HRTF data
bases, as described in these two introductory articles:
I.
http://www.tvtechnology.com/audio/0014/hrtfs-and-binaural-reproduction/276663

(Interesting:
“Some research labs developed array prototypes that use way more
capsules,” said Markus Noisternig from Ircam (Institute for Research
and Coordination in Acoustics/Music, Paris). “At Ircam we are using 64
high-quality back-electret capsules for musical recordings, and are
working towards a new prototype using 256 MEMS microphones. The more
capsules, the higher the spatial resolution, the more precise the
binaural transcoding.”
But can they overcome certain noise problems? Spatial resolution is not
everything... - see the capsule-count note further below.)
II. (2nd part...)
http://www.tvtechnology.com/audio/0014/deriving-hrtfs-and-the-aes692015-file-format/276920
BEM (the boundary element method) is a very computationally demanding
process. “Computations could
take a half a day on a powerful computer or less on a computer
cluster,” Noisternig said. “There would be the option to upload the
mesh-grid to a server that processes the data on a huge cluster and
sends back the rendering results. Anyway, we are very far from doing
this on a smartphone.”
There might be ways to reduce computational complexity. (But this is
also a research topic for Microsoft, Oculus and others.)
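A back-of-the-envelope aside on the capsule counts quoted in the first
article: a full-sphere (periphonic) Ambisonic signal of order N has
(N + 1)^2 components, so that is the minimum number of capsules an array
needs. A tiny Python sketch of that rule of thumb:

import math

# A full-sphere Ambisonic signal of order N has (N + 1)**2 components,
# so an array needs at least that many capsules.
def max_ambisonic_order(num_capsules: int) -> int:
    """Highest 3D Ambisonic order a given capsule count can support."""
    return math.isqrt(num_capsules) - 1

for capsules in (4, 32, 64, 256):
    print(capsules, "capsules -> up to order", max_ambisonic_order(capsules))
# 4 -> 1 (the classic SoundField/Calrec tetrahedron), 32 -> 4 (Eigenmike),
# 64 -> 7 (Ircam's array), 256 -> 15 (the planned MEMS prototype)

In practice capsule self-noise, matching errors and spatial aliasing
limit the usable order well below this bound - which is exactly the
noise concern above.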
> Is this a ridiculous idea, or does it have some standing? Would it be
> very CPU-intensive, or just a matter of supplying a spreadsheet with
> the IRCAM/CIPIC measurements and comparing the subject's answers
> against it?
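Comparing the subject's answers against a table would be computationally
trivial - essentially your spreadsheet idea. A rough sketch in Python
(the file layout and column names are invented for illustration, not any
real IRCAM/CIPIC format):

import csv

def load_predicted_answers(path):
    # Hypothetical table: one row per candidate HRTF set, giving the
    # azimuth (degrees) a listener should report for each test tone
    # if that HRTF actually matched their ears.
    table = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            table[row["hrtf_id"]] = (float(row["az_100hz"]),
                                     float(row["az_1khz"]),
                                     float(row["az_10khz"]))
    return table

def best_match(answers, table):
    # Least-squares match between the subject's reported azimuths and
    # each candidate HRTF's predicted answers.
    def err(hrtf_id):
        return sum((a - p) ** 2 for a, p in zip(answers, table[hrtf_id]))
    return min(table, key=err)

# e.g. best_match((2.0, -5.0, 12.0), load_predicted_answers("answers.csv"))

The hard part is not the CPU cost but the test design: a 100 Hz tone
will hardly discriminate between HRTF sets at all, since localization at
low frequencies is dominated by interaural time differences, which vary
little between heads. You would want the probes mostly above a few kHz,
where the individual pinna cues live.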
"The only accessible individualization method on mobile devices is to
find the best match with HRTFs from huge databases," Noisternig said.
This method uses a best-fit model that doesn’t involve scanning the
head, but rather uses some biometric data that can easily be measured,
like head radius or the distance between the ears. This information is
sent to HRTF databases, which propose possible HRTF matches to the
listener, along with a test recording that contains localization
information. It is expected that there will be some trial and error,
as the listener selects the best match.
This is a mixed selection process, based on (basic) biometric data plus
refinement ("listener selects the best match").
Of course, you could go with just one of the two steps. (See the method
of AmbieXplorer version 2, which is "just" the 2nd step.)
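To illustrate the first step, a minimal nearest-neighbour sketch in
Python (the measurement names are placeholders, not the actual CIPIC
anthropometry labels):

def shortlist_hrtfs(subject, database, k=3):
    # subject: dict of easily measured biometrics in centimetres,
    # e.g. {"head_width": 15.2, "pinna_height": 6.4}.
    # database: list of {"hrtf_id": ..., "measurements": {...}} entries.
    # Returns the k closest HRTF sets for the listener to audition.
    def dist(entry):
        return sum((subject[m] - entry["measurements"][m]) ** 2
                   for m in subject)
    return [e["hrtf_id"] for e in sorted(database, key=dist)[:k]]

In a real system you would normalize or weight the measurements (they
have quite different ranges), and then hand the shortlist to the
listener for the trial-and-error refinement described above.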
Going further, devices like the Kinect (version 2) already allow
capturing a 3D model of the human head and torso. The pinnae would have
to be captured at relatively high resolution.
Best,
Stefan
P.S.: I would like to say again that BiLi and AES should open up their
AES-69 standard. (Otherwise some de facto standard will be set later by
Oculus, Microsoft or the Mozilla Foundation, in any case.)
> Surely it's far from perfect, but what other solutions do we currently
> have to give binaural listeners the best possible outcome, apart from
> getting themselves measured or going through a whole list of HRTFs?
Thank you very much for asking the right questions... ;-)