On Sun, Sep 11, 2022 at 07:21:50PM +0300, Sampo Syreeni wrote:

> If the directional sampling was statistically uniform over the whole
> sphere of directions, and in addition the sample of directions probed
> was to be in quadrature, it would be an easy exercise in discrete
> summation to gain the transform matrix we need.
Even in that case it isn't as simple as you seem to think. Any set of
measured HRIR will need some non-trivial preprocessing before it can
be used.

One reason is low-frequency errors. Accurate IR measurements below say
200 Hz are difficult (unless you have a very big and good anechoic
room). OTOH we know that HRIR in that frequency range are very low
order and can be synthesised quite easily (one way to do this is
sketched in the P.S. below).

Another reason is that you can't reduce a set of HRIR to low order
(the order of the content you want to render) without introducing
significant new errors. One way to reduce these is to reduce or even
fully remove ITD at mid and high frequencies, again depending on the
order the renderer is supposed to support. Getting the magnitudes (and
hence ILD) accurate requires much lower order than if you also want to
keep the delays (see the second sketch in the P.S.).

Compared to these and some other issues, not having a set on a regular
grid (e.g. a t-design or Lebedev grid) is the least of the problems
you will encounter.

There are other considerations. For best results you need head
tracking and a plausible room sound (even if the content already
includes its own).

> So the best framework I could think of, years past, was to try and
> interpolate the incoming directional point cloud from the KEMAR and
> other sets, to the whole sphere, and then integrate. Using a priori
> knowledge for the edge, singular cases, where a number of the
> empirical observations prove to be co-planar, and as such singular
> in inversion. I tried stuff such as the information-theoretic
> Kullback-Leibler divergence, and the Vapnik-Chervonenkis dimension,
> in order to pare down the stuff. The thing I settled on was a kind
> of mutual recursion between the directional mutual information of
> each empirical point gained/removed and the Mahalanobis distance to
> each spherical harmonic added/removed. It ought to have worked.

The practical solutions do not depend on such concepts and are much
more ad hoc. Some members of my team and I have worked on them for
the last three years. Most of the results are confidential, although
others (e.g. IEM) have arrived at some similar results and published
them.

Another question is whether, for high-quality binaural rendering,
starting from Ambisonic content is a good idea at all. The simple
fact is that if you want really good results you need very high
order, and

1. such content isn't available from direct recordings (we don't even
   have 10th-order microphones), so it has to be synthetic,

2. rendering it from an Ambisonic format would be very inefficient.
   For example, order 20 means (20 + 1)^2 = 441 spherical harmonic
   signals, so you'd need 441 convolutions if you assume L/R head
   symmetry, and twice that number if you don't.

Compare this to rendering from object-encoded content (i.e. mono
signals plus directional metadata). You need only two convolutions
per object. Starting from a sufficiently dense HRIR set, you can
easily generate a new set on a regular grid with a few thousand
points, and interpolate them (VBAP style) in real time (third sketch
below). This can give you the same resolution as e.g. order 40
Ambisonics at a fraction of the complexity.

Ciao,

-- 
FA
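
P.S. To make some of the above concrete, here are three rough
sketches in Python/numpy/scipy. They only illustrate the general
ideas: the function names, default values and array shapes are mine,
and none of this is the (confidential) code mentioned above. The
three sketches are meant as one small script, so the imports of the
first carry over.

First, the low end. Below a few hundred Hz an HRIR is essentially a
pure delay with near-unity magnitude, so one simple option (an
assumption on my part, not the only way) is a complementary crossover
between a suitably delayed unit impulse and the measurement:

  import numpy as np
  from scipy.signal import butter, sosfilt

  def fix_low_end(hrir, fs, fc=200.0, order=4):
      # Synthetic LF part: a unit impulse, delayed to roughly match
      # the bulk delay of the measurement (its energy centroid here).
      n = np.arange(len(hrir))
      delay = int(round(np.sum(n * hrir**2) / np.sum(hrir**2)))
      synth = np.zeros(len(hrir))
      synth[delay] = 1.0
      # Crossover around fc: synthetic below, measured above. The sum
      # of same-order Butterworth LP/HP is not exactly allpass; a real
      # implementation would use properly matched crossover filters.
      lp = butter(order, fc, btype='low', fs=fs, output='sos')
      hp = butter(order, fc, btype='high', fs=fs, output='sos')
      return sosfilt(lp, synth) + sosfilt(hp, hrir)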
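Second, dropping the delays while keeping the magnitudes (and hence
ILD) exactly: replace each HRIR by its minimum-phase version, here
computed with the standard real-cepstrum method. A frequency
independent per-direction delay can then be re-inserted at low
frequencies if the target order allows it:

  def minimum_phase(h, nfft=None):
      # Real-cepstrum (homomorphic) minimum-phase reconstruction:
      # same magnitude response as h, all excess phase removed.
      nfft = nfft or 4 * len(h)
      mag = np.abs(np.fft.fft(h, nfft)) + 1e-12   # avoid log(0)
      cep = np.fft.ifft(np.log(mag)).real         # real cepstrum
      w = np.zeros(nfft)                          # folding window
      w[0] = 1.0
      w[1:nfft // 2] = 2.0
      w[nfft // 2] = 1.0
      hm = np.fft.ifft(np.exp(np.fft.fft(w * cep))).real
      return hm[:len(h)]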
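Third, the VBAP-style interpolation of a dense HRIR grid. Triangulate
the grid once (the convex hull of the unit direction vectors), then
per object find the enclosing triangle, mix the three HRIR pairs with
the VBAP gains, and convolve. The linear triangle search is for
clarity only; real-time code would use a lookup table:

  from scipy.spatial import ConvexHull

  def vbap_hrir(direction, grid, hrirs, tri):
      # direction: unit 3-vector; grid: (N, 3) unit vectors;
      # hrirs: (N, 2, L) IR pairs; tri: (M, 3) vertex indices,
      # e.g. ConvexHull(grid).simplices.
      for t in tri:
          # VBAP gains: write 'direction' in the basis of the three
          # vertex vectors; the triangle encloses it iff all gains
          # are non-negative.
          g = np.linalg.solve(grid[t].T, direction)
          if np.all(g >= -1e-9):
              g /= g.sum()                   # simple normalisation
              return np.tensordot(g, hrirs[t], axes=1)   # (2, L)
      raise ValueError("direction not enclosed by any triangle")

The per-object rendering is then just two convolutions, whatever the
equivalent Ambisonic order:

  hl, hr = vbap_hrir(d, grid, hrirs, ConvexHull(grid).simplices)
  out_l = np.convolve(sig, hl)
  out_r = np.convolve(sig, hr)

with gains and IR pairs crossfaded whenever the head tracker or the
object moves.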