Re: [Sursound] Decoding coefficients for non symmetrical setups

Eric Benjamin Wed, 29 Feb 2012 14:43:21 -0800

> Bruce Wiggins's (I hope) research was what started this fray out in the first 
>place
Yup.  And several others.  But the point is that there is a good deal more to be 
done, especially as you point out that:

> this sort of optimization retains the blackbox leanings of machine learning 
> as a general discipline
Which would be OK, if the black box could give a meaningful rating of which 
decoders are good and which are bad, or more to the point, which is better than 
another.  But we're not to that point yet.

> how many actually take a look at the early bispectral model of Gerzon?
Took a look at.  But that's not the same thing as implementing it!

On the side of improving the psychoacoustic models I've been working on using 
spherical head models to predict the localization cues achieved and making in 
situ measurements of the ear signals of a real listener when listening to 
Ambisonic reproduction.  Some of this is available in:
"Why Ambisonics Does Work", Benjamin, Lee and Heller, AES preprint 8242 (2010)

a paper which was semi-humorous but which also contains some good stuff.  I was 
partly unsuccessful at showing the relationship between Gerzon's energy vector 
and interaural level differences (ILDs), and that is something to which I will 
devote some further serious attention soon.

> a more well-thought out optimization criterion, with some intelligent, 
>psychoacoustically minded regularization built in <snip> could still cut the 
>mustard
Ah, if only we could find some intelligence to apply to the problem.

Eric

----- Original Message ----
From: Sampo Syreeni <de...@iki.fi>
To: Surround Sound discussion group <sursound@music.vt.edu>
Sent: Wed, February 29, 2012 2:19:46 PM
Subject: Re: [Sursound] Decoding coefficients for non symmetrical setups

On 2012-02-29, Gregory Maxwell wrote:

>> Would an automated "blind" search algorithm possibly
> 
> Speaking of that, you probably want to search the list archives for a thread 
> I started in 2009 titled:
> 
> "A stupid optimizer for irregular ambisonic layouts"
> 
> In it I provide the source for a simplistic decoder that uses a generic open 
>source blackbox non-linear optimizer library with a simple objective to make 
>matrixes.

Before They point it out themselves, I think the fourth installment of BLaH 
does very much the same. And of course Bruce Wiggins's (I hope) research was 
what started this fray out in the first place. So, yes, this is something that 
seems to be recommended from more than one corner, with regard to irregular 
layouts. But still...

Personally what I find a bit worrisome is that this sort of optimization 
retains the blackbox leanings of machine learning as a general discipline. None 
of the ambisonic-specific, closed-form optimization literature, or the derived 
specifics of the base optimization problem, are being utilized. Instead the two 
(sometimes simultaneous, sometimes even not that) Gerzonian equations are being 
fed into one or another optimization framework, with no regard to what happens 
then, and without feeding in all of the age-old mathematical-physical knowhow 
of how those systems of equations behave, such as the psychoacoustical 
sensitivity estimates from the BBC era.
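
For readers who want the two Gerzonian criteria made concrete, here is a 
minimal sketch in Python. It is not code from this thread and not ambi_opt.c. 
It assumes a horizontal-only, first-order setup with the encoding W = 1, 
X = cos(theta), Y = sin(theta). The function names and the naive projection 
decoder are hypothetical, chosen only for illustration. It uses the standard 
definitions rV = sum(g_i u_i) / sum(g_i) and rE = sum(g_i^2 u_i) / sum(g_i^2), 
where g_i is the gain of speaker i and u_i the unit vector towards it.

```python
import math

def naive_decoder(speaker_az):
    """Hypothetical projection decoder: one (W, X, Y) weight row per speaker.

    On regular polygons with at least three speakers it yields |rV| = 1;
    on irregular layouts it is only a crude starting point.
    """
    n = len(speaker_az)
    return [(1.0 / n, 2.0 * math.cos(a) / n, 2.0 * math.sin(a) / n)
            for a in speaker_az]

def speaker_gains(decoder, theta):
    """Speaker gains for a plane wave arriving from azimuth theta (radians).

    Assumes the encoding W = 1, X = cos(theta), Y = sin(theta).
    """
    x, y = math.cos(theta), math.sin(theta)
    return [dw + dx * x + dy * y for dw, dx, dy in decoder]

def gerzon_vectors(gains, speaker_az):
    """Return (rV, rE) as 2D tuples for the given gains and speaker azimuths."""
    p = sum(gains)                      # total pressure gain
    e = sum(g * g for g in gains)       # total energy gain
    r_v = (sum(g * math.cos(a) for g, a in zip(gains, speaker_az)) / p,
           sum(g * math.sin(a) for g, a in zip(gains, speaker_az)) / p)
    r_e = (sum(g * g * math.cos(a) for g, a in zip(gains, speaker_az)) / e,
           sum(g * g * math.sin(a) for g, a in zip(gains, speaker_az)) / e)
    return r_v, r_e

if __name__ == "__main__":
    layouts = {
        "square": [math.radians(d) for d in (45, 135, 225, 315)],
        "irregular": [math.radians(d) for d in (30, 110, 250, 330)],
    }
    theta = math.radians(30)
    for name, layout in layouts.items():
        gains = speaker_gains(naive_decoder(layout), theta)
        r_v, r_e = gerzon_vectors(gains, layout)
        print(name,
              "|rV| = %.3f" % math.hypot(*r_v),
              "|rE| = %.3f" % math.hypot(*r_e),
              "rV az = %.1f" % math.degrees(math.atan2(r_v[1], r_v[0])),
              "rE az = %.1f" % math.degrees(math.atan2(r_e[1], r_e[0])))
```

On the square both vectors point at the source. On the irregular layout they 
generally do not, and that mismatch is what a numerical optimizer is then 
asked to repair.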

In addition to being a fan of black box algorithms, including all of the stuff 
that goes under the rubric of "data mining" (professionally I make my living as 
a database guy), I'm also a little bit of a skeptic towards the stuff. At least 
as far as the math I know and love suggests I should be.

For example, when using support vector machines to fit polynomial bases, how 
many people actually care to evaluate the Vapnik-Chervonenkis bound intrinsic to 
the problem, and then bound it in a principled fashion before commencing to 
optimize numerically? That after all is the most principled framework in which 
to bound overfitting by the machine -- i.e. the very same thing which leads to 
speaker detent within the ambisonic framework, even after simple dimensional 
constraints have already been dealt with.

And how many actually take a look at the early bispectral model of Gerzon? Or 
the third one whose name I don't remember right now? Even if those aren't 
backed up by psychoacoustics, they are still very, *very* relevant as (easily, 
formally, in-principled-fashion) saturable optimization criteria (in the usual 
ambisonic L^2 sense no less).

I don't think going with the easy route and just using blackbox optimizers does 
the job best, here. Instead, I would think we have to find a way to inject more 
and more current, analytically purified, psychoacoustic knowledge into the 
system, before we even start to optimize. Even if numerical optimization still 
remains the key in reaching a local optimum in this kind of a very difficult 
nonlinear optimization problem.

Once again, Robert Greene, please help me if I'm falling short on the hard 
math, somehow.

> I like the generic optimization approaches _more_ than more mathematically 
>elegant closed form solutions because it's easy to play around with the 
>objective functions— and usually any change to the objective makes your closed 
>form solutions need to start from scratch.

So to reiterate, numerical optimization is a must, because the most general 
problem seems to be analytically intractable. I'm even pretty sure that certain 
rig configurations could be shown to be impossible to solve using analytic 
means, and even unstable around their steepest, global optimum if that were 
ever found.

At the same time, though, I think a more well-thought out optimization 
criterion, with some intelligent, psychoacoustically minded regularization 
built in, and perhaps utilizing not only the L^2 norm but also the L^1 at the 
same time, could still cut the mustard. That's only going to happen if we push 
more and more of the post-Gerzon psychoacoustic research into the optimization 
criterion and then use an optimization engine capable of dealing with that sort 
of thing.
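
One possible reading of "not only the L^2 norm but also the L^1 at the same 
time" is a criterion that blends both norms of the same residual vector. The 
message does not say which quantity each norm would apply to, so the sketch 
below is an assumption, and the name mixed_norm_cost and the blend weight 
alpha are hypothetical.

```python
def mixed_norm_cost(residuals, alpha=0.5):
    """Blend of the squared L^2 norm and the L^1 norm of a residual vector.

    alpha = 0 gives the pure sum of squares, alpha = 1 the pure sum of
    absolute values. alpha is a hypothetical tuning knob, not a value
    taken from the thread.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l2_squared = sum(r * r for r in residuals)
    l1 = sum(abs(r) for r in residuals)
    return (1.0 - alpha) * l2_squared + alpha * l1
```

The residuals could be, for instance, the shortfall 1 - |rE| sampled over a 
set of source directions. The squared term punishes a few large errors 
heavily, while the absolute-value term keeps pressure on many small ones.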

That isn't being done now, not even to accelerate convergence, or to give a 
global, smooth starting point for the optimization procedure(s), or to 
regularize the eventual outcome. Why not? Are we really that lazy? (Well, I am, 
but are the researchers in the field as lazy as I am?)

> https://people.xiph.org/~greg/ambisonics/ambi_opt.c

Under xiph.org? Ooh! Please, more of that. And then more research plus 
application in how to optimally code/decode even first order using Vorbis (or 
some derivative?).

> Giving a brief glance at the code, now with several more years of experience 
>with optimization— and I see that my objective function appears 
>differentiable.  
>If I were to do this again I'd probably use a C++ reverse mode automatic 
>differentiation library, so that I could get a version of the objective with 
>gradients.

Don't. With ambisonics, you will have to deal with both pantophony and 
periphony, and the transition between them is decidedly singular. No stock 
numerical library that I know of can deal with something like that.

> My email archives indicate that Aaron Heller made a version with a bunch of 
> improvements like RME rE optimization, and adding direction mismatch between 
> rE and rV as part of the objective.

Yes. That's part of the BLaH work, and very, *very* cool. But even there, the 
precise tradeoff between directional error in rV and rE seems to be more of an 
instinctual decision than one based on hard science. The resulting decoder is 
exceptionally good compared to anything preceding it, true, but I don't think 
it's necessarily the best, as a global solution, or especially that it would 
generalize too easily to higher orders.
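
To show where that tradeoff enters, here is a minimal sketch of a 
per-direction cost with explicit weights. It is not the BLaH objective or 
Aaron Heller's code. The names angle_diff and direction_cost and the weights 
w_v, w_e and w_mismatch are hypothetical, and their default values are 
arbitrary.

```python
import math

def angle_diff(a, b):
    """Signed difference a - b in radians, wrapped into [-pi, pi)."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi

def direction_cost(r_v, r_e, theta, w_v=1.0, w_e=1.0, w_mismatch=1.0):
    """Weighted squared angular errors for one intended azimuth theta.

    r_v and r_e are 2D (x, y) tuples. The three weights are exactly the
    free choices the tradeoff depends on; nothing here fixes them.
    """
    az_v = math.atan2(r_v[1], r_v[0])
    az_e = math.atan2(r_e[1], r_e[0])
    return (w_v * angle_diff(az_v, theta) ** 2
            + w_e * angle_diff(az_e, theta) ** 2
            + w_mismatch * angle_diff(az_v, az_e) ** 2)
```

Summing direction_cost over a grid of source azimuths gives one scalar to 
minimize. Changing the weights changes which decoder comes out as best, 
which is why the choice deserves a psychoacoustic justification.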

> Somewhere I had some version with support for higher orders and 3d but I 
> don't know where that is right now.

If you had it, I'd bet it'd suck -- if only a bit -- compared to the optimum 3D 
code we will eventually find.

> There are a lot of things you can do starting from a simple framework like 
>this.

Finally, no contest there. It's just that little nagging detail beyond which 
annoys me...so. :)
-- Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound
