On 2011-07-23, dave.mal...@york.ac.uk wrote:
(For some reason Dave H's post didn't reach me as-is. Below, Dave Malham
is quoted once and Dave Hunt twice. Sorry about the
hassle.)
I have an interesting question (well, I think it's interesting). The
Soundfield microphone, like any directional microphone, has a boosted
bass response to close sounds. When listening to this through a
speaker rig, we hear this boost and tend to interpret it as meaning
the sound is close, especially in a dry acoustic with a Greene-Lee head
brace etc., etc. However, surely (unless I am being more dense than
usual tonight) this is a learnt response based on the behaviour we
have heard from directional mics?
To a degree it is. I think the worst thing here is that we continuously
close-mic acoustic sources, to "give them that close, intimate feel".
That's originally about the proximity effect and about HF attenuation
with distance, but since it's now a part of a culturally shared idiom as
well, we associate extra power with it as such, beyond the mere
psychoacoustics. Electronic distortion with its synthetic overtones
doesn't help here, either.
But still, at the very bottom, you'd have this very same reaction
because of the pure, physical acoustical proximity effect. E.g. I'm
pretty sure someone like Filippo Fazi could explain with neat
visualisation how the boundary conditions represented by the human head
turn this curved wavefront reactivity into a noticeable bass-boost at
the tympanic membrane, given the shape of the human head, upper torso
and pinnae. It really isn't just about the reproduction technology; it's
about how soundfields work and how we naturally perceive them from
within them.
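
To put rough numbers on the physical part of this, here is a minimal sketch. It assumes an ideal point source in free field, for which the first-order (velocity) component is boosted relative to pressure by |1 + c/(j*omega*r)|. The function name, the 343 m/s speed of sound and the example distances are my own illustrative choices, and it says nothing about what the head and torso then do to the field.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def proximity_boost_db(freq_hz, distance_m, c=SPEED_OF_SOUND):
    """Velocity-to-pressure level (dB) for an ideal point source at distance_m.

    Uses the spherical-wave factor 1 + 1/(j*k*r) = 1 - j*c/(omega*r).
    """
    omega = 2.0 * math.pi * freq_hz
    ratio = complex(1.0, -c / (omega * distance_m))
    return 20.0 * math.log10(abs(ratio))


if __name__ == "__main__":
    # Boost grows as the frequency drops and as the source comes closer.
    for f in (50.0, 200.0, 1000.0):
        row = [round(proximity_boost_db(f, r), 1) for r in (0.1, 1.0, 10.0)]
        print(f"{f:7.1f} Hz  boost at 0.1 m, 1 m, 10 m: {row} dB")
```

The boost is 3 dB where k*r = 1 and climbs at 6 dB per octave below that, which is why it only matters for close sources and low frequencies.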
After all, taken individually, at those sorts of frequencies our ears
are essentially omnidirectional and not subject to bass boost (to
anything like the same degree).
The NFC-HOA theory doesn't take into account how we hear things. It's a
physical theory and not a psychoacoustic one. It just recreates
soundfields, and lets us hear them as we happen to do. If we don't hear
those spatially differentiated low frequencies, so what? They were still
recreated with perfect fidelity, and we heard them as such, if perhaps
losing some of the impact in the process, given that we're no blue
whales.
If you then look at the second NFC-HOA paper, they actually exploit this
fact to arrive at a lower energy intake transmission pipe. They don't go
into psychoacoustics just yet, but they do cut out the huge low
frequency anti-phase signals which come from the naïve soundfield
reconstruction math.
You're right that POA assumes plane waves. The encoded signals are
reproduced at the distance of the loudspeakers.
No. Those two are in direct contradiction. If a point source comes from
the surface of the rig, it's going to be a spherical wave originating
from that distance. If it's a planewave instead, it's either just a
planewave, or a (very tight) approximation of an infinitely far-away
point source (that also being a plane wave when you measure it locally).
The shelf filters in a BLaH compliant decoder are (as I understand
it) an attempt to compensate for the speakers' finite distance, and
for the fact that they don't produce plane waves at the listener. This is often
referred to as 'distance compensation'.
Sorta. In the conventional ambisonic decoder there are two separate
circuits. The first (set) is the shelf one, which is based on
psychoacoustics alone. In a sense it tries to compensate for the very
low, seventies bandwidth of just four channels for periphony. It does so
by varying, frequency selectively, the relative amplitude of the zeroth
and first order components. At low frequencies it goes with a velocity, or
systematic decode, because there we seem to (or seemed to) hear
interaural phase differences pretty well, but not the amplitude ones.
Going higher up, the shelf filtering is optimized to reproduce power
differences between the two ears, which seem(ed) to work pretty well for
both a) a well-centered single listener, and b) for purely mathematical,
statistical reasons for non-centered listeners as well.
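
As a rough illustration of what the shelf does to the zeroth/first order balance, here is a small sketch. The gain values are the ones commonly quoted for classic first-order decoders (unity at low frequencies, power-preserving energy-vector-maximising gains above the crossover); treat them as my assumption rather than something taken from any particular decoder, and note that the function name is made up. Only the ratio between the two gains really matters, since the absolute level depends on the channel normalisation in use.

```python
import math


def shelf_gains(periphonic=True):
    """Return ((w_lf, xyz_lf), (w_hf, xyz_hf)) linear gains, first order.

    LF: velocity decode, both components at unity.
    HF: energy-optimised decode; first-order/W ratio is 1/sqrt(3) for
    periphony and 1/sqrt(2) for horizontal-only (assumed textbook values).
    """
    lf = (1.0, 1.0)
    if periphonic:
        hf = (math.sqrt(2.0), math.sqrt(2.0 / 3.0))
    else:
        hf = (math.sqrt(1.5), math.sqrt(3.0) / 2.0)
    return lf, hf


def to_db(gain):
    return 20.0 * math.log10(gain)


if __name__ == "__main__":
    for periphonic in (False, True):
        _, (w_hf, first_hf) = shelf_gains(periphonic)
        kind = "periphonic" if periphonic else "horizontal"
        print(f"{kind}: W {to_db(w_hf):+.2f} dB, first order {to_db(first_hf):+.2f} dB")
```

The crossover frequency between the two regimes is deliberately left out here, since it is a tuning choice of the decoder.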
This first circuitry is never something you can switch off in a
conventional ambisonic decoder, because it's about pure psychoacoustical
optimization, with regard to the pure, physical bound signal that is
being received.
The distance compensation circuit on the other hand can be switched on or
off, because it has to do with the receiver end rig geometry, which is
then naturally variable by design. In reality it too ought to be a
continuous knob, just as the aspect ratio knob in the four speaker
decoders is. But then this would lead to a total bitch of a filter
circuit in the analog domain, for little gain in the usual domestic
situation which ambisonic was originally aimed at. Thus, the classical
decoders simply give you the choice of no distance compensation, or a
switch-in compensation tuned for something like a 3 metre rig diameter
(or was it radius, I forget).
So the shelf filter has nothing to do with the form of the emitted
wavefronts, or as such the recreation of proximity effects -- if
anything it distorts the reconstructed soundfield from the physically
optimal form, based on purely psychoacoustic criteria, and almost beyond
recognition if I might add. (Some critics of ambisonic have actually
fallen into the trap of simply looking at what physically follows, and
because of that said that "it couldn't work", while it clearly does.)
The distance compensation filter on the other hand does participate in
wavefront reconstruction. It's sort of the "physical acoustic
concession" in this whole shindig. It tries to negate the fact that the
speakers lie at a finite distance, so that they emit spherical waves,
which by definition have a reactive, proximity component. It can do so
only very imperfectly, because it has to work at first order directional
accuracy, and it can't do anything about the spatial aliasing caused by
a discrete, widely spaced rig. But it still does a very important job,
and one that is easily masked by how simple it really is in circuitry:
that simplicity is caused by the basic physics and the symmetry
assumptions that go into classical decoder design, and not because this
part could be somehow overlooked or thought of as something to be
subsumed by the flashier shelf part.
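
To show how simple the distance compensation really is, here is a sketch of it as a first-order high-pass on the first order components, 1 / (1 + c/(j*omega*d)), with d the speaker distance. That form follows from the spherical-wave physics; the function names, the 343 m/s figure and the two example distances (covering both readings of the 3 metre figure above) are my assumptions, not values lifted from a real decoder.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def corner_frequency_hz(speaker_distance_m, c=SPEED_OF_SOUND):
    """-3 dB corner of the compensation high-pass: f = c / (2*pi*d)."""
    return c / (2.0 * math.pi * speaker_distance_m)


def compensation_gain_db(freq_hz, speaker_distance_m, c=SPEED_OF_SOUND):
    """Gain (dB) applied to first-order components at freq_hz."""
    omega = 2.0 * math.pi * freq_hz
    h = 1.0 / complex(1.0, -c / (omega * speaker_distance_m))
    return 20.0 * math.log10(abs(h))


if __name__ == "__main__":
    for d in (1.5, 3.0):  # 3 m read as a diameter, or as a radius
        fc = corner_frequency_hz(d)
        print(f"d = {d} m: corner {fc:.1f} Hz, "
              f"gain at 20 Hz {compensation_gain_db(20.0, d):.1f} dB")
```

Either way the corner lands in the low bass, which fits with the switch making little audible difference in a domestic setup.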
(Thanks to the linear time invariance and the spatial symmetry of the
whole classical ambisonic setup, you can permute the circuitry every
which way, so that you might not even be able to tell from the diagram
which part of the circuitry belongs to the shelf filter and which to
something else.)
Me too, but as I remember it tries to build the 'distance
compensation' into the encoding, and thus is dependent on the
distance of the loudspeakers.
Correct. The gimmick is that when you do both the encoding and the
decoding in one step, you get limited, guaranteed amplifications for
finite rig sizes. No possibility of channel overload. Whereas if you do
it the old way, and separate the source and the rig term, in between you
can have unlimited LF amplification, overloading both analog and digital
channels.
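
A minimal first-order sketch of that difference, under the same ideal point-source assumption: the source term alone, 1 + c/(j*omega*r), grows without bound towards DC, while the combined source-over-rig term levels off at d/r. The function names and the example distances are hypothetical, and the higher orders of the actual NFC-HOA filters are not covered.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def source_term(freq_hz, distance_m, c=SPEED_OF_SOUND):
    """Near-field factor 1 + c/(j*omega*r) for a point source at distance_m."""
    omega = 2.0 * math.pi * freq_hz
    return complex(1.0, -c / (omega * distance_m))


def nfc_first_order(freq_hz, source_distance_m, rig_radius_m, c=SPEED_OF_SOUND):
    """Source term divided by rig term, i.e. encoding and compensation in one step."""
    return (source_term(freq_hz, source_distance_m, c)
            / source_term(freq_hz, rig_radius_m, c))


if __name__ == "__main__":
    r_src, r_rig = 0.5, 2.0  # example distances in metres
    for f in (1.0, 10.0, 100.0, 1000.0):
        separate = 20.0 * math.log10(abs(source_term(f, r_src)))
        combined = 20.0 * math.log10(abs(nfc_first_order(f, r_src, r_rig)))
        print(f"{f:7.1f} Hz  source only {separate:6.1f} dB   combined {combined:5.1f} dB")
```

With these numbers the combined gain never exceeds the ratio of rig radius to source distance, here a factor of 4, however low the frequency goes.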
Of course this is nonsense in practice. Even the closest sources
register only finite bass amplification on a (theoretically and almost
even practically) perfect SoundField mic. That's because there's no such
thing as a physical monopole, even. At close range even small sources
will start to take on sizable, discernible features, stop being a point
source, and start to lead to phase cancellation from their constituent
parts. But when you go with the pure, synthetic, continuous time/space
theory, you have to worry about this sort of thing.
Thus the encoding is only suitable for an identical or similar rig,
and is not transferable to other rigs.
True, but the paper does tell you how to convert from one diameter to
another.
Amplitude/delay based systems such as WFS, Delta stereophony and
TiMax have similar problems. The encoding has to be matched to the
speaker rig.
But then those make simplifying assumptions which are embedded into the
transfer stream as well. NFC-HOA does not.
We're still left with the "40 foot high geese" problem.
This is then something we've never really sorted out. I don't think I've
seen any plausible, final explanation for why the geese should be so
large upon culling. So to speak.
If you could do the encoding assuming a given speaker distance, then
modify the decoding for a different distance it might help, though
I've no idea how to do this.
You can't, unless you modify the decoder, as in NFC-HOA work. But in
theory you could take some very near-field
averaged-over-direction-to-first-order HRTF set, and fade into that. I
also think I've referred to Dylan Menzies's work with such sweet spot
rescalings.
--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound