On 2011-07-23, dave.mal...@york.ac.uk wrote:

(For some reason Dave H's post didn't arrive at me as-is. Below, Dave Malham appears quoted once and Dave Hunt twice. Sorry about the hassle.)

I have an interesting question (well, I think it's interesting). The Soundfield microphone, like any directional microphone, has a boosted bass response to close sounds. When listening to this through a speaker rig, we hear this boost and tend to interpret it as meaning the sound is close, especially in a dry acoustic with a Greene-Lee head brace, etc., etc. However, surely (unless I am being more dense than usual tonight) this is a learnt response based on the behaviour we have heard from directional mics?

To a degree it is. I think the worst thing here is that we continuously close-mic acoustic sources, to "give them that close, intimate feel". That's originally about the proximity effect and about HF attenuation with distance, but since it's now a part of a culturally shared idiom as well, we associate extra power with it as such, beyond the mere psychoacoustics. Electronic distortion with its synthetic overtones doesn't help here, either.

But still, at the very bottom, you'd have this very same reaction because of the purely physical, acoustic proximity effect. E.g. I'm pretty sure someone like Filippo Fazi could explain, with a neat visualisation, how the boundary conditions set by the human head, upper torso and pinnae turn the reactive component of a curved wavefront into a noticeable bass boost at the tympanic membrane. It really isn't just about the reproduction technology; it's about how soundfields work and how we naturally perceive them from within.
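A rough way to put free-field numbers on the proximity effect, assuming an ideal point source (monopole) and c = 343 m/s: for a monopole, the pressure-gradient (figure-8) component relative to the pressure component carries an extra factor 1 + 1/(jkr), with k = 2πf/c, so its bass boost is 10·log10(1 + 1/(kr)²) dB, while the pressure (omni) component has no such term. A cardioid, modelled here as half pressure plus half gradient, gets 10·log10(1 + 1/(2kr)²). The function names are hypothetical, and this covers only wavefront curvature, not the head and torso effects mentioned above.

```python
import math

C = 343.0  # assumed speed of sound in air, m/s


def kr(freq_hz: float, distance_m: float) -> float:
    """Helmholtz number k*r for a source at distance_m."""
    return 2.0 * math.pi * freq_hz * distance_m / C


def figure8_boost_db(freq_hz: float, distance_m: float) -> float:
    """Bass boost of the gradient (figure-8) pickup of a monopole, relative to
    a plane wave with the same pressure: 20*log10|1 + 1/(jkr)|."""
    x = kr(freq_hz, distance_m)
    return 10.0 * math.log10(1.0 + 1.0 / x**2)


def cardioid_boost_db(freq_hz: float, distance_m: float) -> float:
    """Same for a cardioid modelled as 0.5*pressure + 0.5*gradient:
    20*log10|1 + 1/(2jkr)|."""
    x = kr(freq_hz, distance_m)
    return 10.0 * math.log10(1.0 + 1.0 / (2.0 * x) ** 2)


if __name__ == "__main__":
    # The omni (pressure) pickup has no proximity term, i.e. 0 dB everywhere.
    for r in (0.1, 0.5, 2.0, 10.0):
        row = ", ".join(
            f"{f:>4} Hz: {figure8_boost_db(f, r):5.1f} dB" for f in (50, 100, 200, 1000)
        )
        print(f"r = {r:>4} m  figure-8 boost  {row}")
```

Below kr ≈ 1 the figure-8 boost rises at roughly 6 dB per octave as frequency falls, and the knee moves down as the source moves away; at large distances it tends to 0 dB, which is the plane-wave case.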

After all, taken individually, at those sorts of frequencies our ears are essentially omnidirectional and not subject to bass boost (to anything like the same degree).

The NFC-HOA theory doesn't take into account how we hear things. It's a physical theory, not a psychoacoustic one. It just recreates soundfields and lets us hear them however we happen to. If we don't hear those spatially differentiated low frequencies, so what? They were still recreated with perfect fidelity, and we heard them as such, if perhaps losing some of the impact in the process, given that we're no blue whales.

If you then look at the second NFC-HOA paper, they actually exploit this fact to arrive at a lower energy intake transmission pipe. They don't go into psychoacoustics just yet, but they do cut out the huge low frequency anti-phase signals which come from the naïve soundfield reconstruction math.

You're right that POA assumes plane waves. The encoded signals are reproduced at the distance of the loudspeakers.

No. Those two are in direct contradiction. If a point source comes from the surface of the rig, it's going to be a spherical wave originating from that distance. If it's a planewave instead, it's either just a planewave, or a (very tight) approximation of an infinitely far-away point source (that also being a plane wave when you measure it locally).

The shelf filters in a BLaH-compliant decoder are (as I understand it) an attempt to compensate for the speakers' finite distance, and for the fact that they don't produce plane waves at the listener. This is often referred to as 'distance compensation'.

Sorta. In the conventional ambisonic decoder there are two separate circuits. The first (set) is the shelf one, which is based on psychoacoustics alone. In a sense it tries to compensate for the very low, seventies-era bandwidth of just four channels for periphony. It does so by frequency-selectively varying the relative amplitude of the zeroth- and first-order components. At low frequencies it goes with a velocity, or systematic, decode, because there we seem to (or seemed to) hear interaural phase differences pretty well, but not the amplitude ones. Higher up, the shelf filtering is optimized to reproduce power differences between the two ears, which seem(ed) to work pretty well both a) for a well-centered single listener, and b), for purely mathematical, statistical reasons, for non-centered listeners as well.
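The trade-off the shelf switches between can be illustrated with Gerzon's velocity vector rV (amplitude-weighted) and energy vector rE (power-weighted). A minimal 2D sketch, assuming a regular polygon rig, a plane-wave source, and a simple projection decode in which g1 weights the first-order (X, Y) components relative to W (the normalisation and function names are hypothetical): g1 = 1 gives rV = 1, the low-frequency velocity decode, while g1 = 1/√2 maximises rE, the high-frequency energy decode. For full-sphere first order the corresponding max-rE weight is 1/√3. The crossover frequency of a real shelf is not modelled here.

```python
import math


def speaker_gains(n_speakers: int, g1: float, source_az: float) -> list[float]:
    """First-order horizontal decode for a regular polygon rig (hypothetical
    normalisation): gain_i = (1 + 2*g1*cos(phi_i - az)) / N."""
    return [
        (1.0 + 2.0 * g1 * math.cos(2.0 * math.pi * i / n_speakers - source_az)) / n_speakers
        for i in range(n_speakers)
    ]


def gerzon_rv_re(n_speakers: int, g1: float, source_az: float = 0.0) -> tuple[float, float]:
    """Magnitudes of the velocity vector rV and energy vector rE."""
    gains = speaker_gains(n_speakers, g1, source_az)
    angles = [2.0 * math.pi * i / n_speakers for i in range(n_speakers)]
    p = sum(gains)                 # total amplitude
    e = sum(g * g for g in gains)  # total power
    r_v = math.hypot(sum(g * math.cos(a) for g, a in zip(gains, angles)),
                     sum(g * math.sin(a) for g, a in zip(gains, angles))) / p
    r_e = math.hypot(sum(g * g * math.cos(a) for g, a in zip(gains, angles)),
                     sum(g * g * math.sin(a) for g, a in zip(gains, angles))) / e
    return r_v, r_e


if __name__ == "__main__":
    for g1 in (1.0, 1.0 / math.sqrt(2.0)):
        r_v, r_e = gerzon_rv_re(4, g1)
        print(f"g1 = {g1:.3f}: rV = {r_v:.3f}, rE = {r_e:.3f}")
    # Brute-force search for the g1 that maximises rE on a square rig.
    best = max((k / 1000.0 for k in range(1, 1001)), key=lambda g: gerzon_rv_re(4, g)[1])
    print(f"rE is maximised near g1 = {best:.3f}")
```

For a regular rig of four or more speakers this model gives rV = g1 and rE = 2·g1/(1 + 2·g1²), so no single g1 maximises both, which is why the decode is made frequency dependent.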

This first circuitry is never something you can switch off in a conventional ambisonic decoder, because it's about pure psychoacoustical optimization, with regard to the pure, physical bound signal that is being received.

The distance compensation circuit, on the other hand, can be switched on or off, because it has to do with the rig geometry at the receiving end, which is naturally variable by design. In reality it too ought to be a continuous knob, just as the aspect ratio knob in the four-speaker decoders is. But that would lead to a total bitch of a filter circuit in the analog domain, for little gain in the usual domestic situation ambisonic was originally aimed at. Thus, the classical decoders simply give you the choice of no distance compensation, or a switch-in compensation tuned for something like a 3 metre rig diameter (or was it radius, I forget).

So the shelf filter has nothing to do with the form of the emitted wavefronts, or as such the recreation of proximity effects -- if anything it distorts the reconstructed soundfield from the physically optimal form, based on purely psychoacoustic criteria, and almost beyond recognition if I might add. (Some critics of ambisonic have actually fallen into the trap of simply looking at what physically follows, and because of that said that "it couldn't work", while it clearly does.)

The distance compensation filter, on the other hand, does participate in wavefront reconstruction. It's sort of the "physical acoustic concession" in this whole shindig. It tries to negate the fact that the speakers lie at a finite distance, so that they emit spherical waves, which by definition have a reactive, proximity component. It can do so only very imperfectly, because it has to work at first-order directional accuracy, and it can't do anything about the spatial aliasing caused by a discrete, widely spaced rig. But it still does a very important job, and one that is easily masked by how simple it really is in circuitry: that simplicity is caused by the basic physics and the symmetry assumptions that go into classical decoder design, not by this part being something that could be overlooked or thought of as subsumed by the flashier shelf part.
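A sketch of what the distance compensation amounts to, assuming ideal point-source speakers at radius R and c = 343 m/s: seen from the centre, each speaker's first-order (velocity) contribution is boosted by F1 = 1 + c/(jωR), so the first-order feeds are filtered by its inverse jω/(jω + c/R), a first-order high-pass with corner c/(2πR), while W is left alone. That is about 36 Hz for R = 1.5 m and about 18 Hz for R = 3 m, so the diameter-versus-radius question is a factor of two in corner frequency. The digital realisation (prewarped bilinear transform), the 48 kHz sample rate and all function names are hypothetical choices, not the circuit of any particular classical decoder.

```python
import cmath
import math

C = 343.0  # assumed speed of sound, m/s


def nfc_highpass(radius_m: float, fs: float) -> tuple[float, float, float]:
    """Prewarped bilinear-transform design of H(s) = s / (s + c/R), the inverse
    of F1 = 1 + c/(sR). Returns (b0, b1, a1) for
    y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    wc = C / radius_m                   # analog corner, rad/s
    k = wc / math.tan(wc / (2.0 * fs))  # prewarping puts the corner exactly at wc
    b0 = k / (k + wc)
    return b0, -b0, (wc - k) / (k + wc)


def apply(coeffs: tuple[float, float, float], x: list[float]) -> list[float]:
    """Run the first-order difference equation over a list of samples."""
    b0, b1, a1 = coeffs
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x_prev - a1 * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


def magnitude(coeffs: tuple[float, float, float], freq_hz: float, fs: float) -> float:
    """|H(e^{jw})| of the digital filter at freq_hz."""
    b0, b1, a1 = coeffs
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)
    return abs((b0 + b1 * z_inv) / (1.0 + a1 * z_inv))


if __name__ == "__main__":
    fs = 48000.0  # hypothetical sample rate
    for radius in (1.5, 3.0):  # "3 metre diameter" vs "3 metre radius"
        fc = C / (2.0 * math.pi * radius)
        coeffs = nfc_highpass(radius, fs)
        print(f"R = {radius} m: corner {fc:.1f} Hz, |H| at 10 Hz = {magnitude(coeffs, 10.0, fs):.3f}")
```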

(Thanks to the linear time invariance and the spatial symmetry of the whole classical ambisonic setup, you can permute the circuitry every which way, so you might not even be able to tell from the diagram which part of the circuitry belongs to the shelf filter and which to something else.)
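The permutation argument rests on linear time-invariant filters commuting. A minimal numerical check, with two arbitrary first-order filters standing in (hypothetically) for a shelf section and a distance-compensation high-pass on a single channel: cascading them in either order gives the same output up to floating-point rounding. Moving a filter across the decoding matrix is only that simple when every channel the matrix mixes at that point gets the same filter.

```python
import random


def iir(b: list[float], a: list[float], x: list[float]) -> list[float]:
    """Direct-form I IIR filter; a[0] is assumed to be 1."""
    y: list[float] = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


# Hypothetical stand-ins: a gentle first-order high-cut "shelf" (unity at DC)
# and a first-order high-pass (unity at Nyquist).
SHELF = ([0.85, -0.60], [1.0, -0.75])
HIGHPASS = ([0.995, -0.995], [1.0, -0.99])

random.seed(1)
signal = [random.uniform(-1.0, 1.0) for _ in range(2000)]

shelf_then_hp = iir(*HIGHPASS, iir(*SHELF, signal))
hp_then_shelf = iir(*SHELF, iir(*HIGHPASS, signal))
print(max(abs(p - q) for p, q in zip(shelf_then_hp, hp_then_shelf)))
```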

Me too, but as I remember, it tries to build the 'distance compensation' into the encoding, and is thus dependent on the distance of the loudspeakers.

Correct. The gimmick is that when you do both the encoding and the decoding in one step, you get limited, guaranteed amplifications for finite rig sizes. No possibility of channel overload. Whereas if you do it the old way, and separate the source and the rig term, in between you can have unlimited LF amplification, overloading both analog and digital channels.
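The bounded-versus-unbounded point can be checked numerically with the usual NFC-HOA near-field term F_m(ω, r), which for order m is the reverse Bessel polynomial in c/(jωr) (F_0 = 1, F_1 = 1 + c/(jωr), F_2 = 1 + 3c/(jωr) + 3(c/(jωr))²). Encoding a point source at r_src carries F_m(ω, r_src), which grows without bound as ω → 0; dividing by F_m(ω, R) for speakers at R in the same step makes the product tend to (R/r_src)^m instead. A sketch, assuming c = 343 m/s and the e^{jωt} time convention (the other convention only changes phase, not magnitude); the function names and radii are hypothetical.

```python
import math

C = 343.0  # assumed speed of sound, m/s


def near_field_term(order: int, freq_hz: float, radius_m: float) -> complex:
    """F_m(w, r) = sum_n (m+n)! / ((m-n)! n!) * (x/2)**n with x = c/(j w r)."""
    x = C / (1j * 2.0 * math.pi * freq_hz * radius_m)
    return sum(
        math.factorial(order + n)
        / (math.factorial(order - n) * math.factorial(n))
        * (x / 2.0) ** n
        for n in range(order + 1)
    )


def source_term_db(order: int, freq_hz: float, r_src: float) -> float:
    """Gain of the source term alone (separate encoding): unbounded as f -> 0."""
    return 20.0 * math.log10(abs(near_field_term(order, freq_hz, r_src)))


def combined_db(order: int, freq_hz: float, r_src: float, r_spk: float) -> float:
    """Gain of F_m(r_src) / F_m(r_spk): tends to m * 20*log10(r_spk / r_src) as f -> 0."""
    ratio = near_field_term(order, freq_hz, r_src) / near_field_term(order, freq_hz, r_spk)
    return 20.0 * math.log10(abs(ratio))


if __name__ == "__main__":
    r_src, r_spk = 0.5, 2.0  # hypothetical: a source well inside a 2 m radius rig
    for f in (1.0, 10.0, 100.0, 1000.0):
        print(f"{f:>6} Hz  order 3: source only {source_term_db(3, f, r_src):7.1f} dB, "
              f"combined {combined_db(3, f, r_src, r_spk):6.1f} dB")
```

The same ratio form, with an old and a new reference radius in place of source and speaker distances, is presumably what the conversion between rig diameters amounts to; check the paper for the exact expression.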

Of course this is nonsense in practice. Even the closest sources register only finite bass amplification on a (theoretically and almost even practically) perfect SoundField mic. That's because there's no such thing as a physical monopole, even. At close range even small sources will start to take on sizable, discernible features, stop being a point source, and start to lead to phase cancellation from their constituent parts. But when you go with the pure, synthetic, continuous time/space theory, you have to worry about this sort of thing.

Thus the encoding is only suitable for an identical or similar rig, and is not transferable to other rigs.

True, but the paper does tell you how to convert from one diameter to another.

Amplitude/delay based systems such as WFS, Delta stereophony and TiMax have similar problems. The encoding has to be matched to the speaker rig.

But then those make simplifying assumptions which are embedded into the transfer stream as well. NFC-HOA does not.

We're still left with the "40 foot high geese" problem.

This is then something we've never really sorted out. I don't think I've seen any plausible, final explanation for why the geese should be so large upon culling. So to speak.

If you could do the encoding assuming a given speaker distance, then modify the decoding for a different distance, it might help, though I've no idea how to do this.

You can't, unless you modify the decoder, as in NFC-HOA work. But in theory you could take some very near-field averaged-over-direction-to-first-order HRTF set, and fade into that. I also think I've referred to Dylan Menzies's work with such sweet spot rescalings.
--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound
