Re: [Sursound] the recent 2-channel 3D sound formats and their viability for actual 360 degree sound

Sampo Syreeni Sat, 23 Jul 2011 19:15:37 -0700

On 2011-07-23, dave.mal...@york.ac.uk wrote:

(For some reason Dave H's post didn't arrive at me as-is. Dave Malham appears quoted once and Dave Hunt twice, below. Sorry about the hassle.)

I have an interesting question (well, I think it's interesting). The Soundfield microphone, like any directional microphone, has a boosted bass response to close sounds. When listening to this through a speaker rig, we hear this boost and tend to interpret it as meaning the sound is close, especially in a dry acoustic with a Greene-Lee head brace etc., etc. However, surely (unless I am being more dense than usual tonight) this is a learnt response based on the behaviour we have heard from directional mics?

To a degree it is. I think the worst thing here is that we continuously close-mic acoustic sources, to "give them that close, intimate feel". That's originally about the proximity effect and about HF attenuation with distance, but since it's now a part of a culturally shared idiom as well, we associate extra power with it as such, beyond the mere psychoacoustics. Electronic distortion with its synthetic overtones doesn't help here, either.

But still, at the very bottom, you'd have this very same reaction because of the pure, physical acoustical proximity effect. E.g. I'm pretty sure someone like Philippo Fazi could explain with neat visualisation how the boundary conditions represented by the human head turn this curved wavefront reactivity into a noticeable bass-boost at the tympanic membrane, given the shape of the human head, upper torso and pinnae. It really isn't just about the reproduction technology; it's about how soundfields work and how we naturally perceive them from within them.
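
(For concreteness, a minimal sketch of the bare physical effect, leaving the head out entirely. It assumes an ideal point source and an ideal first-order velocity component, for which the velocity-to-pressure ratio is 1 + c/(j*omega*r). The function name and the 343 m/s speed of sound are illustrative assumptions, not anything from the thread.)

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def proximity_boost_db(freq_hz, distance_m, c=SPEED_OF_SOUND):
    """Bass boost of an ideal velocity (figure-of-eight) component relative
    to pressure, for an ideal point source at distance_m.

    Uses |1 + c / (j * omega * r)|, expressed in dB.
    """
    omega = 2.0 * math.pi * freq_hz
    # 1 + c/(j*omega*r) is the same as 1 - j*c/(omega*r)
    ratio = complex(1.0, -c / (omega * distance_m))
    return 20.0 * math.log10(abs(ratio))


if __name__ == "__main__":
    for r in (0.1, 1.0, 10.0):
        print(f"50 Hz at {r:5.1f} m: {proximity_boost_db(50.0, r):6.2f} dB")
```

The boost grows as the source gets closer or the frequency gets lower, and it is about 3 dB where omega*r/c equals 1.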

After all, taken individually, at those sort of frequencies our ears are essentially omnidirectional and not subject to bass boost (to anything like the same degree).

The NFC-HOA theory doesn't take into account how we hear things. It's a physical theory and not a psychoacoustic one. It just recreates soundfields, and lets us hear them as we happen to do. If we don't hear those spatially differentiated low frequencies, so what? They were still recreated with perfect fidelity, and we heard them as such. If perhaps losing some of the impact in the process, given that we're no blue whales.

If you then look at the second NFC-HOA paper, they actually exploit this fact to arrive at a lower energy intake transmission pipe. They don't go into psychoacoustics just yet, but they do cut out the huge low frequency anti-phase signals which come from the naïve soundfield reconstruction math.

You're right that POA assumes plane waves. The encoded signals are reproduced at the distance of the loudspeakers.

No. Those two are in direct contradiction. If a point source comes from the surface of the rig, it's going to be a spherical wave originating from that distance. If it's a planewave instead, it's either just a planewave, or a (very tight) approximation of an infinitely far-away point source (that also being a plane wave when you measure it locally).

The shelf filters in a BLaH compliant decoder are (as I understand it) an attempt to compensate for the speakers' finite distance, and for the fact that they don't produce plane waves at the listener. This is often referred to as 'distance compensation'.

Sorta. In the conventional ambisonic decoder there are two separate circuits. The first (set) is the shelf one, which is based on psychoacoustics alone. In a sense it tries to compensate for the very low, seventies bandwidth of just four channels for periphony. It does so by frequency selectively varying the relative amplitude of the zeroth and first components. At low frequencies it goes with a velocity, or systematic decode, because there we seem to (or seemed to) hear interaural phase differences pretty well, but not the amplitude ones. Going higher up, the shelf filtering is optimized to reproduce power differences between the two ears, which seem(ed) to work pretty well for both a) a well-centered single listener, and b) for purely mathematical, statistical reasons for non-centered listeners as well.
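
(A rough numerical illustration of that zeroth/first order trade-off, using Gerzon's velocity and energy vector magnitudes, rV and rE, which the paragraph above does not name. Everything here is an assumption made for the sketch: a horizontal-only regular ring rather than a periphonic rig, a source at azimuth 0, the gain law (1 + 2*k*cos(theta))/n, and the function name.)

```python
import math


def gerzon_vectors(k, n_speakers=4):
    """Return (rV, rE) along the source direction for a source at azimuth 0.

    Assumes a regular horizontal ring and a first-order decode whose
    speaker gain is (1 + 2*k*cos(theta)) / n, where k scales the
    first-order components relative to the zeroth-order one.
    """
    # Offset by half a step so that n=4 gives a square at 45, 135, 225, 315 deg.
    angles = [2.0 * math.pi * (i + 0.5) / n_speakers for i in range(n_speakers)]
    gains = [(1.0 + 2.0 * k * math.cos(a)) / n_speakers for a in angles]
    # Velocity vector: amplitude-weighted mean direction (x component only,
    # the y component cancels by symmetry for this layout).
    r_v = sum(g * math.cos(a) for g, a in zip(gains, angles)) / sum(gains)
    # Energy vector: the same with squared gains as weights.
    energy = [g * g for g in gains]
    r_e = sum(e * math.cos(a) for e, a in zip(energy, angles)) / sum(energy)
    return r_v, r_e


if __name__ == "__main__":
    for label, k in (("velocity decode", 1.0), ("max-rE decode", 1.0 / math.sqrt(2.0))):
        r_v, r_e = gerzon_vectors(k)
        print(f"{label:16s} k={k:.4f}  rV={r_v:.4f}  rE={r_e:.4f}")
```

Under these assumptions k = 1 gives rV = 1 (the low-frequency target), while lowering k to 1/sqrt(2) raises rE from 2/3 to about 0.707 at the cost of rV. A shelf filter moves between the two settings with frequency.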

This first circuitry is never something you can switch off in a conventional ambisonic decoder, because it's about pure psychoacoustical optimization, with regard to the pure, physical bound signal that is being received.

The distance compensation circuit on the other hand can be switched on or off, because it has to do with the receiver end rig geometry, which is then naturally variable by design. In reality it too ought to be a continuous knob, just as the aspect ratio knob in the four speaker decoders is. But then this would lead to a total bitch of a filter circuit in the analog domain, for little gain in the usual domestic situation which ambisonic was originally aimed at. Thus, the classical decoders simply give you the choice of no distance compensation, or a switch-in compensation tuned for something like a 3 metre rig diameter (or was it radius, I forget).

So the shelf filter has nothing to do with the form of the emitted wavefronts, or as such the recreation of proximity effects -- if anything it distorts the reconstructed soundfield from the physically optimal form, based on purely psychoacoustic criteria, and almost beyond recognition if I might add. (Some critics of ambisonic have actually fallen into the trap of simply looking at what physically follows, and because of that said that "it couldn't work", while it clearly does.)

The distance compensation filter on the other hand does participate in wavefront reconstruction. It's sort of the "physical acoustic concession" in this whole shindig. It tries to negate the fact that the speakers lie at a finite distance, so that they emit spherical waves, which by definition have a reactive, proximity component. It can do so only very imperfectly, because it has to work at first order directional accuracy, and it can't do anything about the spatial aliasing caused by a discrete, widely spaced rig. But it still does a very important job, and one that is easily masked by how simple it really is in circuitry: that simplicity is caused by the basic physics and the symmetry assumptions that go into classical decoder design, and not because this part could be somehow overlooked or thought of as something to be subsumed by the flashier shelf part.
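
(A minimal sketch of why that circuit is so simple, assuming ideal point-source speakers: the first-order proximity term is 1 + c/(j*omega*R), so its inverse is a plain first-order high-pass on the velocity channels with a corner at c/(2*pi*R). Function names and the 343 m/s figure are illustrative assumptions, and the radii below are examples, not the value any particular decoder uses.)

```python
import math


def corner_frequency_hz(rig_radius_m, c=343.0):
    """-3 dB corner of the first-order compensation high-pass."""
    return c / (2.0 * math.pi * rig_radius_m)


def distance_compensation(freq_hz, rig_radius_m, c=343.0):
    """Complex response of the high-pass applied to the first-order
    (velocity) channels: 1 / (1 + c / (j * omega * R)).

    freq_hz must be greater than zero.
    """
    j_omega = 1j * 2.0 * math.pi * freq_hz
    return 1.0 / (1.0 + c / (j_omega * rig_radius_m))


if __name__ == "__main__":
    for radius in (1.5, 3.0):
        print(f"R = {radius} m: corner at {corner_frequency_hz(radius):.1f} Hz")
```

With these assumptions a 1.5 m radius puts the corner at roughly 36 Hz and a 3 m radius at roughly 18 Hz, which is why a single switchable position covers most domestic rigs reasonably well.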

(Thanks to the linear time invariance and the spatial symmetry of the whole classical ambisonic setup, you can permute the circuitry every which way. So that you might not even distinguish from the diagram what part of the circuitry belongs to the shelf filter and what to something else.)

Me too, but as I remember it tries to build the 'distance compensation' into the encoding, and thus is dependent on the distance of the loudspeakers.

Correct. The gimmick is that when you do both the encoding and the decoding in one step, you get limited, guaranteed amplifications for finite rig sizes. No possibility of channel overload. Whereas if you do it the old way, and separate the source and the rig term, in between you can have unlimited LF amplification, overloading both analog and digital channels.
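
(A first-order-only sketch of that boundedness, assuming ideal point sources and speakers. The source term alone, 1 + c/(j*omega*r_src), grows without limit towards DC; divided by the rig term, 1 + c/(j*omega*R), it tends to the finite ratio R/r_src. Function names and numbers are illustrative assumptions; the higher-order filters have more terms and are not shown.)

```python
import math


def source_term_only(freq_hz, source_dist_m, c=343.0):
    """Near-field term of the source by itself: unbounded as freq -> 0."""
    j_omega = 1j * 2.0 * math.pi * freq_hz
    return 1.0 + c / (j_omega * source_dist_m)


def nfc_first_order(freq_hz, source_dist_m, rig_radius_m, c=343.0):
    """Source term and rig compensation combined in one filter.

    The low-frequency gain tends to rig_radius_m / source_dist_m.
    """
    j_omega = 1j * 2.0 * math.pi * freq_hz
    return (1.0 + c / (j_omega * source_dist_m)) / (1.0 + c / (j_omega * rig_radius_m))


if __name__ == "__main__":
    for f in (0.1, 1.0, 10.0, 100.0):
        alone = 20.0 * math.log10(abs(source_term_only(f, 0.5)))
        combined = 20.0 * math.log10(abs(nfc_first_order(f, 0.5, 1.5)))
        print(f"{f:6.1f} Hz: source alone {alone:6.1f} dB, combined {combined:5.2f} dB")
```

Because the filter is a ratio of two terms of the same shape, in this first-order algebra nfc_first_order(f, r, R1) * nfc_first_order(f, R1, R2) equals nfc_first_order(f, r, R2), so a signal referenced to one rig radius can be moved to another with a filter of the same form.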

Of course this is nonsense in practice. Even the closest sources register only finite bass amplification on a (theoretically and almost even practically) perfect SoundField mic. That's because there's no such thing as a physical monopole, even. At close range even small sources will start to take on sizable, discernible features, stop being a point source, and start to lead to phase cancellation from their constituent parts. But when you go with the pure, synthetic, continuous time/space theory, you have to worry about this sort of thing.

Thus the encoding is only suitable for an identical or similar rig, and is not transferable to other rigs.

True, but the paper does tell you how to convert from one diameter to another.

Amplitude/delay based systems such as WFS, Delta stereophony and TiMax have similar problems. The encoding has to be matched to the speaker rig.

But then those make simplifying assumptions which are embedded into the transfer stream as well. NFC-HOA does not.

We're still left with the "40 foot high geese" problem.

This is then something we've never really sorted out. I don't think I've seen any plausible, final explanation for why the geese should be so large upon culling. So to speak.

If you could do the encoding assuming a given speaker distance, then modify the decoding for a different distance, it might help, though I've no idea how to do this.

You can't, unless you modify the decoder, as in NFC-HOA work. But in theory you could take some very near-field averaged-over-direction-to-first-order HRTF set, and fade into that. I also think I've referred to Dylan Menzies's work with such sweet spot rescalings.

--
Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound
