On 2023-04-24, robert bristow-johnson wrote:

>> It's just easier and mathematically simpler to work in fixpoint.
>
> Whoa! That's very interesting! Seems to me that the common sentiment was to the contrary. With floating point you don't have to worry about scaling and trading off headroom with quantization noise floor.

This is two-pronged. If you "just want it to work, as if it was reals", then floats are easier. Indeed with denormals included too. But if you *really* want it to work *exactly* to the limit, and you know what you're doing, the linearity of fixpoint saves you a lot of trouble. In, say, the numerical stability analysis of your filters, and in the latency and bother possibly coming from underflow exceptions.

When analysed to the hilt to begin with, fixpoint is just far simpler. It's much more regular than floating point. When properly dithered, it's more or less linear, which floating point is not, and can't be made so in any known way. You can shove the conventional LTI theory at fixpoint even in filter topology, whereas with floats, especially with denormals, you cannot.

The basic example of this is a slowly, exponentially decaying reverb tail. Something like that is a numerical nightmare in float arithmetic. Your sound will inevitably decay into the denormalised range, as the typical case arising from zero input. Then you'll have to take in louder sounds, so you're suddenly forced to sum denormals with whatever is louder. Fuck. This is then the typical thing in music, with any kind of decent dynamic range, and pauses.
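To put rough numbers on that decay, here is a minimal sketch in Python, whose floats are IEEE 754 doubles. It only counts how many multiplications a zero-input decay needs to reach the denormal range and then exact zero. The function name `decay_milestones` and the gain of 0.5 are my own illustrative choices, not a model of any real reverb.

```python
import sys

def decay_milestones(gain=0.5, start=1.0, max_steps=10_000):
    """Iterate y <- gain * y with zero input.

    Returns (first_subnormal_step, first_zero_step); either may be None
    if it is not reached within max_steps. Gains close to 1 can get stuck
    on a tiny nonzero value because of rounding, hence the step cap.
    """
    smallest_normal = sys.float_info.min  # smallest *normal* double, 2**-1022
    y = start
    first_subnormal = None
    first_zero = None
    for step in range(1, max_steps + 1):
        y *= gain
        if y == 0.0:
            first_zero = step
            break
        if first_subnormal is None and abs(y) < smallest_normal:
            first_subnormal = step
    return first_subnormal, first_zero

if __name__ == "__main__":
    # Halving from 1.0: denormal from step 1023 on, exact zero at step 1075.
    print(decay_milestones(0.5, 1.0))
```

Between those two steps every value is a denormal, so the tail spends a while in exactly the range where the hardware may be slow or may flush to zero.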

In fixpoint of sufficient width, you just dither everything on input, mind your gain structure, and let your filters decay down into the noise floor. In the stochastically linear fashion that theory guarantees.
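As a hedged illustration of what "stochastically linear" means here, the sketch below quantizes a constant that sits below one LSB, with and without additive TPDF dither. Amplitudes are in LSB units, and the names `quantize` and `mean_quantized` are just made up for this example.

```python
import math
import random

def quantize(x, rng=None):
    """Round x (in LSB units) to the nearest integer.

    If rng is given, additive TPDF dither is applied first: the sum of
    two independent uniforms, triangular PDF, 2 LSB peak to peak.
    """
    dither = 0.0
    if rng is not None:
        dither = (rng.random() - 0.5) + (rng.random() - 0.5)
    return math.floor(x + dither + 0.5)

def mean_quantized(x, n, rng=None):
    """Average of n quantized copies of the constant x."""
    return sum(quantize(x, rng) for _ in range(n)) / n

if __name__ == "__main__":
    x = 0.3  # a signal well below one LSB
    print("undithered mean:", mean_quantized(x, 1000))
    print("dithered mean:  ", mean_quantized(x, 100_000, random.Random(1)))
```

Without dither the 0.3 LSB input is simply rounded away to zero. With dither the average output lands close to 0.3: the signal survives below the LSB, at the cost of added noise.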

Obviously it's much more difficult than this in the end. Using the easiest additive TPDF dither we now typically use, you'll be adding noise at every step of the way. It adds up, so intermediate representations in fixpoint might need *lots* more precision than 24 bits. Doing complex filter topologies, you'd theoretically need to add noise every step of the way, if you can't prove every step of the way scales the noise down. Which you usually can't do, or won't have the knowhow to show. Also, if you do subtractive dither (the ideal, and my favourite), no general theory exists of how to use it within entire processing topologies.

So maybe you'd have to go even towards the 64-bit range. Really wide. Especially since audio editing software and editing practice have been going towards stupendously many simultaneously sounding little clips of sound, summed together. There the background noise compounds not only from individual sources, but from all of the processing applied to them. If you do the math in absolute amplitude like I like to do, you can see it really does compound, about inverse quadratically in the number of sources (also: overlaying edits), and quadratically in the number of connections in FIR filters and their internal connections. In IIR work, much faster even, and there you can't even linearize too well via dithering, so that your filter topology easily ends up nonlinear. Cf. https://timbreluces.com/assets/sacd.pdf : the analysis generalises from just delta-sigma ADC to everything happening within your digital filter.

Then it's worse when doing floats. Because they're semi-logarithmic. You can't do optimal dithering with them. Especially you can't do it with them through any network of basic LTI signal processing operations. So you'll be left with an untenable mess of nonlinearity.

The tome I learnt my DSP-fu from was Alan V. Oppenheim's "Digital Signal Processing", derived from his dissertation. In that there were plenty of interesting and useful ideas, ranging from the unification of continuous and discrete time Fourier theory, to finally even homomorphic signal processing as a novel idea. But in the third quarter of the treatise, there is also a principled treatment of certain nonlinear aspects of DSP, such as limit cycles and dead bands.
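For anyone who hasn't met a dead band, here is a minimal sketch of one, not taken from the book: an undithered one-pole decay y[n] = round(0.9 * y[n-1]) in pure integer arithmetic with zero input. The coefficient 9/10 and the name `one_pole_decay` are assumptions picked to keep the numbers easy to follow by hand.

```python
def one_pole_decay(y0, steps, num=9, den=10):
    """Iterate y <- round(num/den * y) on integers, with zero input.

    Rounding is round-half-up, done in exact integer arithmetic so no
    floating point sneaks into the example.
    """
    y = y0
    history = [y]
    for _ in range(steps):
        y = (num * y + den // 2) // den
        history.append(y)
    return history

if __name__ == "__main__":
    # The ideal filter decays to zero; the rounded one sticks at 5.
    print(one_pole_decay(10, 8))  # [10, 9, 8, 7, 6, 5, 5, 5, 5]
```

The state never gets below 5 LSB, because 0.9 * 5 = 4.5 rounds straight back up to 5. That stuck region is the dead band, and it is exactly the kind of nonlinearity LTI theory doesn't predict.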

That's the stuff that matters, here. How linear the notionally linear digital circuits we build, actually are. What can be done to linearize them further. How compositional and compositionally linear can they really be.

Because just as an example, consider a least-significant-bit worth of positive bias coming from a 16-bit ADC, into a signal chain handling basic 32-bit floats. Unless every stage of your filter topology is mathematically guaranteed to attenuate DC fast enough, that bias/DC will propagate into the next stage of the filter, and in a recirculating IIR topology, might break numerical stability. Sooner or later it will definitely go from just affecting the mantissa to crossing a threshold to the next value of the exponent, in a float representation. Which is then highly nonlinear.
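What changes at such an exponent crossing is the quantization step itself. A small sketch, assuming IEEE 754 binary32 and a full scale of +-1.0 so that one 16-bit LSB is 2**-15; `f32_ulp` is a helper name invented here, and it is only meant for positive finite inputs.

```python
import struct

def f32_ulp(x):
    """Gap between x (rounded to float32) and the next float32 above it.

    Works by bumping the raw bit pattern by one; valid for positive,
    finite x only.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return above - here

if __name__ == "__main__":
    for x in (2.0 ** -15, 0.5, 1.0, 2.0):
        print(f"x = {x:<24} step = {f32_ulp(x)}")
```

At one 16-bit LSB the step is 2**-38, at 1.0 it is 2**-23, and it doubles again the moment the value reaches 2.0. So the rounding error a recirculating sum sees depends on how large the sum has grown, which fixpoint never does to you.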

Then, when that happens, you can often hear the transition. It's typically low level, but it can still be heard. It sounds like an aliasing transition, with *all* of the digitally, aliasingly induced "metallic" harmonics being induced at the same time, transiently "for no apparent reason".

This mostly doesn't happen in fixpoint when you know what you're doing. Because even 24/44 is more or less linear and so analyzable in the classical LTI framework. Because of the quadratic scaling of noise, and how we do gain structure in the studio, we can even sum lots of sound sources and edited clips over each other at 32-bit fix, without building up noise beyond the hearing threshold of a human.

But building up truly and provably silent digital filters... That takes real effort. It's a thingy most people, and especially I, really struggle with. And that problem won't be solved with wider floats or fixeds, as if you could just code and leave your numerical accuracy mayhem to the gods. No, no-no, if you actually want to code properly, it takes hard math. It really does, even beyond my capability.

> But you should not *have* to scale your sums with floating point anyway.

But you do: floats are a semi-logarithmic representation of the real line. That's what makes floats so horrific to begin with. They aren't really suitable for LTI processing, but, if anything, for something like astronomy, where we deal with widely differing degrees of scale. Things nonlinear, unlike how we deal with linear wave phenomena such as sound, and how linearly we as people tend to perceive them.

(And sorry, I might be responding to myself. If so, you ought to be chevroning my post as well...well. Top-posting in particular is difficult to answer to, in a principled fashion. Every tail of another's post ought to be cut short. <3 )
--
Sampo Syreeni, aka decoy - [email protected], http://decoy.iki.fi/front +358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
