On 2023-04-24, robert bristow-johnson wrote:
It's just easier and mathematically simpler to work in fixpoint.
Whoa! That's very interesting! Seems to me that the common sentiment
was to the contrary. With floating point you don't have to worry about
scaling and trading off headroom with quantization noise floor.
This is two-pronged. If you "just want it to work, as if it was reals",
then floats are easier. Indeed with denormals included too. But if you
*really* want it to work *exactly* to the limit, and you know what
you're doing, the linearity of fixpoint saves you a lot of trouble. In,
say, the numerical stability analysis of your filters, and in the
latency and bother possibly coming from underflow exceptions.
When analysed to the hilt to begin with, fixpoint is just far simpler.
It's much more regular than floating point. When properly dithered, it's
more or less linear, which floating point is not, and can't be made so
in any known way. You can shove the conventional LTI theory at fixpoint
even in filter topology, whereas with floats, especially with denormals,
you can not.
The basic example of this is a slowly, exponentially decaying reverb
tail. Something like that is a numerical nightmare in float arithmetic.
Your sound will inevitably decay into the denormalised range, as the
typical case arising from zero input. Then you'll have to take in louder
sounds, so you're suddenly forced to sum denormals with whatever is louder.
Fuck. This is then the typical thing in music, with any kind of decent
dynamic range, and pauses.
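A minimal sketch of that decay, assuming IEEE 754 doubles with gradual underflow enabled (no flush-to-zero) and a hypothetical feedback gain of 0.5 chosen only to keep the arithmetic exact. It runs a one-pole recirculation on zero input and notes when the state first goes denormal and when it finally reaches zero:

```python
import sys

def decay_profile(feedback=0.5, start=1.0):
    """Run y[n] = feedback * y[n-1] with zero input.

    Returns (first step at which y is denormal, step at which y becomes 0).
    """
    smallest_normal = sys.float_info.min  # smallest positive normal double
    y = start
    n = 0
    first_subnormal = None
    while y != 0.0:
        y *= feedback
        n += 1
        if first_subnormal is None and 0.0 < y < smallest_normal:
            first_subnormal = n
    return first_subnormal, n

first_subnormal, first_zero = decay_profile()
```

With these settings the state is denormal from step 1023 until it hits zero at step 1075, so the tail spends dozens of samples in the denormal range. A realistic reverb gain close to 1 would stay there far longer.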
In fixpoint of sufficient width, you just dither everything on input,
mind your gain structure, and let your filters decay down into the noise
floor. In the stochastically linear fashion that theory guarantees.
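What "stochastically linear" buys you can be sketched as follows, assuming non-subtractive TPDF dither of 2 LSB peak-to-peak width and a hypothetical constant input of 0.3 LSB. The helper names are made up for illustration. Without dither the quantizer simply erases the input. With dither the average output tracks it:

```python
import math
import random

def quantize(x):
    """Round to the nearest integer LSB (ties go up)."""
    return math.floor(x + 0.5)

def tpdf():
    """Triangular-PDF dither: the sum of two uniform +/-0.5 LSB draws."""
    return random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)

def mean_output(x, n, use_dither):
    """Average quantizer output over n samples of a constant input x."""
    total = 0
    for _ in range(n):
        total += quantize(x + tpdf()) if use_dither else quantize(x)
    return total / n

random.seed(1)  # fixed seed so the run is repeatable
level = 0.3     # input level in LSBs, below the rounding threshold
plain = mean_output(level, 20000, use_dither=False)
with_dither = mean_output(level, 20000, use_dither=True)
```

Here `plain` is exactly zero, while `with_dither` lands close to 0.3: the signal survives below one LSB, at the cost of added noise.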
Obviously it's much more difficult than this in the end. Using the
easiest additive TPDF dither we now typically use, you'll be adding
noise at every step of the way. It adds up, so intermediate
representations in fixpoint might need *lots* more precision than 24
bits. Doing complex filter topologies, you'd theoretically need to add
noise every step of the way, if you can't prove every step of the way
scales the noise down. Which you usually can't do, or won't have the
knowhow to show. Also, if you do subtractive dither — the ideal, and my
favourite — no general theory exists of how to use it within entire
processing topologies.
So maybe you'd have to go even towards the 64-bit range. Really wide.
Especially since audio editing software and editing practice has been
going towards stupendously many simultaneously sounding little clips of
sound, summed together. There the background noise compounds not only
from individual sources, but from all of the processing applied to them.
If you do the math in absolute amplitude like I like to do, you can see
it really does compound, about inverse quadratically in the number of
sources (also: overlaying edits), and quadratically in the number of
connections in FIR filters and their internal connections. In IIR work,
much faster even, and there you can't even linearize too well via
dithering, so that your filter topology easily ends up nonlinear. Cf.
https://timbreluces.com/assets/sacd.pdf , the analysis generalises from
just delta-sigma-ADC to everything happening within your digital
filter.
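A rough numerical check of the simplest case only, under two assumptions not stated above: that "inverse quadratically" is read as "like a square root", and that each source contributes an independent rounding error uniform over +/-0.5 LSB. It says nothing about the FIR or IIR cases. Under that model the summed error has RMS sqrt(N/12) LSB for N sources:

```python
import math
import random

def summed_error_rms(num_sources, trials=4000):
    """Estimate the RMS of the sum of num_sources independent rounding errors."""
    acc = 0.0
    for _ in range(trials):
        err = sum(random.uniform(-0.5, 0.5) for _ in range(num_sources))
        acc += err * err
    return math.sqrt(acc / trials)

random.seed(2)  # fixed seed so the run is repeatable
rms_4 = summed_error_rms(4)
rms_16 = summed_error_rms(16)
```

Quadrupling the number of sources from 4 to 16 roughly doubles the RMS error, i.e. costs about one bit of noise floor.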
Then it's worse when doing floats. Because they're semi-logarithmic. You
can't do optimal dithering with them. Especially you can't do it with
them through any network of basic LTI signal processing operations. So
you'll be left with an untenable mess of nonlinearity.
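To make "semi-logarithmic" concrete, here is a small sketch that measures the quantization step of 32-bit floats at a few magnitudes and sets it beside a fixpoint step. The helper `float32_step` and the Q1.23 format are illustrative assumptions, and the helper only handles positive values that are exactly representable in float32:

```python
import struct

def float32_step(x):
    """Gap between a positive float32 value x and the next float32 above it."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return above - x

# One LSB of a hypothetical Q1.23 fixpoint format: the same at every level.
FIXED_STEP = 2.0 ** -23

steps = {x: float32_step(x) for x in (0.25, 0.5, 1.0, 2.0)}
```

The float step doubles every time the value crosses a power of two, so the quantization error depends on the signal level. The fixpoint step stays at `FIXED_STEP` throughout, which is what lets a single dither level work everywhere.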
The tome I learnt my DSP-fu from was Alan V. Oppenheim's "Digital Signal
Processing", derived from his dissertation. In that there were plenty of
interesting and useful ideas, ranging from the unification of continuous
and discrete time Fourier theory, to finally even homomorphic signal
processing as a novel idea. But in the third quarter of the treatise,
there was also a principled treatment of certain nonlinear aspects of DSP,
such as limit cycles and dead bands.
That's the stuff that matters, here. How linear the notionally linear
digital circuits we build, actually are. What can be done to linearize
them further. How compositional and compositionally linear can they
really be.
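A dead band is easy to reproduce. The sketch below assumes an undithered one-pole decay with a hypothetical gain of 0.9 on integer samples, rounding to nearest at every step. In exact arithmetic the state would decay to zero. With rounding it gets stuck:

```python
def decay_with_rounding(y, steps):
    """Iterate y[n] = round(0.9 * y[n-1]) on non-negative integers."""
    history = [y]
    for _ in range(steps):
        y = (9 * y + 5) // 10  # 0.9 * y rounded to nearest, ties up
        history.append(y)
    return history

trace = decay_with_rounding(10, 12)
```

Starting from 10, the state steps down to 5 and then stays there forever, because 0.9 * 5 = 4.5 rounds back up to 5. The filter is no longer linear: it holds a DC offset that no input put there.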
Because just as an example, consider a least-significant-bit worth of
positive bias coming from a 16-bit ADC, into a signal chain handling
basic 32-bit floats. Unless every stage of your filter topology is
mathematically guaranteed to attenuate DC fast enough, that bias/DC will
propagate into the next stage of the filter, and in a recirculating IIR
topology, might break numerical stability. It soon and definitely would
cross from just affecting the mantissa to pushing the exponent over a
threshold to its next value, in a float representation. Which is then
highly nonlinear.
Then, when that happens, you can often hear the transition. It's
typically low level, but it can still be heard. It sounds like an
aliasing transition, with *all* of the digitally, aliasingly induced
"metallic" harmonics being induced at the same time, transiently "for no
apparent reason".
This mostly doesn't happen in fixpoint when you know what you're doing.
Because even 24/44 is more or less linear and so analyzable in the
classical LTI framework. Because of the quadratic scaling of noise, and
how we do gain structure in the studio, we can even sum lots of sound
sources and edited clips over each other at 32-bit fix, without building
up noise beyond the hearing threshold of a human.
But building up truly and provably silent digital filters... That takes
real effort. It's a thingy most people, and especially I, really struggle
with. And that problem won't be solved with wider floats or fixeds, as if
you could just code and leave your numerical accuracy mayhem to the gods. No,
no-no, if you actually want to code properly, it takes hard math. It
really does, even beyond my capability.
But you should not *have* to scale your sums with floating point
anyway.
But you do: floats are a semi-logarithmic representation of the real
line. That's what makes floats so horrific to begin with. They aren't
really suitable for LTI processing, but, if anything, for something like
astronomy, where we deal with widely differing degrees of scale. Things
nonlinear, unlike how we deal with linear wave phenomena such as sound,
and how linearly we as people tend to perceive them.
(And sorry, I might be responding to myself. If so, you ought to be
chevroning my post as well...well. Top-posting in particular is
difficult to answer to, in a principled fashion. Every tail of another's
post ought to be cut short. <3 )
--
Sampo Syreeni, aka decoy - [email protected], http://decoy.iki.fi/front
+358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2