Re: Are there now 64-bit processors that deal with denorms routinely with no exception or interrupt?

Sampo Syreeni Wed, 10 May 2023 19:42:21 -0700

On 2023-04-24, robert bristow-johnson wrote:

>> It's just easier and mathematically simpler to work in fixpoint.
> Whoa! That's very interesting! Seems to me that the common sentiment was to the contrary. With floating point you don't have to worry about scaling and trading off headroom with quantization noise floor.

This is two-pronged. If you "just want it to work, as if it was reals", then floats are easier. Indeed with denormals included too. But if you *really* want it to work *exactly* to the limit, and you know what you're doing, the linearity of fixpoint saves you a lot of trouble. In, say, the numerical stability analysis of your filters, and in the latency and bother possibly coming from underflow exceptions.

When analysed to the hilt to begin with, fixpoint is just far simpler. It's much more regular than floating point. When properly dithered, it's more or less linear, which floating point is not, and can't be made so in any known way. You can shove the conventional LTI theory at fixpoint even in filter topology, whereas with floats, especially with denormals, you cannot.

The basic example of this is a slowly, exponentially decaying reverb tail. Something like that is a numerical nightmare in float arithmetic. Your sound will inevitably decay into the denormalised range, as the typical case arising from zero input. Then you'll have to take in louder sounds, so you're suddenly forced to sum denormals with whatever is louder. Fuck. This is then the typical thing in music, with any kind of decent dynamic range, and pauses.
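As a rough illustration of that decay, here is a minimal sketch. It is not anything from the thread: the function name, the decay factor 0.999 and the use of Python's native 64-bit doubles are all assumptions made for the example. It just counts how many multiplications by a constant gain it takes for a unit impulse to fall below the smallest normal double, i.e. into the denormal range. In 32-bit floats the same thing happens much sooner.

```python
import sys

def samples_until_subnormal(start=1.0, decay=0.999):
    """Count multiplications by `decay` until the value drops below
    the smallest *normal* double, i.e. becomes a denormal."""
    smallest_normal = sys.float_info.min  # about 2.2e-308 for IEEE 754 binary64
    x = start
    n = 0
    while x >= smallest_normal:
        x *= decay  # one sample of a hypothetical exponentially decaying tail
        n += 1
    return n, x

n, x = samples_until_subnormal()
# x is now nonzero but below the normal range: a denormal
print(n, x, 0.0 < x < sys.float_info.min)
```

With zero input, a recirculating filter state follows exactly this kind of trajectory, so the only question is how long it takes, not whether it happens.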

In fixpoint of sufficient width, you just dither everything on input, mind your gain structure, and let your filters decay down into the noise floor. In the stochastically linear fashion that theory guarantees.

Obviously it's much more difficult than this in the end. Using the easiest additive TPDF dither we now typically use, you'll be adding noise at every step of the way. It adds up, so intermediate representations in fixpoint might need *lots* more precision than 24 bits. Doing complex filter topologies, you'd theoretically need to add noise at every step of the way, if you can't prove that every step scales the noise down. Which you usually can't do, or won't have the knowhow to show. Also, if you do subtractive dither (the ideal, and my favourite), no general theory exists of how to use it within entire processing topologies.
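To make the additive TPDF case concrete, here is a small sketch under stated assumptions: a quantizer step of 1 LSB, dither built as the sum of two independent uniform sources (so +/-1 LSB peak), and a constant input of 0.3 LSB. The function names and the seed are hypothetical, chosen only for this example. It compares the long-run average of plain rounding against dithered rounding.

```python
import random

def quantize(x, step=1.0):
    """Round x to the nearest multiple of step (plain fixed-point rounding)."""
    return step * round(x / step)

def quantize_tpdf(x, rng, step=1.0):
    """Add TPDF dither (sum of two uniform sources, +/-1 step peak), then round."""
    dither = (rng.random() - 0.5 + rng.random() - 0.5) * step
    return quantize(x + dither, step)

rng = random.Random(1234)  # fixed seed, so the run is repeatable
x = 0.3                    # constant input of 0.3 LSB, below the rounding threshold
trials = 100_000

plain_mean = sum(quantize(x) for _ in range(trials)) / trials
dithered_mean = sum(quantize_tpdf(x, rng) for _ in range(trials)) / trials
print(plain_mean, dithered_mean)
```

Plain rounding returns 0 every time, so the sub-LSB signal is simply lost. The dithered average settles close to 0.3: the signal survives, at the cost of added noise at that stage, which is the noise that then accumulates through a long chain.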

So maybe you'd have to go even towards the 64-bit range. Really wide. Especially since audio editing software and editing practice have been going towards stupendously many simultaneously sounding little clips of sound, summed together. There the background noise compounds not only from individual sources, but from all of the processing applied to them. If you do the math in absolute amplitude like I like to do, you can see it really does compound, about inverse quadratically in the number of sources (also: overlaying edits), and quadratically in the number of connections in FIR filters and their internal connections. In IIR work, much faster even, and there you can't even linearize too well via dithering, so that your filter topology easily ends up nonlinear. Cf. https://timbreluces.com/assets/sacd.pdf , the analysis generalises from just delta-sigma-ADC to everything happening within your digital filter.

Then it's worse when doing floats. Because they're semi-logarithmic. You can't do optimal dithering with them. Especially you can't do it with them through any network of basic LTI signal processing operations. So you'll be left with an untenable mess of nonlinearity.

The tome I learnt my DSP-fu from was Alan V. Oppenheim's "Digital Signal Processing", derived from his dissertation. In that there were plenty of interesting and useful ideas, ranging from the unification of continuous and discrete time Fourier theory, to finally even homomorphic signal processing as a novel idea. But in the third quarter of the treatise, there was also a principled treatment of certain nonlinear aspects of DSP, such as limit cycles and dead bands.

That's the stuff that matters, here. How linear the notionally linear digital circuits we build actually are. What can be done to linearize them further. How compositional and compositionally linear can they really be.

Because just as an example, consider a least-significant-bit worth of positive bias coming from a 16-bit ADC, into a signal chain handling basic 32-bit floats. Unless every stage of your filter topology is mathematically guaranteed to attenuate DC fast enough, that bias/DC will propagate into the next stage of the filter, and in a recirculating IIR topology, might break numerical stability. It soon and definitely would go from just affecting the mantissa to crossing a threshold into the next value of the exponent, in a float representation. Which is then highly nonlinear.
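What crossing into the next exponent means for the quantization step can be checked directly. The following is only a sketch, with hypothetical helper names, that uses the standard `struct` module to round-trip values through IEEE 754 32-bit floats and measure the gap to the next representable value on either side of a power of two.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_step(x):
    """Gap between the binary32 value of x and the next binary32 value above it.
    Assumes x is positive, finite, and not the largest finite float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    nxt = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return nxt - f32(x)

# The step doubles each time the value crosses a power of two.
for v in (0.75, 1.5, 3.0):
    print(v, f32_step(v))
```

So a signal whose level drifts across such a boundary, for instance because of accumulated DC, abruptly sees its rounding step double or halve. That is the signal-dependent behaviour which fixpoint, with its single uniform step, does not have.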

Then, when that happens, you can often hear the transition. It's typically low level, but it can still be heard. It sounds like an aliasing transition, with *all* of the digitally, aliasingly induced "metallic" harmonics being induced at the same time, transiently "for no apparent reason".

This mostly doesn't happen in fixpoint when you know what you're doing. Because even 24/44 is more or less linear and so analyzable in the classical LTI framework. Because of the quadratic scaling of noise, and how we do gain structure in the studio, we can even sum lots of sound sources and edited clips over each other at 32-bit fix, without building up noise beyond the hearing threshold of a human.
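A back-of-the-envelope version of that claim, as a sketch only. The assumptions are mine, not from the thread: each source is quantized once with non-subtractive TPDF dither, so its total error power is step^2/12 + step^2/6 = step^2/4. The noise of different sources is uncorrelated, so powers add. Full scale is +/-1.0, and levels are given in dB relative to that. The function name is hypothetical.

```python
import math

def summed_noise_dbfs(bits, sources):
    """RMS noise, in dB re full scale, after summing `sources` independently
    TPDF-dithered signals quantized to `bits`-bit fixed point."""
    step = 2.0 ** -(bits - 1)    # one LSB when full scale is +/-1.0
    per_source_rms = step / 2.0  # sqrt(step^2/12 + step^2/6)
    total_rms = math.sqrt(sources) * per_source_rms  # uncorrelated: powers add
    return 20.0 * math.log10(total_rms)

for n in (1, 16, 1024):
    print(n, round(summed_noise_dbfs(32, n), 1))
```

Under these assumptions the RMS noise grows only as the square root of the number of sources, so even a thousand summed clips at 32-bit fix stay around -163 dB re full scale. Noise added by further processing stages is not counted here.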

But building up truly and provably silent digital filters... That takes real effort. It's a thingy most people, and especially I, really struggle with. And that problem won't be solved with wider floats or fixeds, as if you could just code and leave your numerical accuracy mayhem to the gods. No, no-no, if you actually want to code properly, it takes hard math. It really does, even beyond my capability.

> But you should not *have* to scale your sums with floating point anyway.

But you do: floats are a semi-logarithmic representation of the real line. That's what makes floats so horrific to begin with. They aren't really suitable for LTI processing, but, if anything, for something like astronomy, where we deal with widely differing degrees of scale. Nonlinear things, unlike how we deal with linear wave phenomena such as sound, and how linearly we as people tend to perceive them.

(And sorry, I might be responding to myself. If so, you ought to be chevroning my post as well...well. Top-posting in particular is difficult to answer to, in a principled fashion. Every tail of another's post ought to be cut short. <3 )

--

Sampo Syreeni, aka decoy - [email protected], http://decoy.iki.fi/front
+358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
