Re: Are there now 64-bit processors that deal with denorms routinely with no exception or interrupt?

Sampo Syreeni Sat, 05 Aug 2023 13:54:55 -0700

On 2023-04-08, robert bristow-johnson wrote:

Of any width. You would have had it even in 32-bit fixpoint, and even in 24-bit, if you minded your gain structure.
And that's what the promise is of 64-bit floats.


Earlier, that was the promise of 32-bit floats.

In the end you can't do proper error control without doing the numerical analysis, at any bit depth or in any format. You just cannot avoid the work by naïvely going to higher and higher bit depths, because that's not how our algorithms work. Especially the ones which are recursive, such as IIR filters in general topology: done even slightly wrong, they lead to exponential amplification of any rounding error, noise, nonlinearity, and whatnot.

Done *right*, after a proper, wholesale numerical analysis of the signal chain, they can very well work at 16 bits signed, even for hifi work. Done *wrong*, without the analysis, they can slip out of control and go into limit cycling within a few milliseconds, even at thousands of bits of otherwise well-calibrated floating point precision.
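
The limit-cycle failure is easy to reproduce. What follows is a minimal sketch, assuming a first-order recursive section y[n] = Q(a*y[n-1]) with a = -0.9 stored as a Q15 integer and the state requantised by round-half-up; these choices (and the helper names) are illustrative, not something from the thread. The exact filter decays to zero once the input goes silent. The quantised one does not.

```python
# Zero-input limit cycle in a first-order recursive section, y[n] = Q(a * y[n-1]).
# Assumptions (illustrative only): a = -0.9 stored in Q15, state kept in integer LSBs,
# requantisation by round-half-up.
A_Q15 = round(-0.9 * 32768)  # -29491, i.e. a is about -0.89999

def step(y):
    """One filter update with the product rounded back onto the integer state grid."""
    # Python's >> floors, so adding half an LSB (1 << 14) first rounds to nearest,
    # with ties going upwards.
    return (A_Q15 * y + (1 << 14)) >> 15

def run_silent(y0, n_samples):
    """Feed silence into a filter whose state was left at y0 by earlier input."""
    y, history = y0, []
    for _ in range(n_samples):
        y = step(y)
        history.append(y)
    return history

history = run_silent(10, 20)
print(history)
```

With these assumptions the state runs -9, 8, -7, 6, -5, 4 and then toggles between -4 and 4 forever: a deadband the ideal filter does not have, and one that only shows up once the quantiser is made part of the analysis.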

As two prototypical examples, try out a critical all-pass filter, or an LTI step-former, also perhaps critical, but sometimes even unstable in its topological intestine. Both of them seem critical, or sometimes even stable, from end to end, but what happens inside is a different matter altogether: you'll have signals cancelling each other out, mathematically speaking, but numerically, if they don't cancel out *to* *the* *tee*, your topology will run out of control exponentially fast. No amount of headroom will suffice for more than milliseconds at common sampling rates. And since the signals which internally need to cancel out perfectly are at different amplitude levels, floating point suddenly just *kills* your numerical analysis, because it's *very* difficult to achieve full cancellation across separate exponents of the numerical representation.
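
A deliberately extreme sketch of that failure mode, assuming the simplest possible case rather than a real all-pass: an internal pole at z = 2 cancelled exactly by a zero at z = 2, so that H(z) = (1 - 2z^-1)/(1 - 2z^-1) = 1 end to end. The function name and the alternating test signal are hypothetical. Any rounding error that enters is doubled on every later sample, so in this run the output should part company with the input after roughly fifty samples (about a millisecond at 44.1 kHz), even in binary64.

```python
# A hypothetical "identity" filter with an unstable internal topology:
#   w[n] = x[n] - 2*x[n-1]   (zero at z = 2)
#   y[n] = w[n] + 2*y[n-1]   (pole at z = 2)
# In exact arithmetic H(z) = 1, so y[n] == x[n] for every n.
def cancelled_unstable(x):
    x_prev, y_prev, out = 0.0, 0.0, []
    for xn in x:
        w = xn - 2.0 * x_prev  # rounding error can enter here...
        y = w + 2.0 * y_prev   # ...and the pole doubles it on every later sample
        out.append(y)
        x_prev, y_prev = xn, y
    return out

# Alternating levels chosen so that w[n] actually rounds (0.1 - 1.4 is inexact).
x = [0.7 if n % 2 == 0 else 0.1 for n in range(200)]
y = cancelled_unstable(x)

for n in (1, 20, 60, 199):
    print(n, abs(y[n] - x[n]))
```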

So we never have to worry about the gain structure until the samples are converted to integer, to be output to a DAC or written to a Red Book CD.

I believe we really do. And based on just the two examples above, I'd argue music-dsp and audio DSP as a whole are, to date, kind of in the dark here. Too many of us just *make* nice-sounding things and codebases, without taking the numerical analysis of yore to heart. In what little I've implemented, I've turned a blind eye to that theory as well, yet every single time, something bad happened in the end. I couldn't guarantee my code would produce what was expected under stress, in corner cases and whatnot, which obviously *was* needed since I'm a techno kind of tweaker.

Or, converted to mp3 or something like that.

The people who code that converter stuff either tend to do their numerical analysis right, or rely on libraries where the author did it for them. Just take a look at Valin's work at Xiph and on Opus.

There's absolutely *no* necessity to go into floats, their nonlinearity, and the idea of denormalised numbers, at such word widths.
It's not a necessity like it was with the DSP56000. It's taking advantage of the feature that you have a 64-bit CPU, a 64-bit wide data bus, and dozens or hundreds of gigabytes of 64-bit wide memory. Now, how cheap is that environment now? If you have that, then why not use it?

Sure, use it for what it's good for: parallel acceleration. But don't use it as an excuse to neglect numerical analysis in algorithm development. The first approach will work, and has. The second will not.

The only why not that I can think of is some fucking denorm causing an exception and putting a glitch in your output.

That cancellation problem in critical filters I described above is another one: it's very difficult to control with floats. Especially if you dither your signals as you should: every now and then you'll get a crossing of the float exponent bound, and suddenly you lose relative accuracy between the signals. Then that loss of cancellation grows exponentially, in some filter topologies as fast as two-fold per sample. That means it reaches the first nonlinearity imposed by the floating point representation in precisely one time step, and develops from there.
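
The exponent-bound crossing can be seen directly in the quantisation step. A minimal sketch, assuming IEEE 754 binary32 emulated through Python's struct module (the helper names are hypothetical): the gap between adjacent float32 values doubles as a signal crosses each power of two, so two signals sitting on either side of the bound are suddenly rounded on different grids.

```python
import struct

def f32_bits(x):
    """Bit pattern of float32(x) as an unsigned integer."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def f32_from_bits(bits):
    """float32 value (returned as a Python float) for a given bit pattern."""
    return struct.unpack('<f', struct.pack('<I', bits))[0]

def f32_step(x):
    """Quantisation step of float32 at positive x: distance to the next value up."""
    b = f32_bits(x)
    return f32_from_bits(b + 1) - f32_from_bits(b)

for x in (0.5, 0.75, 0.999, 1.0, 1.5, 2.0):
    print(f"{x:>6}: step {f32_step(x):.3e}")
```

The absolute step doubles at every power of two, so the relative step saw-tooths between about 2^-24 and 2^-23 of the signal level as the exponent changes.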

Denormals aren't pretty either. They are technically the lowest, most linear range of the float range, and so they help. But only if both your operands are already denormal, so that they behave as fixpoint numbers would. No floating point system, and definitely no IEEE standard, has *any* means of ascertaining whether both operands to an operation are in the lower linear regime, or whether one is still "normal", i.e. in the piecewise linear, in toto nonlinear regime.
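
To make the "lowest, most linear range" concrete, here is a minimal sketch, assuming Python's float is IEEE 754 binary64 with subnormals enabled (true on mainstream platforms) and Python 3.9+ for math.ulp. Below the smallest normal number every representable value sits on one uniform grid, just like fixpoint; above it, the step grows with the magnitude.

```python
import math
import sys

smallest_normal = sys.float_info.min  # 2**-1022 for binary64
subnormal_step = math.ulp(0.0)        # 2**-1074, the smallest positive subnormal

# Subnormal range: one fixed step regardless of magnitude (fixpoint-like).
for x in (smallest_normal / 2, smallest_normal / 1000, 3 * subnormal_step):
    print(f"{x:.3e}: step {math.ulp(x):.3e}")

# Normal range: the step doubles at each power of two (piecewise linear, overall log).
for x in (smallest_normal, 2 * smallest_normal, 4 * smallest_normal):
    print(f"{x:.3e}: step {math.ulp(x):.3e}")
```

The smallest normal number shares the subnormal step, which is what makes the transition into the linear range gradual rather than abrupt.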

A simple IIR will hit denorms when dead silence is going in after some non-silence excited the states of the filter to non-zero values.

Assuming you're talking about an RC-kinda circuit of first order, yes, it does go down exponentially.
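
How quickly that exponential decay reaches the denormal range is simple arithmetic, but a sketch makes it tangible. It assumes the filter runs in binary32, as most audio buffers do, emulated here with struct, with a hypothetical pole a = 0.99 and a full-scale state when the silence starts; first_subnormal_sample is a made-up helper.

```python
import struct

FLT_MIN = 2.0 ** -126  # smallest positive normal float32

def to_f32(x):
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def first_subnormal_sample(a, y0=1.0):
    """Run y[n] = a * y[n-1] (zero input) in emulated float32 and return
    (n, y[n]) for the first sample whose state is subnormal, or None."""
    a32, y = to_f32(a), to_f32(y0)
    n = 0
    while y != 0.0:
        n += 1
        y = to_f32(a32 * y)
        if 0.0 < abs(y) < FLT_MIN:
            return n, y
    return None

n, y = first_subnormal_sample(0.99)
print(n, y, f"{n / 44100:.3f} s at 44.1 kHz")
```

For this pole it should come out at roughly 8,700 samples, about 0.2 s at 44.1 kHz, which follows from 0.99^n < 2^-126, i.e. n > 126 ln 2 / -ln 0.99. Run it further and the state never reaches zero at all: once y is below about 50 subnormal steps, 0.99*y rounds back to y, so every subsequent multiply is a subnormal one.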

Now sum two of them together at a one-sample delay, presenting the resulting combined signal in a floating point representation. Vary the identical RC constants and feed a pulse train into the parallel circuit.

It ought to show surprisingly much nonlinear distortion from the float representation alone. It ought to show a lot even when dithered, because unlike fixpoint, float is much more difficult to dither properly.
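
A rough way to try that experiment, under stated assumptions: two identical one-pole sections y[n] = a*y[n-1] + x[n], the second fed the same input one sample late, their outputs summed; a sparse pulse train as input; every operation rounded to binary32 via struct and compared with a binary64 run using the same coefficient. Reading "sum two of them together at a one-sample delay" this way is an interpretation, all helper names are hypothetical, and the number printed is only the raw error signal's peak, the thing you would then analyse spectrally for distortion.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def identity(x):
    return x

def one_pole(x, a, rnd):
    """y[n] = a*y[n-1] + x[n], with every intermediate result passed through rnd."""
    y, out = 0.0, []
    for xn in x:
        y = rnd(rnd(a * y) + xn)
        out.append(y)
    return out

def parallel_pair(x, a, rnd):
    """Two identical one-poles, the second fed one sample late, outputs summed."""
    delayed = [0.0] + x[:-1]
    upper = one_pole(x, a, rnd)
    lower = one_pole(delayed, a, rnd)
    return [rnd(p + q) for p, q in zip(upper, lower)]

# Hypothetical test signal: a unit pulse every 64 samples.
pulses = [1.0 if n % 64 == 0 else 0.0 for n in range(4096)]

for a in (0.9, 0.99):
    a32 = to_f32(a)  # identical coefficient in both runs, so only rounding differs
    ref = parallel_pair(pulses, a32, identity)  # binary64 reference
    low = parallel_pair(pulses, a32, to_f32)    # binary32 at every step
    err = max(abs(r - s) for r, s in zip(ref, low))
    print(f"a = {a}: max |float32 - float64| = {err:.3e}")
```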

So, fixpoint, with a proper gain structure, actually is easier to analyse and work with than float, if you want to do it properly. Float is the sloppy way which sometimes lets you gloss over the analysis, but in the end, it leads you into trouble.

Or if they do, they actually hit a nasty, low-level knee of nonlinearity.
That's right, and I don't want that to cause a fucking exception. But I wanna keep having the utility of denorms.

So as Buffy would have it, what *is* *it*? What is the utility of denorms? I get it that they're there for a reason, but what do they *mean* and how do they *serve*?

I'd argue they're there for amateurs. I think even the designers of the IEEE standards said as much, in text and supra: they're supposed to "not lose precision as fast near zero", i.e. they're supposed to finally go from (partly) geometric to linear representation near zero. That is, to a representation closer to how we're taught real numbers should behave, and so towards a more intuitive design of numerical algorithms than what is actually demanded if we willy-nilly use a floating point system, with its *much* more demanding numerical analysis.
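
One commonly cited benefit of gradual underflow, offered here as a sketch rather than as a statement of the standard authors' intent: with subnormals, x - y == 0 holds only when x == y, so a guard like `if x != y: r = 1 / (x - y)` cannot divide by zero. The flush_to_zero helper below is hypothetical and only mimics what a flush-to-zero mode would do to the result.

```python
import sys

def flush_to_zero(v):
    """Mimic a flush-to-zero mode: any subnormal result is replaced by 0.0."""
    return 0.0 if abs(v) < sys.float_info.min else v

# Two distinct normal doubles just above the bottom of the normal range.
x = 3.0 * sys.float_info.min
y = 2.5 * sys.float_info.min

d = x - y  # exactly 0.5 * sys.float_info.min: subnormal, but not zero
print(x != y, d != 0.0, flush_to_zero(d) != 0.0)  # True True False
```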

Also, as I've said above multiple times, I'd argue that even 32-bit floating point actually already *contains* within it a 23-24-bit fixed-point range, perfect for representing sound, if only the gain structure is well thought out. It's nice at an even four bytes, with no need to pack it into three, even if you could. But there's no need to actually use the exponent, which would lead us from discretely linear, universally ditherable LTI and other audio signal processing theory into inherently nonlinear territory.
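
The claim about the fixed-point range inside float32 can be checked directly. A minimal sketch, assuming IEEE 754 binary32 emulated with struct (to_f32 is a hypothetical helper): integers up to 2^24 in magnitude survive the round trip exactly, so 24-bit PCM scaled by 2^-23 loses nothing, and the first casualty is 2^24 + 1.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# Edge cases of the 24-bit range, plus the first integer float32 cannot hold.
for n in (2**23 - 1, -2**23, 2**24, 2**24 + 1):
    print(n, to_f32(float(n)) == n)

# A full-scale 24-bit sample mapped to [-1, 1) by an exact power-of-two gain.
sample = 2**23 - 1
scaled = to_f32(sample / 2**23)
print(scaled * 2**23 == sample)
```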

Listen, people here that know me from the 1990s know that I was a staunch fixed-point advocate.


So why not now, anymore?

If given an assignment of developing an audio processing system using fixed-point math, I will not shrink away from the challenge, but **if** the project is "Hey, we got this 64-bit ARM with an FPU in it and gobs of memory", I don't want my code to be checking for saturation and "minding the gain structure". Fuck no.


My point is that you can't avoid the analysis even with 64-bit floats.

Maybe the real question is, what should the chips be doing, *both* in fixpoint *and* in float? Checking for saturation certainly seems to be the norm in the GPUs of now, even down to 8-bit float. (Yes, that seems to be a thing, at least on the deep learning side of things. But there it makes sense, unlike with audio.)
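
"Checking for saturation" usually means saturating arithmetic: on overflow the result clamps to the largest representable value instead of wrapping around. A minimal sketch for 16-bit samples, with hypothetical helper names and Python integers standing in for hardware registers:

```python
INT16_MIN, INT16_MAX = -32768, 32767

def wrap_add16(a, b):
    """Plain two's-complement 16-bit addition: overflow wraps around."""
    return ((a + b - INT16_MIN) & 0xFFFF) + INT16_MIN

def sat_add16(a, b):
    """Saturating 16-bit addition: overflow clamps to the nearest limit."""
    return max(INT16_MIN, min(INT16_MAX, a + b))

print(wrap_add16(30000, 10000))  # -25536: a full-scale click
print(sat_add16(30000, 10000))   # 32767: clipped, but at least the right sign
```

Wrapping keeps integer addition exact modulo 2^16, which is harmless if intermediate sums come back into range before output; saturation trades that for graceful clipping. Which one a chain needs is exactly the gain-structure question.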

I will never expect audio calculations to go astronomic in scale. But if my tool is a 64-bit processor that can do 64x64 to 64-bit result in the same nanosecond instruction cycle as anything else (like 32-bit fixed-point processing), why would I toss that headroom and legroom away?

You might want to do that because a well-thought-out gain structure at 16 bits suddenly yields you four times the performance from your 64-bit audio workstation. Maybe even more, because memory bandwidth tends to be the limiting factor today: pipelining delay in cache, SDRAM architectures, alignment and recall in cache and TLB, whatnot. And, you know, isn't it so that many of the algorithms we'd be talking about here, both in computer architecture (especially GPGPUs) and in algorithms (say, matrix multiplication), actually scale a bit better than naïvely thought before? Like, doing our audio matrices as 8x8 is actually a bit more efficient than doing a full 64?
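
The "four times" figure is just the ratio of sample sizes, and it shows up directly in how many samples fit in a cache line or a memory transfer. A back-of-the-envelope sketch, assuming a 64-byte cache line (typical, but an assumption) and a hypothetical 48 kHz rate, using the array module's item sizes for the C types:

```python
from array import array

CACHE_LINE_BYTES = 64  # assumption: a common cache-line size, not universal

for typecode, name in (("h", "int16"), ("f", "float32"), ("d", "float64")):
    size = array(typecode).itemsize
    per_line = CACHE_LINE_BYTES // size
    # Bytes per second for one channel at 48 kHz, a hypothetical example rate.
    rate = 48_000 * size
    print(f"{name:8} {size} B/sample, {per_line:2} samples per line, {rate} B/s per channel")
```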

It's only when a **final** sample value is getting output that I should need to worry about gain, saturation, quantization, and noise-shaping.

No, it is not, unless you can somehow represent all of the sample values perfectly along the way from perfect digital input to perfect digital output. As you can't, you need to analyse fully what happens to the errors. Do they accumulate or do they die down over the overall gain structure, do they cancel exactly... You need to do an overall numerical analysis over the whole signal chain, and sometimes it needs to be accurate to the bit.

That is much more difficult to do in a floating point representation than in the fixpoint one, which approximates the real number system and its additive group. It is especially, and more differentially, difficult to do under the LTI-minded Shannon-Nyquist theorem, the Channel Rate Theorem and Cramér-Rao theory, which additionally admit full subtractive dithering, and then under the more unsatisfactory theory of cumulative additive dithering, perhaps best developed by Lipshitz and Vanderkooy in their critique of Sony's SACD.
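
The "additive group" remark can be made concrete: integer (fixpoint) addition is exact and associative as long as nothing overflows, whereas float addition depends on grouping once the operands sit at different exponents. A minimal sketch with illustrative values; the 2^-60 fixed-point scale is an arbitrary assumption.

```python
# Three "samples" at very different levels (illustrative values).
a, b, c = 1.0, 1e-16, -1.0

# Floats: the small term survives or vanishes depending on grouping.
left = (a + b) + c   # 1e-16 is below half a step at 1.0, so it is lost
right = a + (b + c)  # here it nudges -1.0 onto a finer grid and partly survives
print(left, right)   # 0.0 1.1102230246251565e-16

# Fixpoint: the same values as integers in units of 2**-60 (arbitrary scale).
SCALE = 2**60
A, B, C = round(a * SCALE), round(b * SCALE), round(c * SCALE)
print((A + B) + C == A + (B + C) == B)  # True: exact, order-independent
```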

I shouldn't have to worry about it anywhere else. Not if I'm using a 64-bit ARM.

You might not want to, but you still must. It's a bitch to be sure, and *I'm* too lazy to really do the legwork. But you still must. No other way will land you with a truly stable and trustworthy algorithm.

If need be, and if you *really* want me to do so, I can build you an example of what I'm talking about. It'd take its time. But I sorta think that, as someone known to be better than me on the topic, you know where I'm coming from.

--

Sampo Syreeni, aka decoy - [email protected], http://decoy.iki.fi/front
+358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
