On 2023-04-08, robert bristow-johnson wrote:

Of any width. You would have had it even in 32-bit fixpoint, and even in 24-bit, if you minded your gain structure.

And that's what the promise is of 64-bit floats.

Earlier, that was the promise of 32-bit floats.

In the end you can't do proper error control without doing the numerical analysis, at any bit depth or in any format. You just cannot avoid the work by naïvely going to higher and higher bit depths, because that's not how our algorithms work. Especially the ones which are recursive, such as IIR filters in general topology: done even slightly wrong, they lead to exponential amplification of any rounding error, noise, nonlinearity, and whatnot.

Done *right* after a proper, wholesale numerical analysis of the signal chain, they can very well work at 16 bits signed, even for hifi work. Done *wrong*, without the analysis, they can slip out of control and go into limit cycling in a few milliseconds, at thousands of bits of otherwise well-calibrated floating point precision.
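To make the limit cycling concrete, here is a minimal sketch, not anybody's production filter. It assumes a first-order recursion y[n] = a*y[n-1] with zero input, an integer state, and a Q15 coefficient of about -0.9; the names `A_Q15`, `step_round_nearest` and `step_truncate_magnitude` are made up for the illustration. The only difference between the two runs is the rounding rule, which is exactly the kind of detail the analysis has to settle.

```python
# Zero-input response of y[n] = a * y[n-1] on an integer (fixed-point) state.
# The coefficient is held in Q15; a = -0.9 is an arbitrary illustrative choice.
A_Q15 = round(-0.9 * 32768)  # -29491


def step_round_nearest(y):
    """Multiply by the Q15 coefficient and round to nearest (half up)."""
    return (A_Q15 * y + (1 << 14)) >> 15


def step_truncate_magnitude(y):
    """Multiply by the Q15 coefficient and truncate toward zero."""
    p = A_Q15 * y
    return (abs(p) >> 15) * (1 if p >= 0 else -1)


def run(step, y0, n):
    """Iterate `step` n times from y0 and return the whole trajectory."""
    out = [y0]
    for _ in range(n):
        out.append(step(out[-1]))
    return out


rounded = run(step_round_nearest, 100, 200)
truncated = run(step_truncate_magnitude, 100, 200)

print(rounded[-4:])    # stuck alternating at a small nonzero amplitude
print(truncated[-4:])  # decayed all the way to zero
```

With round-to-nearest the state never dies out: it keeps flipping sign at a few LSBs forever. Magnitude truncation removes this particular cycle, at the price of a different error characteristic.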

As two prototypical examples, try out a critical all-pass filter, or an LTI step-former, also perhaps critical, but sometimes even unstable in its topological intestine. Both of them seem critical or sometimes even stable from end-to-end, but what happens inside is a different matter altogether: you'll have signals cancelling each other out, mathematically speaking, but numerically, if they don't cancel out *to* *the* *tee*, your topology will run out of control exponentially fast. No amount of headroom will suffice for more than milliseconds, at common sampling rates. And since the signals which internally need to cancel out perfectly are at different amplitude levels, suddenly floating point just *kills* your numerical analysis, because it's *very* difficult to achieve full cancellation over separate exponents of the numerical representation.
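A toy version of that runaway, as a sketch only: a single state updated as x <- 3*x - 0.2, started at 0.1. The gain of 3 and the constants are assumptions picked for the demonstration, not taken from any particular filter. On paper the two terms cancel and the state never moves; in doubles the very first product is rounded, and the leftover is then multiplied by the gain on every step.

```python
from fractions import Fraction


def iterate(x, gain, offset, n):
    """Run x <- gain * x - offset for n steps and return the final state."""
    for _ in range(n):
        x = gain * x - offset
    return x


# On paper, 3 * 0.1 - 0.2 == 0.1, so the state should never move.
exact = iterate(Fraction(1, 10), 3, Fraction(2, 10), 60)
floating = iterate(0.1, 3.0, 0.2, 60)

print(exact)     # 1/10
print(floating)  # far away from 0.1
```

The exact rational run stays put, while the double-precision run has left the neighbourhood of 0.1 entirely after 60 steps, which at audio rates is on the order of a millisecond.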

So we never have to worry about the gain structure until the samples are converted to integer so to be outputted to a DAC or written to a red-book CD.

I believe we really do. And based on only the two examples above, I'd argue music-dsp and audio dsp as a whole are kind of in the dark here, to date. Too many of us just *make* nice sounding things and codebases, without taking numerical analysis of yore to heart. In what little I've implemented, I've turned a blind eye to that theory as well, yet every single time, something bad happened in the end. I couldn't guarantee my code to produce what was expected, under stress and corner solutions and whatnot — which obviously *was* needed since I'm a techno kind of tweaker.

Or, converted to mp3 or something like that.

The people who code that converter stuff either tend to do their numerical analysis right, or rely on libraries where the author did it for them. Just take a look at the work of Valin, of Xiph and Opus.

There's absolutely *no* necessity to go into floats, their nonlinearity, and the idea of denormalised numbers, at such word widths.

It's not a necessity like it was with the DSP56000. It's taking advantage of the feature that you have a 64-bit CPU, 64 bit wide data bus, dozens or hundreds of gigabytes of 64 bit wide memory. Now, how cheap is that environment now? If you have that, then why not use it?

Sure, use it for what it's good for: parallel acceleration. But don't use it as an excuse to neglect numerical analysis in algorithm development. The first approach will work, and has. The second will not.

The only why not that I can think of is some fucking denorm causing an exception and putting in a glitch in your output.

That cancellation problem in critical filters I described above is another: it's very difficult to control on floats. Especially if you dither your signals as you should: every here and there you'll get a crossing of the float exponent bound, and suddenly you lose relative accuracy between the signals. Then that loss of cancellation increases exponentially; in some filter topologies as fast as two-fold per sample. That means it reaches the first nonlinearity imposed by the floating point representation at precisely one time step, and develops from there.
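The loss of cancellation across exponents can be seen in isolation. The sketch below assumes a small signal `x` riding on a larger internal signal `g` that is later subtracted again; the values and the shared fixed-point LSB of 2**-20 are arbitrary choices for illustration. `math.ulp` needs Python 3.9 or later.

```python
import math

x = 0.1                                   # small internal signal
carriers = [1.5, 3.0, 6.0, 12.0, 1536.0]  # larger signal, one per exponent

# Floating point: (x + g) - g should give back x, but x is rounded to the
# spacing of the larger operand on the way.
float_residuals = [((x + g) - g) - x for g in carriers]

# Fixed point with one shared LSB of 2**-20 (an arbitrary choice): the same
# expression on integers cancels exactly, whatever the levels are.
LSB_BITS = 20
xi = round(x * 2**LSB_BITS)
fixed_residuals = [
    ((xi + round(g * 2**LSB_BITS)) - round(g * 2**LSB_BITS)) - xi
    for g in carriers
]

for g, r in zip(carriers, float_residuals):
    print(f"g = {g:7.1f}  residual = {r: .3e}  ulp(g + x) = {math.ulp(g + x):.3e}")
print(fixed_residuals)
```

The float residual is never zero here, and its bound doubles every time the larger signal crosses into the next exponent. The integer version leaves nothing behind, because both signals live on the same grid.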

Denormals aren't pretty either. They are technically the lowest, most linear range of the float range, and so help. But only if both your operands are already denormal, so that they behave as fixpoints would. No floating point system and definitely no IEEE standard has *any* means of ascertaining whether both operands to any operation are in the lower linear regime, or if one is still "normal", i.e. in the piecewise linear, in toto nonlinear regime.
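A small sketch of both halves of that claim, for IEEE doubles as Python exposes them (assuming Python 3.9 or later for `math.ulp`, and a platform that does not flush denormals to zero). The variable names are made up for the illustration.

```python
import math
import sys

tiny = 5e-324                         # smallest positive subnormal, 2**-1074
smallest_normal = sys.float_info.min  # 2**-1022

# Inside the subnormal range the spacing is constant, as in fixed point.
print(math.ulp(tiny), math.ulp(1e-310), math.ulp(2e-310))

# In the normal range the spacing doubles with every exponent step.
print(math.ulp(1.0), math.ulp(2.0), math.ulp(4.0))

# Two subnormal operands add exactly, like integers counting LSBs.
both_subnormal_exact = (3 * tiny) + (4 * tiny) == 7 * tiny

# With one normal operand, the subnormal one is rounded away entirely.
mixed_loses_operand = (1.0 + 1e-310) == 1.0

print(both_subnormal_exact, mixed_loses_operand)
```

Nothing in the arithmetic tells you which of the two cases you just hit; the result simply comes back rounded.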

A simple IIR will hit denorms when dead silence is going in after some non-silence excited the states of the filter to non-zero values.

Assuming you're talking about an RC-kinda circuit of first order, yes, it does exponentially go down.
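For reference, a sketch of that decay in doubles. It assumes a one-pole recursion y <- a*y with zero input, a coefficient of exactly 0.5 so that the step counts are easy to verify by hand, and default CPU settings with no flush-to-zero. `count_subnormal_steps` is a hypothetical helper name.

```python
import sys


def count_subnormal_steps(a, y0=1.0, max_steps=5000):
    """Run y <- a * y with zero input; report when the state is subnormal."""
    y = y0
    first = None   # step index at which the state first becomes subnormal
    count = 0      # number of steps spent in the subnormal range
    for n in range(1, max_steps + 1):
        y *= a
        if y == 0.0:
            break
        if abs(y) < sys.float_info.min:
            count += 1
            if first is None:
                first = n
    return first, count


first, count = count_subnormal_steps(0.5)
print(first, count)  # 1023 52
```

With a coefficient closer to one, as in a real filter, the state takes longer to get there but then lingers in the subnormal range for far more samples.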

Now sum two of them together at a one sample delay, presenting the resulting combined signal in a floating point representation. Vary the identical RC-constants and put in a pulse train to the parallel circuit.

It ought to show surprisingly much nonlinear distortion from the float representation alone. It ought to show a lot even when dithered, because unlike fixpoint, float is much more difficult to dither properly.
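A bare-bones harness for that experiment might look like the sketch below. Everything specific in it is an assumption: the coefficient 0.99, the pulse spacing of 64 samples, single precision emulated by rounding through `struct`, and a double-precision run standing in as the reference. It only measures how far the single-precision result drifts from the reference; it does not separate distortion from noise, and it applies no dither, so it neither confirms nor refutes the size of the effect.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def decay_pair(x, a, rnd):
    """Sum a one-pole decay y[n] = a*y[n-1] + x[n] with a copy of itself
    delayed by one sample, applying `rnd` after every arithmetic step."""
    y = 0.0
    prev = 0.0
    out = []
    for s in x:
        prev, y = y, rnd(rnd(a * y) + s)
        out.append(rnd(y + prev))
    return out


a = f32(0.99)   # same coefficient for both runs
pulses = [1.0 if n % 64 == 0 else 0.0 for n in range(4096)]

single = decay_pair(pulses, a, f32)
double = decay_pair(pulses, a, lambda v: v)

worst = max(abs(p - q) for p, q in zip(single, double))
peak = max(abs(q) for q in double)
print(worst, peak, worst / peak)
```

Sweeping `a` and the pulse spacing, and looking at the spectrum of the difference signal rather than just its peak, would be the next step toward the actual claim.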

So, fixpoint, with a proper gain structure, actually is easier to analyse and work with than float. If you want to do it properly. Float is the sloppy way which sometimes lets you gloss over the analysis, but in the end, it leads you into trouble.
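One small, standard piece of such a gain-structure analysis is the worst-case output bound max|y| <= sum|h| * max|x|, which tells you how many bits of headroom a stage needs. A sketch for a one-pole section, with the coefficient 0.9 and the helper names chosen only for illustration:

```python
import math


def impulse_response(a, n):
    """First n samples of the impulse response of y[n] = a*y[n-1] + x[n]."""
    h, y = [], 1.0
    for _ in range(n):
        h.append(y)
        y *= a
    return h


def headroom_bits(h):
    """Bits of headroom needed so no full-scale input can overflow.

    Uses the worst-case (L1-norm) bound max|y| <= sum|h| * max|x|."""
    l1 = sum(abs(v) for v in h)
    return l1, max(0, math.ceil(math.log2(l1)))


l1, bits = headroom_bits(impulse_response(0.9, 2000))
print(l1, bits)  # close to 10, so 4 bits of headroom
```

The bound is conservative for real programme material, but it is a guarantee, and in fixed point it translates directly into where the binary point has to sit.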

Or if they do, they actually hit a nasty, low level, knee of nonlinearity.

That's right and I don't want that to cause a fucking exception. But I wanna keep having the utility of denorms.

So as Buffy would have it, what *is* *it*? What is the utility of denorms? I get it that they're there for a reason, but what do they *mean* and how do they *serve*?

I'd argue they're there for amateurs. I think even the designers of the IEEE standards said as much in text and supra: they're supposed to "not lose precision as fast near zero", i.e. they're supposed to finally go from (partly) geometric to linear representation near zero. To a representation closer to how we're taught real numbers should behave, and so towards more intuitionistic design of numerical algorithms, than what is actually demanded if we willy-nilly use a floating point system, with its *much* more demanding numerical analysis.

Also, as I said above, multiple times, I'd argue that even 32-bit floating point actually already *contains* within it a 23-24 bit fixed point range, perfect for representing sound. If only the gain structure is well thought-out. It's nice at four even bytes, no need to pack it into three, even if you could. But no need to actually utilize the exponent even, leading us from discretely linear, universally ditherable LTI and other audio signal processing theory into inherently nonlinear territory.
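The embedded fixed point range is easy to check. The sketch below emulates IEEE single precision by rounding a Python double through `struct`; the sample values are arbitrary.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]


LIMIT = 2**24   # 24 bits of significand: 23 stored plus the implicit one

# Every integer up to 2**24 in magnitude survives the trip through a single.
samples = [0, 1, -1, 12345, -8388608, 8388607, LIMIT - 1, LIMIT, -LIMIT]
exact = all(f32(float(v)) == v for v in samples)

# One step further and odd integers are no longer representable.
first_loss = f32(float(LIMIT + 1))

print(exact, first_loss == LIMIT + 1)   # True False
```

As long as the signal is kept as integers inside that range, every sum and difference that stays in range is exact, and the exponent never comes into play.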

Listen, people here that know me from the 1990s, know that I was a staunch fix-point advocate.

So why not now, anymore?

If given an assignment of developing an audio processing system using fixed-point math, I will not shrink away from the challenge, but **if** the project is "Hey we got this 64-bit ARM with FPU in it and gobs of memory", I don't want my code to be checking for saturation and "minding the gain structure". Fuck no.

My point is that you can't avoid the analysis even with 64-bit floats.

Maybe the real question is, what should the chips be doing *both* in fixpoint *and* in float? Checking for saturation certainly seems to be the norm in the GPUs of now, even down to 8-bit float. (Yes, it seems to be a thing, at least on the deep learning side of things. But there it makes sense, unlike with audio.)

I will never expect audio calculations to go astronomic in scale. But, if my tool is a 64-bit processor that can do 64x64 to 64-bit result in the same nanosecond instruction cycle as anything else (like 32-bit fixed-point processing), why would I toss that headroom and legroom away?

You might want to do that because a well-thought-out gain structure at 16 bits suddenly yields you four times the performance from your 64-bit audio workstation. Maybe even more, because memory bandwidth tends to be the limiting factor of today: pipelining delay in cache, in SDRAM architectures, alignment in recall in cache and TLB, whatnot. And you know, wasn't it so that many of the things we'd be talking about here, in both computer architecture (especially GPGPUs) and algorithms (say, matrix multiplication), actually scale a bit better than naïvely thought before? Like, doing our audio matrices as 8x8 actually is a bit more efficient than doing a full 64?

It's only when a **final** sample value is getting output, that I should need to worry about gain, saturation, quantization, and noise-shaping.

No, it is not, unless you can somehow represent all of the sample values perfectly along the way from perfect digital input to perfect digital output. As you then can't, you need to analyse fully what happens to the errors. Do they accumulate or do they die down, over the overall gain structure, do they cancel exactly, do they accumulate... You need to do overall numerical analysis over the whole signal chain, and sometimes it needs to be accurate to the bit.

That is much more difficult to do in a floating point representation than in the fixpoint one, approximating the real number system and its additive group. It is especially and more differentially difficult to do under the LTI-minded Shannon-Nyquist theorem, the Channel Rate Theorem and Cramér-Rao theory, which additionally admit full subtractive dithering, and then the more unsatisfactory theory of cumulative additive dithering, perhaps best developed by Lipshitz and Vanderkooy, in their critique of Sony's SACD.
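For the fixpoint side, the basic dithering result is short enough to sketch. This is plain additive (non-subtractive) triangular dither on a uniform quantizer, not the subtractive scheme; the input level of 0.3 LSB, the sample count and the fixed seed are arbitrary choices for the illustration.

```python
import math
import random


def quantize(x):
    """Round to the nearest integer LSB (half up)."""
    return math.floor(x + 0.5)


def tpdf(rng):
    """Triangular-PDF dither spanning (-1, +1) LSB, i.e. 2 LSB peak to peak."""
    return rng.random() - rng.random()


def mean_output(x, n, dithered, rng):
    """Average quantizer output for a constant input of x LSB."""
    total = 0
    for _ in range(n):
        d = tpdf(rng) if dithered else 0.0
        total += quantize(x + d)
    return total / n


rng = random.Random(0)   # fixed seed, only to make the run repeatable
x = 0.3                  # a DC level well below one LSB
plain = mean_output(x, 20000, False, rng)
dithered = mean_output(x, 20000, True, rng)
print(plain, dithered)
```

Without dither the sub-LSB level is simply lost. With it, the average output tracks the input, because the quantizer has one uniform step everywhere. A float has a different step in every exponent range, so no single dither amplitude fits.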

I shouldn't have to worry about it anywhere else. Not if I'm using a 64-bit ARM.

You might not want to, but you still must. It's a bitch to be sure, and *I'm* too lazy to really do the legwork. But you still must. No other way will land you with a truly stable and trustworthy algorithm.

If need be, and if you *really* want me to do so, I can build you an example of what I'm talking about. It'd take its time. But I sorta think, as a known better of me on the topic, you know where I'm coming from.
--
Sampo Syreeni, aka decoy - [email protected], http://decoy.iki.fi/front +358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
