On 2023-04-08, robert bristow-johnson wrote:
Of any width. You would have had it even in 32-bit fixpoint, and even
in 24-bit, if you minded your gain structure.
And that's what the promise is of 64-bit floats.
Earlier, that was the promise of 32-bit floats.
In the end you can't do proper error control without doing the numerical
analysis, at any bit depth or in any format. You just cannot avoid the
work by naïvely going to higher and higher bit depths, because that's
not how our algorithms work. Especially the ones which are recursive,
such as IIR filters in general topology: done even slightly wrong, they
lead to exponential amplification of any rounding error, noise,
nonlinearity, and whatnot.
Done *right* after a proper, wholesale numerical analysis of the
signal chain, they can very well work at 16 bits signed, even for hifi
work. Done *wrong*, without the analysis, they can slip out of control
and go into limit cycling in a few milliseconds, at thousands of bits of
otherwise well-calibrated floating point precision.
As two prototypical examples, try out a critical all-pass filter, or an
LTI step-former, also perhaps critical, but sometimes even unstable in
its topological intestine. Both of them seem critical or sometimes even
stable from end-to-end, but what happens inside is a different matter
altogether: you'll have signals cancelling each other out,
mathematically speaking, but numerically, if they don't cancel out *to*
*the* *tee*, your topology will run out of control exponentially fast.
No amount of headroom will suffice for more than milliseconds, at common
sampling rates. And since the signals which internally need to cancel
out perfectly are at different amplitude levels, suddenly floating point
just *kills* your numerical analysis, because it's *very* difficult to
achieve full cancellation over separate exponents of the numerical
representation.
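
A minimal sketch of that failure mode, not any particular filter: a toy
recursion s <- 3*s - 0.2, which sits mathematically still at s = 0.1 but has
an internal gain of three on any residue. The recursion, the step count and
the Q15 scaling are assumptions picked for illustration only.

```python
from fractions import Fraction

def run(step, state, n):
    """Apply `step` to `state` n times and return the final state."""
    for _ in range(n):
        state = step(state)
    return state

N = 80

# Double-precision floats: 3*s is rounded on every pass, so the
# cancellation against 0.2 is inexact and the residue roughly triples
# per step.
float_state = run(lambda s: 3.0 * s - 0.2, 0.1, N)

# Exact rationals: the mathematical behaviour, a fixed point at 1/10.
exact_state = run(lambda s: 3 * s - Fraction(2, 10), Fraction(1, 10), N)

# Q15 fixed point: 0.1 is stored as the integer 3277 and the constant is
# chosen on the same grid, so the integer arithmetic cancels exactly.
q15_start = round(0.1 * 2**15)
q15_state = run(lambda s: 3 * s - 2 * q15_start, q15_start, N)

print(float_state, exact_state, q15_state)
```

The integer case stays put only because the coefficient 3 is itself an
integer; a fractional coefficient would need its own rounding analysis, which
is rather the point.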
So we never have to worry about the gain structure until the samples
are converted to integer so to be outputted to a DAC or written to a
red-book CD.
I believe we really do. And based on only the two examples above, I'd
argue music-dsp and audio dsp as a whole are kind of in the dark here,
to date. Too many of us just *make* nice sounding things and codebases,
without taking numerical analysis of yore to heart. In what little I've
implemented, I've turned a blind eye to that theory as well, yet every
single time, something bad happened in the end. I couldn't guarantee my
code to produce what was expected, under stress and corner solutions and
whatnot — which obviously *was* needed since I'm a techno kind of
tweaker.
Or, converted to mp3 or something like that.
The people who code that converter stuff either tend to do their
numerical analysis right, or rely on libraries where the author did it
for them. Just take a look at Valin's work at Xiph and on Opus.
There's absolutely *no* necessity to go into floats, their
nonlinearity, and the idea of denormalised numbers, at such word
widths.
It's not a necessity like it was with the DSP56000. It's taking
advantage of the feature that you have a 64-bit CPU, 64 bit wide data
bus, dozens or hundreds of gigabytes of 64 bit wide memory. Now, how
cheap is that environment now? If you have that, then why not use it?
Sure, use it for what it's good for: parallel acceleration. But don't
use it as an excuse to neglect numerical analysis in algorithm
development. The first approach will work, and has. The second will not.
The only why not that I can think of is some fucking denorm causing an
exception and putting in a glitch in your output.
The cancellation problem in critical filters I described above is another:
it's very difficult to control on floats. Especially if you dither your
signals as you should: every here and there you'll get a crossing of the
float exponent bound, and suddenly you lose relative accuracy between
the signals. Then that loss of cancellation increases exponentially; in
some filter topologies as fast as two-fold per sample. That means it
reaches the first nonlinearity imposed by the floating point
representation at precisely one time step, and develops from there.
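
A small sketch of what crossing an exponent bound does, assuming IEEE 754
doubles with round-to-nearest-even (what CPython uses on common platforms).
The helper `ulp` and the residue `d` are hypothetical names for this
illustration.

```python
import math

def ulp(x):
    """Spacing between adjacent doubles at x (normal, positive x only)."""
    _, exponent = math.frexp(x)
    return math.ldexp(1.0, exponent - 53)

d = 2.0 ** -53           # a tiny residue, e.g. one dither step

below = (0.5 + d) - 0.5  # 0.5 sits in the range [0.5, 1)
above = (1.0 + d) - 1.0  # 1.0 sits in the range [1, 2)

print(ulp(0.5), ulp(1.0))
print(below, above)
```

The same residue survives just under the boundary and is rounded away just
over it, because the step size doubles at every power of two. Two signals that
are meant to cancel but sit on different sides of such a boundary no longer
share a quantization grid.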
Denormals aren't pretty either. They are technically the lowest, most
linear range of the float range, and so help. But only if both your
operands are already denormal, so that they behave as fixpoints would.
No floating point system and definitely no IEEE standard has *any* means
of ascertaining whether both operands to any operation are in the lower
linear regime, or if one is still "normal", i.e. in the piecewise
linear, in toto nonlinear regime.
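
The two regimes can be sketched as follows, assuming IEEE 754 doubles with
gradual underflow enabled (no flush-to-zero mode set). The variable names are
illustrative only.

```python
import sys

smallest_normal = sys.float_info.min   # 2**-1022
tiny = 5e-324                          # smallest positive denormal, 2**-1074

# Both operands denormal: the spacing is uniform, so the sum is exact,
# just as it would be for two small integers.
both_denormal = tiny + tiny

# One operand normal: the denormal lies far below the other operand's
# step size and simply vanishes.
mixed = (1.0 + tiny) - 1.0

# Below the denormal grid nothing is left: halving the smallest
# denormal rounds to zero.
underflow = tiny / 2

print(both_denormal, mixed, underflow)
```

Nothing in the result tells the caller which of these regimes an operation
ran in.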
A simple IIR will hit denorms when dead silence is going in after some
non-silence excited the states of the filter to non-zero values.
Assuming you're talking about an RC-kinda circuit of first order, yes,
it does exponentially go down.
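
A minimal sketch of that decay, assuming a first-order section with zero
input, IEEE 754 doubles and no flush-to-zero mode. The coefficient 0.5 is an
arbitrary choice that makes the step counts easy to verify by hand.

```python
import sys

def decay_steps(coefficient=0.5, start=1.0):
    """Run y[n] = coefficient * y[n-1] with zero input until y reaches 0.0.

    Returns (first step at which y is denormal, step at which y is zero).
    """
    y = start
    first_denormal = None
    step = 0
    while y != 0.0:
        y *= coefficient
        step += 1
        if first_denormal is None and 0.0 < y < sys.float_info.min:
            first_denormal = step
    return first_denormal, step

first_denormal, reached_zero = decay_steps()
print(first_denormal, reached_zero)
```

With these values the state turns denormal at step 1023 and reaches exact zero
at step 1075, so the filter spends some fifty samples in the denormal range
before going silent.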
Now sum two of them together at a one sample delay, presenting the
resulting combined signal in a floating point representation. Vary the
identical RC-constants and put in a pulse train to the parallel circuit.
It ought to show a surprising amount of nonlinear distortion from the
float representation alone. It ought to show a lot even when dithered,
because unlike fixpoint, float is much more difficult to dither
properly.
So, fixpoint, with a proper gain structure, actually is easier to
analyse and work with than float. If you want to do it properly. Float
is the sloppy way which sometimes lets you gloss over the analysis, but
in the end, it leads you into trouble.
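
For contrast, dithering on a uniform fixpoint grid is simple enough to sketch
in a few lines. This assumes nonsubtractive triangular (TPDF) dither spanning
plus or minus one LSB, a constant input at 0.3 LSB, and a fixed seed used only
to make the run repeatable; all names are hypothetical.

```python
import math
import random

def quantize(x):
    """Round to the nearest integer LSB (round half up)."""
    return math.floor(x + 0.5)

def tpdf(rng):
    """Triangular-PDF dither spanning +/- 1 LSB."""
    return rng.random() - rng.random()

rng = random.Random(1234)   # fixed seed, only so the run is repeatable
level = 0.3                 # a constant signal at 0.3 LSB
count = 20000

# Without dither the sub-LSB signal is lost entirely.
plain = sum(quantize(level) for _ in range(count)) / count

# With TPDF dither the average of the quantized output tracks the input.
dithered = sum(quantize(level + tpdf(rng)) for _ in range(count)) / count

print(plain, dithered)
```

The dither amplitude is tied to one fixed step size. On a float grid the step
size changes with the signal's exponent, so no single dither amplitude fits.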
Or if they do, they actually hit a nasty, low level, knee of
nonlinearity.
That's right and I don't want that to cause a fucking exception. But
I wanna keep having the utility of denorms.
So as Buffy would have it, what *is* *it*? What is the utility of
denorms? I get it that they're there for a reason, but what do they
*mean* and how do they *serve*?
I'd argue they're there for amateurs. I think even the designers of the
IEEE standards said as much in text and supra: they're supposed to "not
lose precision as fast near zero", i.e. they're supposed to finally go
from (partly) geometric to linear representation near zero. To a
representation closer to how we're taught real numbers should behave,
and so towards more intuitionistic design of numerical algorithms, than
what is actually demanded if we willy-nilly use a floating point system,
with its *much* more demanding numerical analysis.
Also, as I said above, multiple times, I'd argue that even 32-bit
floating point actually already *contains* within it a 23-24 bit fixed point
range, perfect for representing sound. If only the gain structure is
well thought-out. It's nice at four even bytes, no need to pack it into
three, even if you could. But no need to actually utilize the exponent
even, leading us from discretely linear, universally ditherable LTI and
other audio signal processing theory into inherently nonlinear
territory.
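
That embedded fixed point range can be checked directly. A sketch, assuming
IEEE 754 single precision as exposed by the standard `struct` module;
`to_float32` is a hypothetical helper.

```python
import struct

def to_float32(value):
    """Round a Python number to the nearest IEEE 754 single and back."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]

limit = 2 ** 24   # 24 bits of significand, counting the implicit leading one

# Every integer up to 2**24 survives the round trip unchanged ...
exact_at_limit = to_float32(limit) == limit
exact_below = all(to_float32(n) == n for n in range(limit - 1000, limit + 1))

# ... but the very next one does not: the grid spacing has become 2.
lost_above = to_float32(limit + 1) == limit

print(exact_at_limit, exact_below, lost_above)
```

As long as the gain structure keeps every intermediate value on that integer
grid, sums and differences are exact and the exponent never comes into play.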
Listen, people here that know me from the 1990s, know that I was a
staunch fix-point advocate.
So why not now, anymore?
If given an assignment of developing an audio processing system using
fixed-point math, I will not shrink away from the challenge, but
**if** the project is "Hey we got this 64-bit ARM with FPU in it and
gobs of memory", I don't want my code to be checking for saturation and
"minding the gain structure". Fuck no.
My point is that you can't avoid the analysis even with 64-bit floats.
Maybe the real question is, what should the chips be doing *both* in
fixpoint *and* in float? Checking for saturation certainly seems to be
the norm in the GPUs of now, even down to 8-bit float. (Yes, it seems to
be a thing, at least on the deep learning side of things. But there it
makes sense, unlike with audio.)
I will never expect audio calculations to go astronomic in scale.
But, if my tool is a 64-bit processor that can do 64x64 to 64-bit
result in the same nanosecond instruction cycle as anything else (like
32-bit fixed-point processing), why would I toss that headroom and
legroom away?
You might want to do that because a well-thought-out gain structure at 16
bits suddenly yields you four times the performance from your 64-bit
audio workstation. Maybe even more, because memory bandwidth tends to be
the limiting factor today: pipelining delay in cache and in SDRAM
architectures, alignment in recall in cache and TLB, whatnot. And you
know, isn't it so that many of the things we'd be talking about here,
in both computer architecture (especially GPGPUs) and algorithms (say,
matrix multiplication), actually scale a bit better than naïvely thought
before? Like, doing our audio matrices as 8x8 actually is a bit more
efficient than doing a full 64?
It's only when a **final** sample value is getting output, that I
should need to worry about gain, saturation, quantization, and
noise-shaping.
No, it is not, unless you can somehow represent all of the sample values
perfectly along the way from perfect digital input to perfect digital
output. As you can't, you need to analyse fully what happens to the
errors. Do they accumulate or do they die down over the overall gain
structure, do they cancel exactly... You need to do
overall numerical analysis over the whole signal chain, and sometimes it
needs to be accurate to the bit.
That is much more difficult to do in a floating point representation
than in the fixpoint one, approximating the real number system and its
additive group. It is especially and more differentially difficult to do
under the LTI-minded Shannon-Nyquist theorem, the Channel Rate Theorem
and Cramér-Rao theory, which additionally admit full subtractive
dithering, and then the more unsatisfactory theory of cumulative
additive dithering, perhaps best developed by Lipshitz and Vanderkooy,
in their critique of Sony's SACD.
I shouldn't have to worry about it anywhere else. Not if I'm using a
64-bit ARM.
You might not want to, but you still must. It's a bitch to be sure, and
*I'm* too lazy to really do the legwork. But you still must. No other
way will land you with a truly stable and trustworthy algorithm.
If need be, and if you *really* want me to do so, I can build you an
example of what I'm talking about. It'd take its time. But I sorta
think, as a known better of me on the topic, you know where I'm coming
from.
--
Sampo Syreeni, aka decoy - [email protected], http://decoy.iki.fi/front
+358-40-3751464, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2