These are good points ... about fixed-point math in hardware. The Texas Instruments TMS320 family of DSP chips has both fixed-point and floating-point variants. Even a "16-bit" fixed-point DSP produces 32-bit results from multiplications and has 8 bits of overhead in the registers that store intermediate values, so you have 40 bits to work with inside an algorithm implementing an audio effect. The higher-precision implementations get to 80 bits. The gain structure needs to work out so that signals no longer extend into the overhead region by the time they reach the output, but intermediate values can be 48 dB over "full scale" (8 bits at roughly 6 dB per bit) relative to the final output.
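
As a rough illustration of the arithmetic in the paragraph above, here is a minimal Python sketch of a 16-bit multiply-accumulate with 8 bits of overhead. It emulates the word widths with plain integers and is not TI's instruction set; every name in it (`to_q15`, `mac`, `saturate_to_q15`, `wrap_to_q15`) is made up for the example, and the choice of eight taps with a gain of 0.5 is an assumption picked only to push the sum past full scale.

```python
import math

Q15_ONE = 1 << 15            # scale factor: 1.0 is represented as 32768
Q15_MAX = Q15_ONE - 1        # largest sample, just under +1.0
Q15_MIN = -Q15_ONE           # smallest sample, exactly -1.0
ACC_BITS = 40                # 32-bit product plus 8 bits of overhead
ACC_MAX = (1 << (ACC_BITS - 1)) - 1
ACC_MIN = -(1 << (ACC_BITS - 1))


def to_q15(x):
    """Convert a float in [-1.0, 1.0) to a Q15 integer, clamping at the ends."""
    return max(Q15_MIN, min(Q15_MAX, round(x * Q15_ONE)))


def mac(samples, coeffs):
    """Sum of Q15 x Q15 products, held in a Q30 accumulator with overhead bits."""
    acc = 0
    for s, c in zip(samples, coeffs):
        acc += s * c                      # each product fits in 32 bits
        assert ACC_MIN <= acc <= ACC_MAX  # the running sum must fit in 40 bits
    return acc


def saturate_to_q15(acc_q30):
    """Shift a Q30 accumulator down to Q15 and clamp to the output range."""
    return max(Q15_MIN, min(Q15_MAX, acc_q30 >> 15))


def wrap_to_q15(acc_q30):
    """What plain 16-bit two's-complement truncation would produce instead."""
    v = (acc_q30 >> 15) & 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


headroom_db = 20 * math.log10(2 ** 8)     # 8 overhead bits, a little over 48 dB

# Eight near-full-scale samples weighted by 0.5 sum to roughly 4.0,
# which is far beyond full scale but still inside the 40-bit accumulator.
samples = [to_q15(0.999)] * 8
coeffs = [to_q15(0.5)] * 8
acc = mac(samples, coeffs)
over_full_scale = acc / (1 << 30)         # accumulator as a fraction of full scale

scaled = acc >> 3                         # gain staging: divide by 8 before output
in_range = saturate_to_q15(scaled)        # back inside the 16-bit range
clipped = saturate_to_q15(acc)            # without gain staging: pinned at Q15_MAX
wrapped = wrap_to_q15(acc)                # without clamping: unrelated small value
```

The intermediate sum here is larger than a 32-bit register could hold, which is the job of the overhead bits; the shift by 3 stands in for whatever gain staging brings the signal back under full scale before it leaves the algorithm.
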
These systems are designed so that a 16-bit (or 32-bit) sample is a fraction of 1.0, i.e., the range runs from -1.0 up to, but not including, +1.0. The 8-bit overhead allows absolute values to exceed ±1.0, but the gain staging needs to bring them back into range before the output. Overhead bits are added above the MSB. Extra precision is handled with more bits below the LSB. It's very easy to get 64-bit float numerical performance from a "32-bit" fixed-point DSP (given the 80-bit intermediate values).

When dealing with a pure fixed-point implementation, common blocks like the FFT have to deal with scaling explicitly, because there is none of the automatic scaling that floating point provides. I designed a product with the TMS320VC5506 chip, and there were two variations of FFT: a slow one that scaled between every stage to prevent overflow, and a faster one that did not scale. If your gain structure is designed to account for the number of stages in your FFT, then scaling is not necessary, and processing can be much faster.

Although there are extra challenges (like gain staging), fixed point never has to deal with denormals. Fixed-point DSP instruction sets also have the concept of "saturation," where values that would exceed the registers' bit depth are capped at the limit rather than wrapping around as they would with general CPU integer math.

Brian Willoughby

On Apr 8, 2023, at 2:31 PM, Sampo Syreeni <[email protected]> wrote:
> On 2023-03-27, robert bristow-johnson wrote:
>> I think denorms are a good idea. They should be handled routinely by now.
>
> I do too. However I think in the DSP circuit they are also a kind of stopgap.
> If you want to do floats at all, you ought to be able to know in which
> regime of semi-logarithmic processing you'll be working at. And why you do so.
>
> The thing in audio signal processing in the 64-bit age of now is that you
> *can* have the linear range for anything and everything, without packing it
> into a float. Of any width.
> You would have had it even in 32-bit fixpoint,
> and even in 24-bit, if you minded your gain structure. There's absolutely
> *no* necessity to go into floats, their nonlinearity, and the idea of
> denormalised numbers, at such word widths.
>
> And in fact most well thought-out numerical algorithms ever do, or should go.
> If you go through the numerical analysis of most proper filter algorithms in
> existence, they don't hit the denormalisation bound. Or if they do, they
> actually hit a nasty, low level, knee of nonlinearity. They'd do better if
> they just went with well-calibrated 24-32 bit fixpoint.
>
> Not to mention 64-bit stuff. Even in floats. Because the IEEE floating point
> standard actually allows for precisely 24 bits for the significand at 32 bits
> total, which is more than proof for linear audio signal processing without
> utilizing the exponent at all, and then the 64-bit version gives you 53 bits
> of significand, which only local astronomical simulations really require;
> only distant ones really then require the exponent. At all.
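
The point about significand width in the quoted message can be checked directly. The following is a small Python sketch, assuming only the standard `struct` and `sys` modules; the helper name `as_float32` and the test values are made up for the example. It rounds values through IEEE 754 binary32 to show that 24-bit integers survive exactly, and asks the interpreter for the significand width of its 64-bit float.

```python
import struct
import sys


def as_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Every integer up to 2**24 fits the 24-bit significand exactly;
# 2**24 + 1 is the first one that gets rounded away.
exact_24 = as_float32(float(2**24 - 1)) == 2**24 - 1
lost_25 = as_float32(float(2**24 + 1)) != 2**24 + 1

# Signed 24-bit audio samples, scaled to the -1.0 .. +1.0 range,
# therefore pass through binary32 and back without any error.
samples = [-(2**23), -1, 0, 1, 2**23 - 1]
round_trip = [int(as_float32(s / 2**23) * 2**23) for s in samples]

# Significand width of the interpreter's 64-bit float,
# counting the implicit leading bit.
double_bits = sys.float_info.mant_dig
```

On a platform whose Python float is IEEE 754 binary64, `double_bits` is 53 (52 stored bits plus the implicit leading one), and `round_trip` equals `samples`.
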
