These are good points ... about fixed-point math in hardware.

The Texas Instruments TMS320 family of DSP chips has both fixed-point and 
floating-point variations. Even a "16-bit" fixed-point DSP will have 32-bit 
results from multiplications, and 8 bits of overhead (guard bits) in the 
registers storing intermediate values, so you have 40 bits within an 
algorithm implementing an audio effect. The higher-precision implementations 
get to 80 bits. The gain structure needs to work out so that signals do not 
extend into the overhead region by the output, but intermediate values can be 
48 dB (8 bits at roughly 6 dB per bit) over "full scale" when considering the 
final output.

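A rough sketch of that headroom arithmetic in Python, just to make the numbers 
concrete. This is not TI code; `mac_q15` and `headroom_db` are names made up 
for illustration, and the 40-bit limit is simply checked by hand since Python 
integers never overflow.

```python
import math

ACC_BITS = 40      # 32-bit product plus 8 overhead bits (figures from the text)

def mac_q15(samples, coeffs):
    """Multiply-accumulate 16-bit values into a wide accumulator.

    Each product of two 16-bit values fits in 32 bits; the running sum
    is allowed to grow into the overhead bits above that.
    """
    acc = 0
    for x, c in zip(samples, coeffs):
        acc += x * c
    # Emulate the accumulator width: signed 40-bit range.
    limit = 1 << (ACC_BITS - 1)
    if not -limit <= acc < limit:
        raise OverflowError("sum does not fit in the 40-bit accumulator")
    return acc

def headroom_db(overhead_bits):
    """Headroom in dB provided by a number of overhead bits."""
    return 20.0 * math.log10(2.0 ** overhead_bits)

# 100 near-full-scale products: too big for 32 bits, fine in 40.
acc = mac_q15([32767] * 100, [32767] * 100)
print(acc >= 2**31, acc < 2**39)
print(round(headroom_db(8), 2))
```

The sum of many full-scale products overflows a 32-bit register long before it 
reaches the 40-bit limit, which is the whole point of the overhead bits.
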
These systems are designed so that a 16-bit (or 32-bit) sample is a signed 
fraction, i.e., in the range -1.0 <= x < +1.0. The 8-bit overhead allows 
absolute values to exceed 1.0, but the gain staging needs to bring them back 
into range before the output. Overhead bits are added above the MSB. Extra precision is 
handled with more bits below the LSB. It's very easy to get 64-bit float 
numerical performance from a "32-bit" fixed-point DSP (given the 80-bit 
intermediate values).
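
To illustrate the fractional convention for the 16-bit case (often written 
Q15), here is a minimal Python sketch. The helper names are hypothetical, and 
clamping at the ends of the range is my assumption about how a converter would 
typically behave.

```python
Q15_SCALE = 1 << 15          # 32768: the integer that would represent +1.0
Q15_MIN, Q15_MAX = -32768, 32767

def float_to_q15(x):
    """Convert a float to a 16-bit fraction, clamping out-of-range values."""
    n = int(round(x * Q15_SCALE))
    return max(Q15_MIN, min(Q15_MAX, n))

def q15_to_float(n):
    """Convert a 16-bit fraction back to a float."""
    return n / Q15_SCALE

print(float_to_q15(-1.0))      # -1.0 is exactly representable
print(q15_to_float(Q15_MAX))   # the largest value is just under +1.0
```

Note the asymmetry: -1.0 maps to -32768, but +1.0 would need 32768, which does 
not fit, so the top of the range is 32767/32768.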

When dealing with a pure fixed-point implementation, common blocks like the 
FFT have to manage scaling explicitly, because there isn't the automatic 
scaling of floating point. I designed a product with the TMS320VC5506 chip, 
and there were two variations of FFT: a slow one that scaled between every 
stage to prevent overflow, and a faster one that did not scale. If your gain structure was 
designed to account for the number of stages in your FFT, then scaling would 
not be necessary, and processing could be much faster.
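
A back-of-the-envelope sketch of what "accounting for the number of stages" 
can mean, in Python. It relies on the general bound that an N-point DFT can 
grow a signal by at most a factor of N in magnitude, which I am treating as 
one bit per radix-2 stage. The function names are hypothetical, and complex 
input data may need a little more margin than this suggests.

```python
def growth_bits(n_points):
    """Worst-case bit growth of an N-point FFT, taken as log2(N)."""
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two >= 2")
    return n_points.bit_length() - 1

def prescale_shift(n_points, headroom_bits=0):
    """Right shift to apply to the input so an unscaled FFT cannot overflow.

    headroom_bits is how many bits of margin the input already has.
    """
    return max(0, growth_bits(n_points) - headroom_bits)

def worst_case_bin0(samples):
    """Bin 0 of a DFT is the plain sum of the inputs: an easy worst case."""
    return sum(samples)

n = 1024
full_scale = [32767] * n                      # DC at 16-bit full scale
shift = prescale_shift(n)                     # 10 stages, so shift by 10
scaled = [s >> shift for s in full_scale]
print(worst_case_bin0(full_scale) > 32767)    # overflows a 16-bit word
print(worst_case_bin0(scaled))                # 31744, back inside 16 bits
```

The cost of doing all the scaling up front is precision: those bits are thrown 
away before the transform rather than stage by stage.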

Although there are extra challenges (like gain staging), fixed point never has 
to deal with denormals. Fixed-point DSP instruction sets also have the concept 
of "saturation," where large values that would exceed the registers' bit depth 
are capped at the limit, rather than wrapping around like they would with 
general CPU integer math.
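
The difference between saturating and wrapping is easy to see with a 16-bit 
add. This Python sketch only emulates the two behaviours (the function names 
are made up); on a DSP, saturation is done by the hardware.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def add_wrap16(a, b):
    """16-bit two's-complement add that wraps around on overflow."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s

def add_sat16(a, b):
    """16-bit add that clips to the limits instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, a + b))

print(add_wrap16(30000, 10000))   # -25536: a loud positive peak turns negative
print(add_sat16(30000, 10000))    # 32767: clipped at the positive limit
```

Clipping is still distortion, but it is far more benign in audio than a 
full-scale sign flip.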

Brian Willoughby


On Apr 8, 2023, at 2:31 PM, Sampo Syreeni <[email protected]> wrote:
> On 2023-03-27, robert bristow-johnson wrote:
>> I think denorms are a good idea. They should be handled routinely by now.
> 
> I do too. However I think in the DSP circuit they are also a kind of stopgap. 
> If you want to do floats at all, you ought to be able to know in which 
> regime of semi-logarithmic processing you'll be working at. And why you do so.
> 
> The thing in audio signal processing in the 64-bit age of now is that you 
> *can* have the linear range for anything and everything, without packing it 
> into a float. Of any width. You would have had it even in 32-bit fixpoint, 
> and even in 24-bit, if you minded your gain structure. There's absolutely 
> *no* necessity to go into floats, their nonlinearity, and the idea of 
> denormalised numbers, at such word widths.
> 
> And in fact most well thought-out numerical algorithms ever do, or should go. 
> If you go through the numerical analysis of most proper filter algorithms in 
> existence, they don't hit the denormalisation bound. Or if they do, they 
> actually hit a nasty, low level, knee of nonlinearity. They'd do better if 
> they just went with well-calibrated 24-32 bit fixpoint.
> 
> Not to mention 64-bit stuff. Even in floats. Because the IEEE floating point 
> standard actually allows for precisely 24 bits for the significand at 32 bits 
> total, which is more than proof for linear audio signal processing without 
> utilizing the exponent at all, and then the 64-bit version gives you 56 bits 
> of significand, which only local astronomical simulations really require; 
> only distant ones really then require the exponent. At all.
