On Friday, 24 May 2019 at 12:01:55 UTC, Alex wrote:
Well, the QuarterWave was supposed to generate just a quarter
since that is all that is required for these functions due to
symmetry and periodicity. I started with a half to get that
working, then figure out the sign flipping.
Sure, it is a tradeoff. You pollute the cache less this way, but
you have to figure out the sign and the lookup-direction.
The trick is then to turn the phase into an unsigned integer.
Then you get:
1. the highest bit will tell you that you need to use the inverse
sign for the result.
2. the next highest bit will tell you that you need to look up
in the reverse direction.
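To make the two bits concrete, here is a minimal sketch. It assumes the phase arrives as a double measured in turns (1.0 = one full period) and is mapped onto a 32-bit unsigned integer; the names toFixedPhase, needsNegate and needsReverse are made up for illustration and are not from the code under discussion.

```d
import std.math : floor;

/// Wraps a phase given in turns (1.0 = one full period) into [0, 1) and
/// scales it so that one period spans the whole 32-bit unsigned range.
uint toFixedPhase(double turns)
{
    const double frac = turns - floor(turns);
    // Going through ulong lets a product of exactly 2^32 wrap around to 0.
    return cast(uint) cast(ulong)(frac * 4294967296.0);
}

/// Bit 31 (the highest bit): second half of the period, so negate the result.
bool needsNegate(uint phase) { return (phase >> 31) != 0; }

/// Bit 30: second or fourth quarter, so read the table back to front.
bool needsReverse(uint phase) { return ((phase >> 30) & 1) != 0; }

unittest
{
    // One sample phase from each quarter of the period.
    assert(!needsNegate(toFixedPhase(0.10)) && !needsReverse(toFixedPhase(0.10)));
    assert(!needsNegate(toFixedPhase(0.30)) &&  needsReverse(toFixedPhase(0.30)));
    assert( needsNegate(toFixedPhase(0.60)) && !needsReverse(toFixedPhase(0.60)));
    assert( needsNegate(toFixedPhase(0.80)) &&  needsReverse(toFixedPhase(0.80)));
}
```

The remaining 30 bits are the position inside the quarter, and the top few of those would become the table index. The bool helpers are only there to show what the bits mean; in a hot loop you would keep them as masks rather than test them.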
What is key to performance here is that x86 can do many simple
integer/bit operations in parallel, but only a few floating point
operations.
Also avoid all conditionals. Use bitmasking instead, something
along the lines of:
const ulong phase = mantissa^((1UL<<63)-((mantissa>>62)&1));
const uint quarterphase = (phase>>53)&511;
(Haven't checked the correctness of this, but this shows the
general principle.)
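For comparison, here is a longer sketch of the same principle that I did walk through by hand. It is only one possible way to do it, and it rests on several assumptions: a 64-bit phase where the full integer range is one period, a 512-entry float table (matching the & 511 above), and samples taken at bin centres so the reversed read needs no off-by-one fix. The names tableBits, makeQuarterTable and quarterWaveSin are hypothetical. Note that the reverse mask is built differently from the one-liner above: it is all ones when bit 62 is set and zero otherwise.

```d
import std.math : PI, sin, abs;

enum tableBits = 9;               // 512 entries, as in the `& 511` mask
enum tableSize = 1 << tableBits;

/// Builds one quarter of a sine period, sampled at bin centres so the
/// table reads correctly in both directions.
float[tableSize] makeQuarterTable()
{
    float[tableSize] t;
    foreach (i, ref v; t)
        v = cast(float) sin((i + 0.5) * (PI / 2) / tableSize);
    return t;
}

/// Branch-free lookup; `phase` covers one full period over the 64-bit range.
float quarterWaveSin(ref const(float[tableSize]) table, ulong phase)
{
    // All ones if bit 62 is set (second or fourth quarter), else zero.
    const ulong reverse = 0UL - ((phase >> 62) & 1);
    // Inverting the index bits walks the table in the opposite direction.
    const uint index =
        cast(uint)(((phase ^ reverse) >> (62 - tableBits)) & (tableSize - 1));
    // Move phase bit 63 onto the IEEE 754 sign bit of the 32-bit float.
    const uint signBit = cast(uint)(phase >> 63) << 31;
    float result = table[index];
    uint bits = *cast(uint*)&result ^ signBit;
    return *cast(float*)&bits;
}

unittest
{
    const table = makeQuarterTable();
    foreach (k; 0 .. 64)
    {
        const ulong phase = cast(ulong) k << 58;   // 64 evenly spaced phases
        const expected = sin(2.0 * PI * k / 64.0);
        // Without interpolation the error is bounded by about half a bin.
        assert(abs(quarterWaveSin(table, phase) - expected) < 0.004);
    }
}
```

The sign is applied by XOR on the float's bit pattern, so there is no multiply and no branch anywhere in the lookup. The pointer casts make this @system code; a union would do the same job if you prefer.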
Ola.