Hi all, I was looking for ways to improve the MaverickCrunch division routine on ARM ep93xx, and noticed that there are a few other architectures that don't have a hardware divide.
IA-64 has a "frcpa" instruction that returns an estimate of the reciprocal of a float or double. Likewise, RS-6000 has "fres", which returns an estimate of the reciprocal of a single-precision float. x86 seems to have something similar with SSE - called "rcpps" - which returns estimated reciprocals of packed single-precision floats.

They all seem to make use of FMAC / FNMAC instructions to calculate the correct answer for x/y, refining the estimate with Newton-Raphson iterations built from MAC instructions. The algorithms they use in GCC differ, because the accuracy of the reciprocal estimate differs between them.

http://en.wikipedia.org/wiki/N-th_root_algorithm
http://en.wikipedia.org/wiki/Multiply-accumulate

They also seem to use a similar algorithm to implement their sqrt function...

My question is, are there any other architectures in GCC that don't have a reciprocal estimate instruction, but do have an FMAC? I'd like to implement something similar for MaverickCrunch, using the integer 32-bit MAC functions, but there is no reciprocal estimate function on the MaverickCrunch.

I guess a lookup table could be implemented, but how many entries would need to be generated, and how accurate would it have to be for the result to be IEEE 754 compliant (in the swdiv routine)?

Also, where should I be sticking such an instruction / table? Should I put it in the kernel, and trap an invalid instruction? Alternatively, should I put it in libgcc or in glibc/uclibc?