> You are using a 64-bit multiply in a path that is designed for 32-bit
> processors, which makes me feel that it will be slower.

Slower than the divide it's replacing?

The following 32-bit processors have 32x32->64-bit multiply:

- x86
- ARM (as of ARMv4 = ARM7TDMI, the lowest version in common use)
- SPARCv7, SPARCv8
- MIPS32
- MC68020
- PA-RISC 1.1 (XMPYU)
- avr32
- PowerPC (MULHWU)
- VAX (EMUL)

I could keep going through the full list of architectures in arch/,
but it's starting to get slow and I haven't hit one *without* a widening
multiply yet. (And if it doesn't have hardware divide, I expect the
multiply is still faster.)

Ah! Found one! ColdFire MCF5272 has 32/32-bit divide, but only
32x32->32 multiply. However, DIVU takes 20 or 35 cycles, which is pretty
close to the time to synthesize the multiply out of 4 16x16->32 pieces
(4 cycles each).

I could do some Kconfig hacking and make the code path
architecture-dependent. Do you think it's worth it?
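
For reference, the synthesis of a 32x32->64 multiply out of four 16x16->32 pieces could look roughly like the following. This is only a sketch, not code from any patch: the names mul_u32_u32_synth() and mul_u32_u32_selftest() are made up for illustration, and it assumes nothing beyond a 32x32->32 multiply and standard C fixed-width types.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: 32x32->64 unsigned multiply built from four
 * 16x16->32 partial products, for CPUs with only a 32x32->32 multiply.
 */
static uint64_t mul_u32_u32_synth(uint32_t a, uint32_t b)
{
	uint32_t a_lo = a & 0xffff, a_hi = a >> 16;
	uint32_t b_lo = b & 0xffff, b_hi = b >> 16;

	/* Each product of two 16-bit values fits in 32 bits. */
	uint32_t ll = a_lo * b_lo;
	uint32_t lh = a_lo * b_hi;
	uint32_t hl = a_hi * b_lo;
	uint32_t hh = a_hi * b_hi;

	/* Middle column: at most 3 * 0xffff, so it cannot overflow. */
	uint32_t mid = (ll >> 16) + (lh & 0xffff) + (hl & 0xffff);

	uint32_t lo = (ll & 0xffff) | (mid << 16);
	uint32_t hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);

	return ((uint64_t)hi << 32) | lo;
}

/* Sanity check against the compiler's own widening multiply. */
static void mul_u32_u32_selftest(void)
{
	uint32_t a = 0x12345678, b = 0x9abcdef0;

	assert(mul_u32_u32_synth(a, b) == (uint64_t)a * b);
	assert(mul_u32_u32_synth(0xffffffff, 0xffffffff) ==
	       0xfffffffe00000001ULL);
}
```

The four multiplies are independent of each other; the remaining work is a handful of shifts, masks and adds to propagate the carries out of the middle column.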