On Thu, 2005-07-28 at 17:34 +0100, Maciej W. Rozycki wrote:
> On Thu, 28 Jul 2005, Steven Rostedt wrote:
>
> > I've been playing with different approaches, (still all hot cache
> > though), and inspecting the generated code. It's not that the gcc
> > generated code is always better for the normal case. But since it sees
> > more and everything is not hidden in asm, it can optimise what is being
> > used, and how it's used.
>
> Since you're considering GCC-generated code for ffs(), ffz() and friends,
> how about trying __builtin_ffs(), __builtin_clz() and __builtin_ctz() as
> appropriate? Reasonably recent GCC may actually be good enough to use the
> fastest code depending on the processor submodel selected.
>
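For reference, here is a rough sketch of how ffs()/ffz()-style helpers might sit on top of those builtins. The sketch_* names are made up for illustration and are not the kernel's definitions. The "l" variants of the builtins are used so that the helpers take an unsigned long. Note that __builtin_ctzl() and __builtin_clzl() are undefined for a zero argument, so the sketch asserts on that.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical wrapper names, for illustration only. */

/* 1-based index of the least significant set bit, 0 if x == 0
 * (the libc ffs() convention). */
static inline int sketch_ffs(int x)
{
	return __builtin_ffs(x);
}

/* 0-based index of the least significant set bit.
 * __builtin_ctzl() is undefined for 0, so callers must not pass 0. */
static inline unsigned long sketch_lowest_set(unsigned long word)
{
	assert(word != 0);
	return (unsigned long)__builtin_ctzl(word);
}

/* 0-based index of the first zero bit, found as the lowest set bit
 * of the complement. Undefined if every bit of word is set. */
static inline unsigned long sketch_ffz(unsigned long word)
{
	assert(~word != 0);
	return (unsigned long)__builtin_ctzl(~word);
}

/* 0-based index of the most significant set bit, derived from the
 * count of leading zeros. Callers must not pass 0. */
static inline unsigned long sketch_highest_set(unsigned long word)
{
	assert(word != 0);
	return (sizeof(word) * CHAR_BIT - 1) -
	       (unsigned long)__builtin_clzl(word);
}
```

Whether gcc turns these into bsf/bsr or something else depends on the compiler version and the -march setting, so the generated code would still need inspecting.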
OK, I guess when I get some time, I'll start testing all the i386 bitop
functions, comparing the asm with the gcc versions.

Now could someone explain to me what's wrong with testing hot cache code?
Can one instruction retrieve from memory better than others? I was trying
to see which was slower in the CPU, but if an algorithm is faster because
it aligns with the cache or something, my hot cache testing will not catch
that.

But I don't know how to write a test that would test, over and over again,
something that is not in cache. It would seem that I would have to find a
way to flush the L1 and L2 caches each time. But that still seems to add
too many variables to the equation to get good, meaningful benchmarks.

-- Steve
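One way the cache flushing described above might be approximated from user space is to walk a buffer larger than the cache before every timed call. The following is only a sketch under stated assumptions: EVICT_BYTES and LINE_BYTES are guesses that have to be matched to the machine, and evict_caches(), time_cold() and ctz_word() are made-up names. It only pushes data out of the cache (the instruction cache may stay warm), and the timer overhead is included in every sample, which is exactly the kind of extra variable mentioned above.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define EVICT_BYTES (32u * 1024u * 1024u) /* assumption: larger than the last-level cache */
#define LINE_BYTES  64u                   /* assumption: cache line size */

typedef unsigned long (*bitop_fn)(const unsigned long *);

static unsigned char *evict_buf;

/* Touch one byte per cache line of a large buffer so that previously
 * cached data is likely to be pushed out. The volatile accesses and the
 * returned sum keep the compiler from dropping the loop. */
static unsigned long evict_caches(void)
{
	volatile unsigned char *p;
	unsigned long sum = 0;
	size_t i;

	if (!evict_buf) {
		evict_buf = calloc(EVICT_BYTES, 1);
		assert(evict_buf != NULL);
	}
	p = evict_buf;
	for (i = 0; i < EVICT_BYTES; i += LINE_BYTES) {
		p[i] = (unsigned char)(p[i] + 1);
		sum += p[i];
	}
	return sum;
}

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Example function under test: reads one word from memory and returns
 * the index of its lowest set bit. The word must not be 0. */
static unsigned long ctz_word(const unsigned long *word)
{
	return (unsigned long)__builtin_ctzl(*word);
}

/* Average time in nanoseconds of one call to fn, evicting the caches
 * before every call. Only the call itself is timed, not the eviction.
 * Results are added to *sink so the calls are not optimised away. */
static long long time_cold(bitop_fn fn, const unsigned long *word,
			   int samples, unsigned long *sink)
{
	long long total = 0;
	long long t0;
	int i;

	for (i = 0; i < samples; i++) {
		*sink += evict_caches();
		t0 = now_ns();
		*sink += fn(word);
		total += now_ns() - t0;
	}
	return total / samples;
}
```

A caller would run time_cold() once for each candidate implementation on the same word and compare the averages, ideally after subtracting the cost of an empty timed region.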