On Wednesday 08 March 2006 00:26, Benjamin LaHaise wrote:
> Hi Andi,
>
> On x86-64 one inefficiency that shows up on profiles is the handling of
> struct page conversion to/from idx and addresses.  This is mostly due to
> the fact that struct page is currently 56 bytes on x86-64, so gcc has to
> emit a slow division or multiplication to convert.

Huh?

unsigned long f1(unsigned long x) { return x * 56; }
unsigned long f2(unsigned long x) { return x / 56; }

gives

f1:
        leaq    0(,%rdi,8), %rax
        salq    $6, %rdi
        subq    %rax, %rdi
        movq    %rdi, %rax
        ret

and

f2:
.LFB3:
        shrq    $3, %rdi
        movabsq $2635249153387078803, %rdx
        movq    %rdi, %rax
        mulq    %rdx
        movq    %rdx, %rax
        ret

(it converts it to x * 1/56)

AFAIK mul has a latency of < 10 cycles even on P4, so I can't imagine
it's a real problem. Something must be wrong with your measurements.

Or maybe it's something else in the conversion functions that's the
problem. The hash lookup? Still, I don't quite believe it; the hash is
relatively small.

That said, I know ways to make page_to_pfn()/pfn_to_page() faster.
In particular, some of the terms in the equation that are always
recomputed could be cached. I used to have a patch for that some time
ago, but it had some problems and I ran out of time, so I dropped it.

> By switching to using
> WANT_PAGE_VIRTUAL in asm/page.h, struct page grows to 64 bytes.  Address
> calculation becomes cheaper because it is a memory load from the already
> hot struct page.  For netperf, this shows up as a ~150 Mbit/s improvement.

My guess would be that on more macro loads it would be a loss due to
more cache misses.

-Andi
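
For anyone who wants to check the two listings for f1 and f2 by hand, here is a minimal sketch that redoes both rewrites in plain C and compares them with ordinary `*` and `/`. The function names (mul56_shift_sub, div56_reciprocal, count_mismatches, self_check) are made up for illustration, and the code assumes a compiler with the `unsigned __int128` extension (GCC or Clang on x86-64). The constant is the one from the movabsq above, which works out to ceil(2^64 / 7).

```c
#include <assert.h>
#include <stdint.h>

/* Constant from the f2 listing; equals ceil(2^64 / 7). */
#define RECIP_7 UINT64_C(2635249153387078803)

/* x * 56 as in the f1 listing: 64x - 8x, using only shifts and a subtract. */
uint64_t mul56_shift_sub(uint64_t x)
{
    return (x << 6) - (x << 3);
}

/*
 * x / 56 as in the f2 listing: shift out the factor 8, multiply by
 * RECIP_7, and keep the high 64 bits of the 128-bit product (the half
 * that mulq leaves in %rdx).
 */
uint64_t div56_reciprocal(uint64_t x)
{
    uint64_t q = x >> 3;
    /* unsigned __int128 is a compiler extension, assumed available here. */
    return (uint64_t)(((unsigned __int128)q * RECIP_7) >> 64);
}

/* Count inputs in [start, start + n) where a rewrite disagrees with plain C. */
uint64_t count_mismatches(uint64_t start, uint64_t n)
{
    uint64_t bad = 0;
    for (uint64_t i = 0; i < n; i++) {
        uint64_t x = start + i;
        if (mul56_shift_sub(x) != x * 56 || div56_reciprocal(x) != x / 56)
            bad++;
    }
    return bad;
}

/* Spot checks at both ends of the 64-bit range (not an exhaustive proof). */
void self_check(void)
{
    assert(count_mismatches(0, 100000) == 0);
    assert(count_mismatches(UINT64_MAX - 100000, 100000) == 0);
    assert(div56_reciprocal(UINT64_MAX) == UINT64_MAX / 56);
}
```

Calling self_check() from a small main() is enough to exercise it. The division stays exact because 7 * RECIP_7 = 2^64 + 5 and the shifted input is below 2^61, so the rounding error in the reciprocal never reaches the next integer.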