On Wednesday 08 March 2006 00:26, Benjamin LaHaise wrote:
> Hi Andi,
> 
> On x86-64 one inefficiency that shows up on profiles is the handling of 
> struct page conversion to/from idx and addresses.  This is mostly due to 
> the fact that struct page is currently 56 bytes on x86-64, so gcc has to 
> emit a slow division or multiplication to convert. 

Huh? 

unsigned long f1(unsigned long x)
{
        return x * 56;
}

unsigned long f2(unsigned long x)
{
        return x / 56;
}

gives

f1:
        leaq    0(,%rdi,8), %rax
        salq    $6, %rdi
        subq    %rax, %rdi
        movq    %rdi, %rax
        ret

and

f2:
.LFB3:
        shrq    $3, %rdi
        movabsq $2635249153387078803, %rdx
        movq    %rdi, %rax
        mulq    %rdx
        movq    %rdx, %rax
        ret

(gcc converts the division into a multiplication by the reciprocal:
x / 56 becomes (x >> 3) * 1/7 in fixed point)
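For anyone who wants to check the two sequences by hand, here is a minimal C sketch of what the generated code computes. The helper names (mulhi64, mul56_by_shift, div56_by_mul, div56_selfcheck) are made up for illustration, and unsigned __int128 is a GCC/Clang extension, so treat this as a sketch under those assumptions rather than anything gcc emits itself.

```c
#include <assert.h>
#include <stdint.h>

/* 2^64 / 7 rounded up: the constant loaded by movabsq in f2. */
#define RECIP7 UINT64_C(2635249153387078803)

/* f1: x * 56 == x * 64 - x * 8, i.e. salq $6 minus the leaq result. */
static uint64_t mul56_by_shift(uint64_t x)
{
        return (x << 6) - (x << 3);
}

/* High 64 bits of a 64x64 -> 128 bit product (what mulq leaves in %rdx).
 * unsigned __int128 is a GCC/Clang extension on 64-bit targets. */
static uint64_t mulhi64(uint64_t a, uint64_t b)
{
        return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* f2: x / 56 computed as (x >> 3) / 7 via the reciprocal multiply. */
static uint64_t div56_by_mul(uint64_t x)
{
        return mulhi64(x >> 3, RECIP7);
}

/* Illustrative self-check against the plain operators on a few samples. */
static void div56_selfcheck(void)
{
        static const uint64_t samples[] = {
                0, 1, 55, 56, 57, 111, 112, 4096, 123456789,
                UINT64_MAX - 56, UINT64_MAX
        };
        for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
                assert(div56_by_mul(samples[i]) == samples[i] / 56);
                assert(mul56_by_shift(samples[i]) == samples[i] * 56);
        }
}
```

The shift by 3 comes first so that the value fed to the multiply is below 2^61, which is what keeps the rounded-up reciprocal exact over the whole 64-bit range.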

AFAIK mul has a latency of < 10 cycles even on P4 so I can't imagine
it's a real problem. Something must be wrong with your measurements.

Or maybe something else in the conversion functions is the
problem. The hash lookup? Still, I don't quite believe it;
the hash is relatively small.

That said, I know ways to make page_to_pfn()/pfn_to_page() faster.
In particular, some of the terms in the equation that are always
recomputed could be cached. I used to have a patch for that
some time ago, but it had some problems and I ran out of time,
so I dropped it.
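To illustrate the general idea of caching a recomputed term, here is a rough userspace sketch. It is not the dropped patch: struct node_info, its fields and all function names are hypothetical, and struct page is only a 56-byte stand-in. It assumes a flat address space where pointer/integer round trips behave as on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's struct page: 56 bytes, as in this thread. */
struct page { unsigned char pad[56]; };

/* Hypothetical per-node descriptor; field names are illustrative only. */
struct node_info {
        struct page *mem_map;     /* first struct page of this node */
        unsigned long start_pfn;  /* pfn described by mem_map[0] */
        uintptr_t biased_base;    /* cached: mem_map minus start_pfn entries */
};

/* Precompute the term that would otherwise be rebuilt on every call. */
static void node_cache_terms(struct node_info *n)
{
        n->biased_base = (uintptr_t)n->mem_map
                       - (uintptr_t)n->start_pfn * sizeof(struct page);
}

/* Uncached form: subtract the node's start pfn, then index. */
static struct page *pfn_to_page_plain(const struct node_info *n,
                                      unsigned long pfn)
{
        return n->mem_map + (pfn - n->start_pfn);
}

/* Cached form: a single multiply-add against the precomputed base. */
static struct page *pfn_to_page_cached(const struct node_info *n,
                                       unsigned long pfn)
{
        return (struct page *)(n->biased_base
                               + (uintptr_t)pfn * sizeof(struct page));
}

/* Illustrative check that both forms agree on a small fake node. */
static void check_cached_matches_plain(void)
{
        static struct page map[16];
        struct node_info n = { .mem_map = map, .start_pfn = 1000 };

        node_cache_terms(&n);
        for (unsigned long pfn = 1000; pfn < 1016; pfn++)
                assert(pfn_to_page_cached(&n, pfn) == pfn_to_page_plain(&n, pfn));
}
```

The base is kept as an integer so the out-of-range intermediate value never exists as a pointer; whether this saves anything measurable in the real conversion functions is exactly the open question here.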

> By switching to using  
> WANT_PAGE_VIRTUAL in asm/page.h, struct page grows to 64 bytes.  Address 
> calculation becomes cheaper because it is a memory load from the already 
> hot struct page.  For netperf, this shows up as a ~150 Mbit/s improvement.

My guess would be that on more macro-level workloads it would be a
net loss due to more cache misses.
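As a back-of-the-envelope illustration of the footprint side of that, the sketch below computes the mem_map size for both struct page sizes. The 4 KiB page size is the x86-64 base page size; the 1 GiB of RAM and the helper names are just assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL   /* x86-64 base page size */

/* Bytes of mem_map needed to describe ram_bytes of memory. */
static uint64_t mem_map_bytes(uint64_t ram_bytes, uint64_t struct_page_size)
{
        return (ram_bytes / PAGE_SIZE_BYTES) * struct_page_size;
}

/* Worked example for 1 GiB of RAM (an arbitrary size for illustration). */
static void footprint_example(void)
{
        const uint64_t gib = 1ULL << 30;
        const uint64_t mib = 1ULL << 20;

        assert(mem_map_bytes(gib, 56) == 14 * mib);  /* 56-byte struct page */
        assert(mem_map_bytes(gib, 64) == 16 * mib);  /* 64-byte struct page */
}
```

So the array of struct pages grows by 8/56, roughly 14%, for the same amount of memory; how much of that turns into extra misses depends on the workload.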

-Andi