On Mon, Mar 06, 2006 at 04:50:39PM -0800, Andrew Morton wrote:
> Am a bit surprised at those numbers.

> Because userspace has to do peculiar things to get its pages taken off the
> LRU.  What exactly was that application doing?

It's just a simple send() and recv() pair of processes.  Networking uses 
pages for the buffer on user transmits.  Those pages tend to be freed 
in irq context on transmit or in the receiver if the traffic is local.

> The patch adds slight overhead to the common case while providing
> improvement to what I suspect is a very uncommon case?

At least on any modern CPU with branch prediction, the test is essentially 
free (2 memory reads that pipeline well, iow 1 cycle, maybe 2).  The 
upside is that you get to avoid: the atomic (~17 cycles on a P4 with a 
simple test program; the penalty doubles if there is one other instruction 
that operates on memory in the loop), disabling interrupts (~20 cycles?, I 
don't remember), another atomic for the spinlock, another atomic for 
TestClearPageLRU(), and the pushf/popf (expensive, as they have to wait 
for any instructions still in flight to complete and then pay the penalty 
for changing irq state).  That's at least 70 cycles, without including the 
memory barrier side effects, which can cost 100+ cycles.  Add in the cost 
of cacheline bouncing on the lru_lock and we're talking *expensive*.

So, a 1-2 cycle cost for a case that normally takes from 17 to 100+ cycles?  
I think that's worth it given the benefits.

Also, I think the common case (page cache read / map) is something that 
should be done differently, as those atomics really do add up to major 
pain.  Using rcu for page cache reads would be truly wonderful, but that 
will take some time.

                -ben