On Thu, Nov 30, 2006 at 09:55:52AM +0100, Eric Dumazet wrote: > On Thursday 30 November 2006 02:25, Paul E. McKenney wrote: > > On Wed, Nov 22, 2006 at 04:02:29PM +0100, Eric Dumazet wrote: > > > On some workloads, (for example when lot of close() syscalls are done), > > > RCU qlen can be quite large, and RCU heads are no longer in cpu cache > > > when rcu_do_batch() is called. > > > > > > This patches adds a prefetch() in rcu_do_batch() to give CPU a hint to > > > bring back cache lines containing 'struct rcu_head's. > > > > > > Most list manipulations macros include prefetch(), but not open coded > > > ones (at least with current C compilers :) ) > > > > > > I got a nice speedup on a trivial benchmark (3.48 us per iteration > > > instead of 3.95 us on a 1.6 GHz Pentium-M) > > > while (1) { pipe(p); close(fd[0]); close(fd[1]);} > > > > Interesting! How much of the speedup was due to the prefetch() and how > > much to removing the extra store to rdp->donelist? > > I only benchmarked the prefetch() case. > > Then, when cooking the patch I found I could do the rdp->donelist affectation > after the loop. I am not sure it's worth to do another benchmark for this > trivial optimization (Please dont tell me its not a valid one :) )
It would be a good idea to check it out. Modern CPUs can be a bit on the tricky side. I have seen cases where removing instructions slowed things down. And it can't be -that- hard to run the other two cases! Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/