Eric Dumazet wrote:
David Stevens wrote:

The word "small" is coming up a lot in this discussion, and
I think packet size really has nothing to do with it. Multiple
streams generating packets of any size would benefit; the
key ingredient is a queue length greater than 1.

I think the intent is to remove queue lock cycles by taking
the whole list (at least up to the count of free ring buffers)
when the queue is greater than one packet, thus effectively
removing the lock expense for n-1 packets.
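
To make the lock-cycle argument concrete, here is a minimal userspace sketch, not kernel code. The names pkt, pkt_queue, queue_add and queue_take_batch are hypothetical, and a C11 atomic_flag spinlock stands in for the real queue lock. Producers pay one lock cycle per packet, while the consumer detaches up to `budget` packets (think: the count of free ring buffers) in one cycle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical packet and queue types, for illustration only. */
struct pkt {
    struct pkt *next;
    int id;
};

struct pkt_queue {
    atomic_flag lock;          /* stand-in for the real queue lock */
    struct pkt *head;
    struct pkt *tail;
    unsigned long lock_cycles; /* number of lock/unlock pairs taken */
};

static void queue_lock(struct pkt_queue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
    q->lock_cycles++;
}

static void queue_unlock(struct pkt_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static void queue_init(struct pkt_queue *q)
{
    atomic_flag_clear(&q->lock);
    q->head = q->tail = NULL;
    q->lock_cycles = 0;
}

/* Producer side: one lock cycle per packet added. */
static void queue_add(struct pkt_queue *q, struct pkt *p)
{
    p->next = NULL;
    queue_lock(q);
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    queue_unlock(q);
}

/*
 * Consumer side: detach up to `budget` packets in a single lock cycle.
 * Stores the detached list in *out and returns how many packets it holds.
 */
static size_t queue_take_batch(struct pkt_queue *q, size_t budget,
                               struct pkt **out)
{
    struct pkt *first, *last;
    size_t n = 0;

    assert(out != NULL);
    *out = NULL;
    if (budget == 0)
        return 0;

    queue_lock(q);
    first = last = q->head;
    if (first) {
        n = 1;
        while (n < budget && last->next) {
            last = last->next;
            n++;
        }
        q->head = last->next;   /* whatever is left stays queued */
        if (!q->head)
            q->tail = NULL;
        last->next = NULL;      /* terminate the detached batch */
    }
    queue_unlock(q);

    *out = first;
    return n;
}
```

With five packets queued and a budget of three, a single call to queue_take_batch() hands back three packets for one lock cycle, where a one-at-a-time dequeue would have taken three.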


Yes, but on modern CPUs, locked operations are basically free once the CPU already has the cache line in exclusive access in its L1 cache.

But will it here? Any of the CPUs may be trying to add things to the qdisc, but only one CPU is pulling from it, right? Even if the "pulling from it" is happening in a loop, there can be scores or more other cores trying to add things to the queue, which would cause that cache line to migrate.

I am not sure adding yet another driver API will help very much.
It will for sure add some bugs and pain.

That could very well be.

A less expensive (and less bug-prone) optimization would be to prefetch one cache line for the next qdisc skb, as a cache line miss is far more expensive than a locked operation (if the lock is already in the L1 cache, of course).
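
As a rough userspace illustration of the prefetch idea (a sketch only; pkt_node and drain_with_prefetch are hypothetical names, and it uses the GCC/Clang __builtin_prefetch builtin where kernel code would use the prefetch() helper from <linux/prefetch.h>), the consumer can request the next packet's cache line while it is still working on the current one:

```c
#include <stddef.h>

/* Hypothetical packet node, for illustration only. */
struct pkt_node {
    struct pkt_node *next;
    unsigned int len;
};

/*
 * Walk a detached packet list and total the bytes in it. Before touching
 * the current packet's payload fields, hint the CPU to start loading the
 * next packet's first cache line. The prefetch is only a hint: it never
 * changes the result, and any gain depends on the workload.
 */
static unsigned long drain_with_prefetch(const struct pkt_node *head)
{
    unsigned long bytes = 0;

    for (const struct pkt_node *p = head; p != NULL; p = p->next) {
        if (p->next)
            __builtin_prefetch(p->next, 0 /* read */, 3 /* keep in cache */);
        bytes += p->len; /* stand-in for the real per-packet work */
    }
    return bytes;
}
```

Whether this helps in practice would have to be measured; the sketch only shows where the hint would sit relative to the per-packet work.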

Might they not build one on top of the other?

rick jones

