From: Andi Kleen <[EMAIL PROTECTED]>
Date: Wed, 10 Oct 2007 02:37:16 +0200
> On Tue, Oct 09, 2007 at 05:04:35PM -0700, David Miller wrote:
> > We have to keep in mind, however, that the sw queue right now is 1000
> > packets. I heavily discourage any driver author to try and use any
> > single TX queue of that size.
>
> Why would you discourage them?
>
> If 1000 is ok for a software queue why would it not be ok
> for a hardware queue?

Because with the software queue, you aren't accessing 1000 slots shared
with the hardware device, which does shared-ownership transactions on
those L2 cache lines with the cpu.

Long ago I did a test on gigabit on a cpu with only 256K of L2 cache.
Using a smaller TX queue made things go faster, and it's exactly because
of these L2 cache effects.

> 1000 packets is a lot. I don't have hard data, but gut feeling
> is less would also do.

I'll try to see how backlogged my 10Gb tests get when a strong
sender is sending to a weak receiver.

> And if the hw queues are not enough a better scheme might be to
> just manage this in the sockets in sendmsg. e.g. provide a wait queue that
> drivers can wake up and let them block on more queue.

TCP does this already, but it operates in a lossy manner.

> I don't really see the advantage over the qdisc in that scheme.
> It's certainly not simpler and probably more code and would likely
> also not require less locks (e.g. a currently lockless driver
> would need a new lock for its sw queue). Also it is unclear to me
> it would be really any faster.

You still need a lock to guard hw TX enqueue from hw TX reclaim
(rough sketch below).

A 256 entry TX hw queue fills up trivially at 1Gb and 10Gb, but if
you increase the size much more, performance starts to go down due
to L2 cache thrashing.
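To put rough, purely illustrative numbers on that cache pressure
(descriptor layout varies by hardware; 16 bytes per descriptor and an
8-byte skb pointer per slot are just assumptions for the arithmetic):

  1000 descriptors  * 16 bytes  ~= 16KB
  1000 skb pointers *  8 bytes  ~=  8KB
                                   ----
                                   ~24KB of ring state

That is nearly a tenth of a 256KB L2 before you count the skb heads and
payloads themselves, and the device keeps writing completion status back
into those same descriptor lines, so the cpu keeps taking misses on them.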
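And on the enqueue-vs-reclaim locking point, here is a minimal sketch,
not taken from any real driver (the foo_* names and the descriptor
layout are made up), of why the two paths have to serialize on the same
ring state even when there is no qdisc in the picture:

#include <linux/types.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>

/* hypothetical descriptor layout, device specific in reality */
struct foo_desc {
	u64 addr;
	u32 len;
	u32 flags;
};

struct foo_tx_ring {
	spinlock_t	lock;	/* guards head, tail and the descriptors */
	unsigned int	head;	/* next slot to fill (xmit path) */
	unsigned int	tail;	/* next slot to reclaim (completion path) */
	unsigned int	size;	/* number of descriptors, e.g. 256 */
	struct foo_desc	*desc;	/* descriptor array shared with the NIC */
	struct sk_buff	**skb;	/* per-slot skb bookkeeping */
};

/* xmit path: claims a slot, so it moves head */
static int foo_xmit(struct foo_tx_ring *r, struct sk_buff *skb)
{
	unsigned long flags;

	spin_lock_irqsave(&r->lock, flags);
	if (((r->head + 1) % r->size) == r->tail) {
		/* ring full, caller stops the queue */
		spin_unlock_irqrestore(&r->lock, flags);
		return -ENOSPC;
	}
	r->skb[r->head] = skb;
	/* fill r->desc[r->head] and ring the device doorbell here */
	r->head = (r->head + 1) % r->size;
	spin_unlock_irqrestore(&r->lock, flags);
	return 0;
}

/* completion path (irq or polling): frees slots, so it moves tail */
static void foo_tx_reclaim(struct foo_tx_ring *r)
{
	unsigned long flags;

	spin_lock_irqsave(&r->lock, flags);
	while (r->tail != r->head) {
		/* a real driver checks the device's "done" bit in
		 * r->desc[r->tail] here and stops when it isn't set */
		dev_kfree_skb_any(r->skb[r->tail]);
		r->skb[r->tail] = NULL;
		r->tail = (r->tail + 1) % r->size;
	}
	spin_unlock_irqrestore(&r->lock, flags);
}

Both paths read and write head, tail and the same descriptor slots, so
dropping the qdisc doesn't remove the need for this lock (or for some
equally careful lockless scheme).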