From: Andi Kleen <[EMAIL PROTECTED]>
Date: Wed, 1 Feb 2006 19:28:46 +0100
> http://www.lemis.com/grog/Documentation/vj/lca06vj.pdf

I did a writeup in my blog about all of this, another good reason to actively follow my blog:

http://vger.kernel.org/~davem/cgi-bin/blog.cgi/index.html

Go read.

> -Andi (who prefers sourceware over slideware)

People are definitely hung up on the details, and that means they are analyzing Van's work from the absolute _wrong_ angle.

This surprised me. What I expected was for anyone knowledgeable about networking to get this immediately and, as for the details, to have an attitude of "I don't care how, let's find a way to make this work!"

But since you're so hung up on the details, the basic idea is that there is a tiny classifier in the RX IRQ processing of the driver. We have to touch that first cache line of the packet headers anyway, so the classification comes for free. You'll notice that even though he's running this tiny classifier in the hard IRQ context, in order to put the packet on the right RX net channel, IRQ overhead remains the same.

So when a TCP socket enters established state, we add an entry into the classifier. The classifier is even smart enough to look for a listening socket if the fully established classification fails.

Van is not against NAPI; in fact he's taking NAPI to the next level. Softirq handling is overhead and, as this work shows, it is totally unnecessary overhead.

Yes, we do TCP prequeue now, and that's where the second stage net channel stuff hooks into. But prequeue as we have it now is not enough: we still run softirq, and we still do IP input processing from softirq, not from user socket context. The RX net channel bypasses all of that crap.

The way we do softirq now, we can feed one cpu with softirq work given a single card; with Van's stuff we can feed socket users on multiple cpus with a single card. The SMP friendliness of the net channel data structure really helps here. In one shot it does the input route lookup and the socket lookup.
We just attach the packet to the socket's RX net channel, all from hard IRQ context, at zero cost (see above). This is just like the grand unified flow cache idea that we've been tossing around for the past few years.

And the beauty of all of this is that it complements ideas like LRO, I/O AT, and cpu architectures like Niagara.

How in the world can you not understand how incredible this is?
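
For anyone who wants the classifier step spelled out, here is a minimal userspace sketch in C. It is not Van's code and not kernel code. Every name in it (flow_key, net_channel, classifier_add, classify, rx_packet) is hypothetical, and the open-addressed hash table and per-port listener array are assumptions picked only to keep the example short. It shows the two steps described above: an entry is added when a connection becomes established, and the RX path tries the exact 4-tuple match first, then falls back to a listening socket on the destination port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the RX classifier; not kernel code. */

struct flow_key {               /* TCP/IPv4 4-tuple read from the headers */
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

struct net_channel {            /* stand-in for a socket's RX net channel */
    int id;
    unsigned int queued;        /* packets attached so far */
};

#define TABLE_SIZE 64           /* power of two, so we can mask the hash */

struct entry {
    struct flow_key key;
    struct net_channel *chan;   /* NULL marks an empty slot */
};

static struct entry established[TABLE_SIZE];
static struct net_channel *listeners[65536];   /* indexed by local port */

static unsigned int flow_hash(const struct flow_key *k)
{
    uint32_t h = k->saddr ^ k->daddr ^ (((uint32_t)k->sport << 16) | k->dport);

    h ^= h >> 16;
    h *= 0x45d9f3bu;
    h ^= h >> 16;
    return h & (TABLE_SIZE - 1);
}

static int key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Called when a connection reaches the established state.
 * Returns 0 on success, -1 if the table is full. */
static int classifier_add(const struct flow_key *k, struct net_channel *chan)
{
    unsigned int i, slot = flow_hash(k);

    assert(chan != NULL);
    for (i = 0; i < TABLE_SIZE; i++) {
        struct entry *e = &established[(slot + i) & (TABLE_SIZE - 1)];

        if (e->chan == NULL || key_equal(&e->key, k)) {
            e->key = *k;
            e->chan = chan;
            return 0;
        }
    }
    return -1;
}

/* RX-path lookup: exact 4-tuple match first, then a listener on dport. */
static struct net_channel *classify(const struct flow_key *k)
{
    unsigned int i, slot = flow_hash(k);

    for (i = 0; i < TABLE_SIZE; i++) {
        struct entry *e = &established[(slot + i) & (TABLE_SIZE - 1)];

        if (e->chan == NULL)
            break;              /* an empty slot ends the probe sequence */
        if (key_equal(&e->key, k))
            return e->chan;
    }
    return listeners[k->dport]; /* may be NULL: no channel for this packet */
}

/* Model of the driver RX handler. Returns 1 if the packet was attached
 * to a channel, 0 if it would have to take the regular stack path. */
static int rx_packet(const struct flow_key *k)
{
    struct net_channel *chan = classify(k);

    if (chan == NULL)
        return 0;
    chan->queued++;
    return 1;
}
```

The sketch never removes entries, which is what lets the lookup stop at the first empty slot. A real classifier would also need removal on connection teardown and some answer for concurrent access from multiple cpus, and neither is modeled here.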