On Sun, 30 Jul 2006, Sam Leffler wrote:
I have a fair amount of experience with the linux model and it works ok.
The main complication I've seen is when a driver needs to process multiple
queues of packets things get more involved. This is seen in 802.11 drivers
where there are two q's, one for data frames and one for management frames.
With the current scheme you have two separate queues and the start method
handles prioritization by polling the mgt q before the data q. If instead
the packet is passed to the start method then it needs to be tagged in some
way so that it's prioritized properly. Otherwise you end up with multiple
start methods; one per type of packet. I suspect this will be ok but the
end result will be that we'll need to add a priority field to mbufs (unless
we pass it as an argument to the start method).
All this is certainly doable but I think just replacing one mechanism with
the other (as you specified) is insufficient.
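[For concreteness, a minimal sketch of the two-queue start pattern described
above; the softc layout, queue name, and hardware handoff function are
hypothetical:]

    static void
    hypothetical_wifi_start(struct ifnet *ifp)
    {
        struct hypothetical_softc *sc = ifp->if_softc;
        struct mbuf *m;

        for (;;) {
            /* Management frames take strict priority over data. */
            IF_DEQUEUE(&sc->sc_mgtq, m);
            if (m == NULL)
                IF_DEQUEUE(&ifp->if_snd, m);
            if (m == NULL)
                break;
            /* Hand the frame to the hardware (hypothetical helper). */
            hypothetical_hw_transmit(sc, m);
        }
    }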
Hmm. This is something that I had overlooked. I was loosely aware that the
if_sl code made use of multiple queues, but was under the impression that the
classification to queues occurred purely in the SLIP code. Indeed, it does,
but structurally, SLIP is split over the link layer (if_output) and driver
layer (if_start), which I had forgotten. I take it from your comments that
802.11 also does this, which I was not aware of.
I'm a little uncomfortable with our current m_tag model, as it requires
significant numbers of additional allocations and frees for each packet, as
well as walking linked lists. It's fine for occasional discretionary use (i.e.,
MAC labels), but I worry about cases where it is used with every packet, and
we start seeing moderately non-zero numbers of tags on every packet. I think
I would be more comfortable with an explicit queue identifier argument to
if_start, where the link layer and driver layer agree on how to identify
queues.
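To make the cost concrete, the per-packet tagging path I'm uncomfortable with
would look roughly like this; MTAG_HYPOTHETICAL_QID is an invented cookie, and
error handling is abbreviated:

    /* Link layer: one allocation and one list insertion per packet. */
    struct m_tag *mtag;

    mtag = m_tag_alloc(MTAG_HYPOTHETICAL_QID, 0, sizeof(int), M_NOWAIT);
    if (mtag == NULL)
        return (ENOMEM);
    *(int *)(mtag + 1) = qid;   /* tag data follows the tag header */
    m_tag_prepend(m, mtag);

    /* Driver: one list walk per packet just to recover the queue id. */
    mtag = m_tag_locate(m, MTAG_HYPOTHETICAL_QID, 0, NULL);
    qid = (mtag != NULL) ? *(int *)(mtag + 1) : 0;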
As a straw man, how would the following strike you:
int if_startmbuf(struct ifnet *ifp, struct mbuf *m, int ifqid);
where for most link layers, the value would be zero, but for some link
layer/driver combinations, it would identify a specific queue which the link
layer believes the mbuf should be assigned, if implemented?
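On the driver side, I'd expect this to reduce to a switch on the queue id; a
sketch, with hypothetical ring names and enqueue helper:

    int
    hypothetical_if_startmbuf(struct ifnet *ifp, struct mbuf *m, int ifqid)
    {
        struct hypothetical_softc *sc = ifp->if_softc;

        switch (ifqid) {
        case 1:
            /* e.g., management frames onto a priority ring */
            return (hypothetical_tx_enqueue(sc, &sc->sc_mgtring, m));
        case 0:
        default:
            return (hypothetical_tx_enqueue(sc, &sc->sc_dataring, m));
        }
    }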
Attached is a patch that maintains the current if_start, but adds
if_startmbuf. If a device driver implements if_startmbuf and the global
sysctl net.startmbuf_enabled is set to 1, then the if_startmbuf path in the
driver will be used. Otherwise, if_start is used. I have modified the
if_em driver to implement if_startmbuf also. If there is no packet backlog
in the if_snd queue, it directly places the packet in the transmit
descriptor ring. If there is a backlog, it uses the if_snd queue protected
by driver mutex, rather than a separate ifq mutex.
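In outline, the driver side looks roughly like the following; this is a
simplified sketch rather than the patch itself, and the mutex and helper
names are approximate:

    static int
    hypothetical_em_startmbuf(struct ifnet *ifp, struct mbuf *m, int ifqid)
    {
        struct hypothetical_softc *sc = ifp->if_softc;
        int error;

        mtx_lock(&sc->sc_txmtx);    /* driver transmit path mutex */
        if (IFQ_DRV_IS_EMPTY(&ifp->if_snd) &&
            hypothetical_ring_space(sc) > 0) {
            /* Common case: bypass if_snd, encap straight to the ring. */
            error = hypothetical_encap(sc, m);
        } else {
            /* Ring backed up: fall back to if_snd under the same lock. */
            _IF_ENQUEUE(&ifp->if_snd, m);
            error = 0;
        }
        mtx_unlock(&sc->sc_txmtx);
        return (error);
    }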
In some basic local micro-benchmarks, I saw a 5% improvement in UDP 0-byte
payload PPS on UP, and a 10% improvement on SMP. I saw a 1.7% performance
improvement in the bulk serving of 1k files over HTTP. These are only
micro-benchmarks, and reflect a configuration in which the CPU is unable to
keep up with the output rate of the 1gbps ethernet card in the device, so
reductions in host CPU usage are immediately visible in increased output as
the CPU is able to better keep up with the network hardware. Other
configurations are also of interest of interesting, especially ones in
which the network device is unable to keep up with the CPU, resulting in
more queueing.
Conceptual review, as well as benchmarking, etc., would be most welcome.
Why is the startmbuf knob global and not per-interface? Seems like you want
to convert drivers one at a time?
I may have under-described what I have implemented. The decision is currently
made based on two factors: a global frob, and per-interface definition of
if_startmbuf being non-zero. The global frob is intended to make it easy to
benchmark the difference. I should modify the patch so that the global frob
doesn't override the driver back to if_start in the event that if_startmbuf is
defined and if_start isn't. The global frob is intended to be removed in the
long run, and I intend for us to continue to support both the old and new
start methods for the foreseeable future, since I don't intend to update every
device driver we have to the new method, at least not personally :-).
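For clarity, the dispatch currently amounts to something like this; the names
are approximate:

    /*
     * Take the new path only if the driver implements it and the
     * global frob is enabled; otherwise fall back to the legacy
     * queue handoff and if_start.
     */
    if (startmbuf_enabled && ifp->if_startmbuf != NULL)
        error = (*ifp->if_startmbuf)(ifp, m, ifqid);
    else
        IFQ_HANDOFF(ifp, m, error);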
FWIW the original model was driven by the expectation that you could raise
the spl so the tx path was entirely synchronized from above. With the SMPng
work we're synchronizing transfer through each control layer. If the driver
softc lock (or similar) were exposed to upper layers we could possibly
return the "lock the tx path" model we had before and eliminate all the
locking your changes target. But that would be a big layering violation and
would add significant contention in the SMP case.
In some ways, what I propose comes to much the same thing: the change I
propose basically delegates the queueing and synchronization decisions to the
device driver, which might choose either to use the lock already in the ifq,
to use its own lock, or to use some other synchronization strategy. In the
case of if_em, I've implemented bypass of software queueing entirely in the
common case, but in the event that the hardware ring backs up, then we still
fall back to the if_snd queue, only we lock it using the device driver's
transmit path mutex. Delegating the synchronization down the stack comes with
risks, as device driver writers will inevitably take liberties; on the other
hand, it appears that devices are quite diverse, and those liberties have
advantages.
I think the key observation is that most network hardware today takes
packets directly from private queues so the fast path needs to push things
down to those queues w/ minimal overhead. This includes devices that
implement QoS in h/w w/ multiple queues.
Yes -- however, you're right that the link layer needs to be able to pass more
information down. I'd like it to be able to do so without an m_tag
allocation, though, which suggests (as you point out) an explicit argument to
if_startmbuf.
Thanks,
Robert N M Watson
Computer Laboratory
University of Cambridge