[ ... why you want something more than interrupt coalescing ... ]

Don Bowman wrote:
> Actually I have pushed it to the livelock case. I'm shocked at how
> easy this is to do with BSD (I'm used to systems like VxWorks with
> much lower overhead interrupt processing).
>
> I found that for a 2x XEON @ 2GHz I can achieve this @ ~100Mbps
> of minimal size UDP fragments. Tuning the driver dramatically
> improved the situation. Reducing the size of its receive ring to
> the proper amount also helps, since it will then run out of buffers
> and drop packets. This isn't extreme load, it isn't really
> particularly heavy load, it's only ~200Kpps. I suspect the
> defragmenting is the issue, so I tried it again with ARPs. This
> helped a lot.
>
> I'm still not clear on how the receiver polling helps me; it also
> makes for a constant-rate consumption of packets. If I set the bds
> to the max, then I will only be interrupted @ a constant rate by
> the device.
OK.  This really has nothing to do with interrupt processing latency,
except that such latency increases pool retention time, and reduces
the overall load-bearing capacity of a single system.  In other
words, latency affects the number of connections and the total amount
of data in transit, but not whether or not that data gets through.
So it's not a direct cause of livelock, even if it can be an indirect
cause.

There are a couple of livelock points.  The best paper describing
this is:

    Eliminating Receive Livelock in an Interrupt-Driven Kernel
    Jeffrey C. Mogul, K. K. Ramakrishnan
    http://citeseer.nj.nec.com/mogul96eliminating.html

This isn't the earliest paper on the topic, but it's authoritative.

All of these boil down to packets not getting all the way to the
application that has the socket open, for whatever reason, or the
inability of the system to send responses to the packets which it
has received.

Basically, a receiver livelock can occur any place where there is
not a negative feedback loop to control packet sources, once you
achieve resource saturation.

In BSD network processing, there are a number of livelock points,
where there is no negative feedback loop.  Following the packet in
from the wire, these are:

o   Received packets are copied across the bus to a main memory ring
    buffer, even when the packets are not being processed, for some
    cards.  The correct thing to do is to not copy the packets in
    this case, reserving the bus for other data transfers (e.g. an
    NFS server cannot do disk transfers if it is spending all bus
    cycles doing packet transfers).  This is a network controller
    firmware issue, and has to be addressed there (many, but not
    all, network adapters "do the right thing").

o   Hardware interrupts occur when packets are transferred
    successfully from network adapter memory to main memory, which
    causes the host system to run the interrupt handler code instead
    of other code, like the protocol processing or the application
    that owns the connection.  This is the highest priority, so you
    need to be able to squelch interrupt processing.

o   Protocol processing is the next highest priority item; if you
    successfully squelch hardware interrupts when you hit load
    capacity, you can still deny applications time to run: instead
    of spending all of your time handling interrupts, you spend it
    all doing protocol processing.  The application does not get an
    opportunity to run, to deal with the packets which have been
    received.

o   Application processing is the lowest priority item.  When you
    hit a resource limit, such as the number of mbufs available in
    the system as a whole, such that you cannot allocate send chains
    for responding to the traffic requests you are getting, then you
    back up input requests, and, again, you are in trouble.

All of these lead to receiver livelock.

There are several ways to attack this issue, but given an "infinite"
ability to generate load, the only one that's effective at handling
at least *some* load is to insert negative feedback loops between
the stall points in the network processing model, so that the next
stage can squelch the incoming packets.  The normal way to deal with
this is to drop the packets as early as possible, before investing a
lot of resources in processing them.
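To make the "drop as early as possible" point concrete, here is a
rough sketch of the classic 4.4BSD-style handoff from a driver's
receive path to IP protocol processing (the function name is made
up; the ipintrq/schednetisr interface is the traditional one, and
details vary by release).  The only feedback here is the length
check on the IP input queue, and by the time it fires you have
already paid for the bus transfer and the hardware interrupt:

/* Illustrative only: a driver RX handler handing a received frame
 * to IP protocol processing, 4.4BSD style. */
static void
example_ether_input(struct mbuf *m)
{
	int s = splimp();		/* block further net interrupts */

	if (IF_QFULL(&ipintrq)) {
		/* Protocol processing has fallen behind: drop here,
		 * rather than queueing more work we can't finish. */
		IF_DROP(&ipintrq);
		m_freem(m);
	} else {
		IF_ENQUEUE(&ipintrq, m);
		schednetisr(NETISR_IP);	/* run ipintr() at softnet */
	}
	splx(s);
}

The point is that this check happens after the interrupt has already
stolen the CPU; pushing an equivalent check further toward the wire
(or into the adapter itself) is what the feedback loops above are
about.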
The best way (so far) of doing this is LRP ("Lazy Receiver
Processing"), which is described in:

    Lazy Receiver Processing (LRP): A Network Subsystem Architecture
    for Server Systems
    Peter Druschel, Gaurav Banga
    http://citeseer.nj.nec.com/druschel96lazy.html

This paper describes more of the details of receiver livelock, and
the problems with the BSD processing model, and has a number of nice
pictures that I can't do justice to with just "ASCII Art".

What DEVICE_POLLING does is address the squelching of incoming
packets at the hardware interrupt level, so that they are dropped by
the network adapter.  Depending on the adapter design, they may
still be copied to host memory from adapter memory, particularly
when there is not an explicit acknowledgement, just a ring buffer in
host memory.  You don't want to buy these network adapters, though
you will be unlikely to see the problem in an HTTP server, since it
will serve most of its content from cache, rather than across the
device bus, unless it is a streaming media server or serves very
large static files.  8-).

In addition, the DEVICE_POLLING code contains scheduler
modifications, which permit you to reserve a certain amount of time
for the application to run, with interrupts disabled, so that the
applications can clear the input queues, and send out responses.

In other words, DEVICE_POLLING addresses two of the five locations
where there is not a negative feedback loop, but does it in a
suboptimal way: by weighted round-robin timeshare of the main CPU.

In point of fact, you are more likely to hit resource limits than
you are to hit livelock at the other points, unless you tune your
kernel for higher processing rates, at which point latency will fill
the pool, and you may hit one of the other 3 hard livelock
conditions (there is also the potential for interaction livelock;
for example, your application may be a streamcast application, and
can't send to any channel if one outbound channel is full, etc.).

Actually, you *want* to tune your system up, so that these become
issues, and then address them as well.  For example, to avoid CPU
starvation on a non-single-purpose server host, where applications
are reserved more time than they need on average, you can only
approximate a best fit with DEVICE_POLLING, by adjusting the
scheduler modifications (they are roughly equal to the SVR4 "fixed"
scheduling class, where work is time division multiplexed onto the
CPU).  To address this, you really want to deal with pushing data to
the application on the basis of the queue depths from kernel to user
space: a WFQ ("Weighted Fair Queueing") mechanism, to ensure
application runtime proportional to the time needed to handle a
given load.

All that can come later.  For right now, since you don't have access
to my source tree (8-)), you should content yourself with
DEVICE_POLLING.
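If you go that route, the setup is roughly this (from memory; check
polling(4) and the sysctl tree on your system for the authoritative
names, defaults, and the list of drivers that actually support
polling):

    # kernel config
    options HZ=1000		# polling runs off the clock interrupt
    options DEVICE_POLLING

    # at run time
    sysctl -w kern.polling.enable=1	# supported NICs stop interrupting
    sysctl -w kern.polling.user_frac=50	# % of each tick kept for userland
    sysctl -w kern.polling.burst_max=150	# cap on packets per poll

The user_frac knob is the scheduler reservation described above:
raising it favors the application, lowering it favors packet
processing.

-- Terry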