On Thu, 27 Apr 2006, Jeremie Le Hen wrote:
> > I missed the original thread, but in answer to the question: if you set
> > net.isr.direct=1, then FreeBSD 6.x will run the netisr code in the ithread
> > of the network device driver. This allows the IP forwarding and related
> > paths to run in two threads instead of one, potentially allowing greater
> > parallelism. Of course, you also potentially contend on more locks, and you
> > may increase the time it takes for the ithread to respond to new
> > interrupts, so it's not quite cut and dried; but with a workload like the
> > one shown above, it might make quite a difference.
> Actually, you already replied in the original thread, explaining mostly the
> same thing. :-)
>
> BTW, what I understand is that net.isr.direct=1 keeps packets from all being
> multiplexed onto the netisr thread and instead makes the ithread do the job.
> In this case, what happens to the netisr thread? Does it still have work to
> do, or is it removed?
Yes -- basically, what this setting does is turn a deferred dispatch of the
protocol-level processing into a direct function invocation. So instead of
inserting the new IP packet into an IP processing queue from the ethernet code
and waking up the netisr, which then calls the IP input routine, we call the
IP input routine directly. This has a number of potentially positive effects:
- Avoid the queue/dequeue operation
- Avoid a context switch
- Allow greater parallelism since protocol layer processing is not limited to
the netisr thread
It also has some downsides:
- Perform more work in the ithread -- since any given thread is limited to a
single CPU's worth of processing resources, if the link layer and protocol
layer processing add up to more than one CPU, you slow them down
- Increase the time it takes to pull packets out of the card -- we process
each packet to completion rather than pulling packets out in sets and batching
them. This pushes packet drops under overload into the card's queue instead of
the IP input queue, which has some benefits and some costs.
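To make the two paths concrete, here is a rough userland sketch of the
dispatch decision. The names (ether_demux_ip, netisr_wakeup, the fixed-size
queue) are invented stand-ins for illustration, not the actual kernel API:

    /*
     * Toy illustration of deferred vs. direct dispatch.  Names are
     * invented stand-ins, not the real FreeBSD kernel code.
     */
    #include <stdio.h>

    struct pkt { int id; };

    #define QLEN 8
    static struct pkt *ipintrq[QLEN];   /* stand-in for the IP input queue */
    static int qhead, qtail;

    static void ip_input(struct pkt *p) /* protocol-layer processing */
    {
        printf("ip_input: packet %d\n", p->id);
    }

    static void netisr_wakeup(void)
    {
        /* In the kernel this would wake the netisr thread; here we just
         * drain the queue inline to keep the example self-contained. */
        while (qhead != qtail)
            ip_input(ipintrq[qhead++ % QLEN]);
    }

    static int net_isr_direct = 1;      /* models the net.isr.direct sysctl */

    /* Called from the driver ithread after link-layer processing. */
    static void ether_demux_ip(struct pkt *p)
    {
        if (net_isr_direct) {
            ip_input(p);                 /* direct: no enqueue, no wakeup */
        } else {
            ipintrq[qtail++ % QLEN] = p; /* deferred: enqueue the packet... */
            netisr_wakeup();             /* ...and let the netisr pick it up */
        }
    }

    int main(void)
    {
        struct pkt a = { 1 }, b = { 2 };

        ether_demux_ip(&a);              /* run to completion in the "ithread" */
        net_isr_direct = 0;
        ether_demux_ip(&b);              /* queued, drained by the "netisr" */
        return 0;
    }

With the flag set, the caller pays the full protocol-processing cost in its
own context; with it clear, the caller only pays for the enqueue and the rest
happens in the drain loop.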
The netisr is still there, and will still be used for certain sorts of things.
In particular, we use the netisr when doing arbitrary decapsulation, as this
places an upper bound on thread stack use. For example: if you had an
IP-in-IP-in-IP-in-IP tunneled packet and always used direct dispatch, you'd
potentially get a deeply nested call stack. By looping the packet back into
the queue and picking it up again at the top level of the netisr dispatch, we
avoid nesting the stacks, which could otherwise lead to stack overflow. We
don't context switch in that loop, so we avoid context-switch costs. We also
use the netisr for loopback network traffic. So, in short, the netisr is
still there; it just has less work scheduled in it.
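As a toy illustration of why the loop-back matters (again with invented names
rather than the real kernel code): recursing into the input routine for each
encapsulation layer grows the stack, while re-queueing restarts each layer
from the top of the loop with a shallow stack:

    #include <stdio.h>

    struct pkt { int layers; };          /* remaining inner IP headers */

    #define QLEN 8
    static struct pkt *q[QLEN];
    static int head, tail;

    static void enqueue(struct pkt *p) { q[tail++ % QLEN] = p; }

    static void ip_input(struct pkt *p)
    {
        if (p->layers-- > 0) {
            /* Found an encapsulated IP packet: loop it back through the
             * queue instead of calling ip_input() recursively. */
            enqueue(p);
            return;
        }
        printf("delivered to the final protocol\n");
    }

    int main(void)
    {
        struct pkt p = { 3 };            /* IP in IP in IP in IP */

        enqueue(&p);
        while (head != tail)             /* the netisr loop: each layer is */
            ip_input(q[head++ % QLEN]);  /* handled from a fresh, shallow stack */
        return 0;
    }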
Another potential model for increasing parallelism in the input path is to
have multiple netisr threads -- this raises an interesting question relating
to ordering. Right now, we use source ordering -- that is, we order packets
in the network subsystem essentially in the order they come from a particular
source. So we guarantee that if four packets come in em0, they get processed
in the order they are received from em0. They may arbitrarily interlace with
packets coming from other interfaces, such as em1, lo0, etc. The reason for
the strong source ordering is that some protocols, TCP in particular, respond
really badly to misordering, which they detect as loss and respond to by
forcing a retransmit. If we introduce multiple netisrs naively, by simply
having the different threads work from the same IP input queue, then we can
potentially pull
packets from the same source into different workers, and process them at
different rates, resulting in misordering being introduced. While we'd
process packets with greater parallelism, and hence possibly faster, we'd
toast the end-to-end protocol properties and make everyone really unhappy.
There are a few common ways people have addressed this -- it's actually very
similar to the link parallelism problem. For example, using bonded ethernet
links, packets are assigned to a particular link based on a hash of their
source address, so that individual streams from the same source remain in
order with respect to themselves. An obvious approach would be to assign
particular ifnets to particular netisrs, since that would maintain our current
source ordering assumptions, but allow the ithreads and netisrs to float to
different CPUs. A catch in this approach is load balancing: if two ifnets are
assigned to the same netisr, then they can't run in parallel. This line of
thought can, and does, continue. :-) The direct dispatch model maintains
source ordering in a manner similar to having a per-source netisr, which works
pretty well, and also avoids context switches. The main downside is reduced
parallelism between the ithread and the netisr, which for some configurations
can be a big deal (e.g., if the ithread needs 60% of a CPU and the netisr
needs 60% of a CPU, combining them in a single thread caps that thread at one
CPU, limiting them to about 50% each).
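For illustration only, here is a minimal sketch of that kind of source-hashed
placement -- the hash and the names are made up, but it shows the property we
care about: every packet from a given source lands on the same worker, so
per-source ordering is preserved while different sources can proceed in
parallel:

    #include <stdint.h>
    #include <stdio.h>

    #define NWORKERS 4                  /* imagined number of netisr threads */

    /* Any deterministic hash works; it only has to be stable per source. */
    static unsigned pick_worker(uint32_t src_addr)
    {
        return (src_addr * 2654435761u) % NWORKERS;
    }

    int main(void)
    {
        /* Two packets from 10.0.0.1 and one from 10.0.0.2. */
        uint32_t sources[] = { 0x0a000001, 0x0a000002, 0x0a000001 };

        for (int i = 0; i < 3; i++)
            printf("packet from %#x -> worker %u\n",
                (unsigned)sources[i], pick_worker(sources[i]));
        return 0;
    }

The first and third packets hash to the same worker, so they stay in order
relative to each other; the packet from the other source may be processed on
a different worker without affecting them.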
Robert N M Watson