On Thu, 27 Apr 2006, Robert Watson wrote:
Yes -- basically, what this setting does is turn a deferred dispatch of the
protocol-level processing into a direct function invocation.
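To make that concrete, here is a rough userland sketch of the difference
between the two models. The function and variable names are placeholders I
made up for illustration, not the actual kernel interfaces; the 50-entry
limit just mirrors the hardcoded IP input queue size mentioned below.

    /* Sketch only: placeholder names, not the real FreeBSD code. */
    #include <stdio.h>

    #define IPINTRQ_MAXLEN 50              /* small, fixed-size input queue */

    struct pkt { int id; };

    static struct pkt queue[IPINTRQ_MAXLEN];
    static int qlen;

    static void
    ip_input(struct pkt *p)                /* stands in for IP-level processing */
    {
            printf("processed packet %d\n", p->id);
    }

    static int
    ipintrq_enqueue(struct pkt *p)         /* returns -1 (drop) when full */
    {
            if (qlen == IPINTRQ_MAXLEN)
                    return (-1);
            queue[qlen++] = *p;
            return (0);
    }

    static void
    driver_rx_deliver(struct pkt *p, int direct)
    {
            if (direct) {
                    /* Direct dispatch: protocol processing runs right here,
                     * in the context that emptied the DMA ring. */
                    ip_input(p);
            } else if (ipintrq_enqueue(p) != 0) {
                    /* Deferred dispatch: a netisr thread would drain the
                     * queue later (omitted here); a full queue means the
                     * packet is dropped at this point. */
                    printf("dropped packet %d (input queue full)\n", p->id);
            }
    }

    int
    main(void)
    {
            struct pkt p;

            /* Direct dispatch: every packet is processed immediately. */
            for (p.id = 0; p.id < 3; p.id++)
                    driver_rx_deliver(&p, 1);

            /* Deferred dispatch of a 256-packet burst (one DMA ring's
             * worth): only the first 50 fit, the rest are dropped. */
            for (p.id = 0; p.id < 256; p.id++)
                    driver_rx_deliver(&p, 0);
            return (0);
    }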
This reminds me of a problem I saw about a year ago, where the number of
entries in the DMA ring (IIRC 256) was much larger than the number of
entries in the IP input queue (IIRC hardcoded at 50). So under high load,
lots of packets would end up being dropped when the driver tried to enqueue
them onto the IP input queue.
If you are finding that direct dispatch is giving you a really big
performance increase on some workloads, you might like to check that the
reason isn't simply that you have avoided overflowing this queue.
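One easy way to check is to look at the IP input queue sysctls; I believe
the relevant ones are net.inet.ip.intr_queue_maxlen and
net.inet.ip.intr_queue_drops (adjust the names if they differ on your tree).
Something like the little program below, or just "sysctl net.inet.ip" from
the shell, will show whether the queue has been overflowing:

    /* Read the IP input queue sysctls; names assumed as above. */
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    int
    main(void)
    {
            int maxlen, drops;
            size_t len;

            len = sizeof(maxlen);
            if (sysctlbyname("net.inet.ip.intr_queue_maxlen", &maxlen, &len,
                NULL, 0) == -1) {
                    perror("intr_queue_maxlen");
                    return (1);
            }
            len = sizeof(drops);
            if (sysctlbyname("net.inet.ip.intr_queue_drops", &drops, &len,
                NULL, 0) == -1) {
                    perror("intr_queue_drops");
                    return (1);
            }
            printf("ipintrq maxlen = %d, drops = %d\n", maxlen, drops);
            return (0);
    }

A drop counter that climbs under load would suggest the win is coming from
bypassing the queue rather than from saving the deferred dispatch itself.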
- Increase the time it takes to pull packets out of the card -- we process
each packet to completion rather than pulling them out in sets and batching
them. This pushes the drop-on-overload behaviour into the card instead of
the IP input queue, which has some benefits and some costs.
The nice thing about doing it this way is that it is less prone to
performance degradation under overload, since you don't dequeue (and hence
do work on) packets that will later be discarded anyway.
The reason for the strong source ordering is that some protocols, TCP in
particular, respond really badly to misordering, which they interpret as
loss and react to by forcing retransmits. If we introduce multiple netisrs
naively, by simply having the different threads work from the same IP input
queue, then packets from the same source can be pulled into different
workers and processed at different rates, which introduces misordering.
While we'd process packets with greater parallelism, and hence possibly
faster, we'd toast the end-to-end protocol properties and make everyone
really unhappy.
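The usual way around that (just a sketch of the idea, with invented names,
not something the stack does today) is to pick the worker by hashing the
flow, so packets from a given source always land on the same netisr and
stay ordered relative to each other, while different flows still spread
across workers:

    /* Sketch: per-flow worker selection so one flow never spans two
     * workers.  Field and function names are made up for illustration. */
    #include <stdint.h>
    #include <stdio.h>

    #define NWORKERS 4

    struct flow {
            uint32_t src_ip, dst_ip;
            uint16_t src_port, dst_port;
    };

    static unsigned
    flow_hash(const struct flow *f)
    {
            uint32_t h;

            /* Any reasonable hash works; it only has to be stable per flow. */
            h = f->src_ip ^ (f->dst_ip * 2654435761u);
            h ^= ((uint32_t)f->src_port << 16) | f->dst_port;
            return (h % NWORKERS);
    }

    int
    main(void)
    {
            struct flow a = { 0x0a000001, 0x0a000002, 12345, 80 };
            struct flow b = { 0x0a000003, 0x0a000002, 54321, 80 };

            /* Every packet of flow 'a' maps to the same worker, so the
             * workers run in parallel without reordering within a flow. */
            printf("flow a -> worker %u\n", flow_hash(&a));
            printf("flow b -> worker %u\n", flow_hash(&b));
            return (0);
    }

The point is that ordering only has to be preserved within a flow, not
across the whole queue, so per-flow stability is all the hash needs.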
Would it be possible to improve the behaviour of the TCP protocol
implementation so that out-of-order reception was acceptable?
# Someone else asked a question about polling.
Pretty much all modern network interfaces support interrupt moderation of
some description. There really is no need to use polling any more, as
interfaces do not cause excessive interrupt rates.
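For what it's worth, the moderation itself is conceptually simple: the card
holds back the receive interrupt until it has either accumulated a handful
of packets or a short timer has expired, which caps the interrupt rate. A
rough sketch of that decision (the thresholds and names are invented; real
cards expose them as device-specific tunables):

    #include <stdbool.h>
    #include <stdio.h>

    #define COALESCE_PKTS  8        /* fire after this many packets...   */
    #define COALESCE_USECS 100      /* ...or after this much delay       */

    static bool
    should_raise_interrupt(unsigned pkts_pending, unsigned usecs_since_first)
    {
            if (pkts_pending == 0)
                    return (false);
            return (pkts_pending >= COALESCE_PKTS ||
                usecs_since_first >= COALESCE_USECS);
    }

    int
    main(void)
    {
            /* 3 packets pending, 150us since the first arrived: the timer
             * fires even though the packet threshold wasn't reached. */
            printf("interrupt: %s\n",
                should_raise_interrupt(3, 150) ? "yes" : "no");
            /* 2 packets, 20us: keep coalescing, no interrupt yet. */
            printf("interrupt: %s\n",
                should_raise_interrupt(2, 20) ? "yes" : "no");
            return (0);
    }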
The performance difference we are seeing with polling is likely because it
schedules packet processing better than the current model does. For
example, most driver implementations just spin dequeueing packets until
their DMA rings are empty; however, that doesn't work so well when you have
fixed-size queues elsewhere that are filling up. If you look at the polling
code, it only dequeues a small number of packets at a time and allows them
to be processed before it continues dequeueing.
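Roughly, the difference between the two dequeue styles looks like this (a
sketch with invented names; if I remember right, the real polling code's
burst limit is a tunable rather than a constant):

    /* Sketch contrasting the two RX styles; every name here is a
     * placeholder, not the actual driver or polling interface. */
    #include <stdio.h>

    #define RING_SIZE 12
    #define RX_BURST   5                    /* polling-style per-pass limit */

    struct pkt { int id; };

    static struct pkt ring[RING_SIZE];      /* stand-in for the DMA ring */
    static int ring_pos;

    static struct pkt *
    dma_ring_next(void)                     /* NULL once the ring is drained */
    {
            if (ring_pos == RING_SIZE)
                    return (NULL);
            return (&ring[ring_pos++]);
    }

    static void
    ip_input(struct pkt *p)                 /* stand-in for protocol processing */
    {
            printf("processed packet %d\n", p->id);
    }

    /* Interrupt-style: spin until the ring is empty, regardless of how the
     * fixed-size queues further up the stack are doing. */
    void
    rx_drain_all(void)
    {
            struct pkt *p;

            while ((p = dma_ring_next()) != NULL)
                    ip_input(p);
    }

    /* Polling-style: hand over at most RX_BURST packets, then return so
     * the work already queued can be processed before we dequeue more. */
    static int
    rx_drain_burst(void)
    {
            struct pkt *p;
            int n;

            for (n = 0; n < RX_BURST; n++) {
                    if ((p = dma_ring_next()) == NULL)
                            return (0);     /* ring empty */
                    ip_input(p);
            }
            return (1);                     /* more packets still pending */
    }

    int
    main(void)
    {
            int i;

            for (i = 0; i < RING_SIZE; i++)
                    ring[i].id = i;
            while (rx_drain_burst())
                    printf("-- pause to let the queued work drain --\n");
            return (0);
    }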
I would bet that if the packet dispatch model gets improved, we can ditch
polling entirely, at least for modern network interfaces.
--
Luke Macpherson