On Wed, 7 Oct 2009, rihad wrote:
Suggestions like increasing the timer resolution are intended to spread out
dummynet's packet injection, to reduce the peaks of burstiness that occur
when multiple queues inject packets in a burst that exceeds the combined
depth of the hardware descriptor rings and the software transmit queue.
Raising HZ from 1000 to 2000 has helped. There are now 200-300 global
drops/s, as opposed to 300-1000 with HZ=1000. Or maybe changing net.isr.direct
from 1 to 0 helped. Or maybe raising hash_size from 64 to 256. Or maybe...
Or maybe other random factors such as traffic load corresponding to major
sports events, etc. :-)
It's also possible that combining multiple changes cancels out the effect of
one or another change. Given the rather large number of possible
combinations of things to try, I'd suggest being fairly strategic in how you
try them. Just the original config plus a significant HZ increase is probably
the best starting point. Changing hash_size is really about reducing CPU use,
so if, on the whole, you're not getting close to the capacity of a core for
any given thread involved in the work, it may not make much difference
(tuning these data structures is a bit of a black art).
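For reference, a sketch of how that tuning is usually applied (assuming the
stock dummynet sysctl name; it sets the default hash-table size used for new
dynamic queues):

$ sysctl net.inet.ip.dummynet.hash_size=256

It can also be set per pipe/queue with the "buckets" keyword in the ipfw
configuration.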
The two solutions, then, are (a) to increase the timer resolution
significantly so that packets are injected in smaller bursts
But isn't it bad that raising it can actually make things worse? From /sys/conf/NOTES:
# The granularity of operation is controlled by the kernel option HZ whose
# default value (1000 on most architectures) means a granularity of 1ms
# (1s/HZ). Historically, the default was 100, but finer granularity is
# required for DUMMYNET and other systems on modern hardware. There are
# reasonable arguments that HZ should, in fact, be 100 still; consider,
# that reducing the granularity too much might cause excessive overhead in
# clock interrupt processing, potentially causing ticks to be missed and thus
# actually reducing the accuracy of operation.
Right: we fire the timer on every CPU at 1/HZ seconds, which means quite a lot
of work being done. On systems where timers are proportionally more expensive
-- when using hardware virtualization, for example -- we do recommend tuning
the timers down. And our boot loader will actually do it for you: we
auto-detect vmware, parallels, kqemu, virtualbox, etc., and adjust the timer
rate from 1000 to 100 during boot.
That said, in your configuration I see little argument for a lower timer rate:
you need to burst packets at frequent intervals or risk overfilling queues,
and the overhead of additional timer ticks on your system shouldn't be too
bad, as you have both very fast hardware and a lot of idle time.
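To put rough, purely illustrative numbers on that (not taken from your setup):
shaping 100 Mbit/s of 1500-byte packets means roughly 8,000 packets/s. At
HZ=1000 dummynet gets a shot every 1ms and must inject ~8 packets per tick,
while at HZ=4000 each tick carries ~2, so the worst-case burst when many pipes
fire in the same tick shrinks by the same factor.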
I would suggest making just the HZ -> 4000 change for now and see how it goes.
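For completeness: since HZ is a boot-time tunable, the change goes in
/boot/loader.conf and takes effect on the next reboot:

kern.hz="4000"

The running value can be confirmed afterwards with "sysctl kern.clockrate".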
and (b) increase the queue capacities. The hardware queue limits likely
can't be raised w/o new hardware, but the ifnet transmit queue sizes can be
increased.
Can someone please say how to increase the "ifnet transmit queue sizes"?
Unfortunately, I fear that this is driver-specific, and in the case of bce
requires a recompile. In the driver init code in if_bce, the following code
appears:
ifp->if_snd.ifq_drv_maxlen = USABLE_TX_BD;
IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
IFQ_SET_READY(&ifp->if_snd);
USABLE_TX_BD evaluates to an architecture-specific value due to the varying
page size. You might just try forcing it to 1024.
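In other words, an untested sketch of the change, simply replacing the
computed value with a constant:

ifp->if_snd.ifq_drv_maxlen = 1024;	/* was USABLE_TX_BD */
IFQ_SET_MAXLEN(&ifp->if_snd, ifp->if_snd.ifq_drv_maxlen);
IFQ_SET_READY(&ifp->if_snd);

followed by a rebuild of the kernel or the if_bce module.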
Raising the timer resolution is almost certainly not a bad idea in your
configuration, although it does require a reboot, as you have observed.
OK, I'll try HZ=4000, but there are some required services
(flowtools/radius/mysql/a perl app) also running on this box.
That should be fine.
On a side note: one other possible interpretation of that statistic is that
you're seeing fragmentation problems. Usually in forwarding scenarios this
is unlikely. However, it wouldn't hurt to make sure you have LRO turned
off on the network interfaces you're using, assuming it's supported by the
driver.
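Something like the following, assuming the driver exposes the flag (the
interface name here is just a guess):

$ ifconfig bce0 -lro

After that, "ifconfig bce0" should no longer show LRO in the options field.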
I don't think fragments are the problem. The numbers are too small ;-)
$ netstat -s|fgrep fragment
5318 fragments received
147 fragments dropped (dup or out of space)
5157 fragments dropped after timeout
4088 output datagrams fragmented
8180 fragments created
0 datagrams that can't be fragmented
There's no such option as LRO shown, so I guess it's off:
options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4>
That probably rules that out as a source of problems, then.
Robert