Robert Watson wrote:
Suggestions like increasing timer resolution are intended to spread out dummynet's packet injection, reducing the peaks of burstiness that occur when multiple queues inject packets in a burst that exceeds the queue depth supported by the combined hardware descriptor rings and software transmit queue.

Raising HZ from 1000 to 2000 has helped. There are now 200-300 global drops/s, as opposed to 300-1000 with HZ=1000. Or maybe changing net.isr.direct from 1 to 0 helped. Or maybe raising hash_size from 64 to 256 did. Or maybe...

The two solutions, then, are (a) to increase the timer resolution significantly so that packets are injected in smaller bursts

But isn't it a problem that this can actually make things worse?  From /sys/conf/NOTES:

# The granularity of operation is controlled by the kernel option HZ whose
# default value (1000 on most architectures) means a granularity of 1ms
# (1s/HZ).  Historically, the default was 100, but finer granularity is
# required for DUMMYNET and other systems on modern hardware.  There are
# reasonable arguments that HZ should, in fact, be 100 still; consider,
# that reducing the granularity too much might cause excessive overhead in
# clock interrupt processing, potentially causing ticks to be missed and thus
# actually reducing the accuracy of operation.
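
To put numbers on the "1s/HZ" granularity from the NOTES excerpt above, here is a minimal sketch. The function name tick_ms is made up for illustration; only the 1s/HZ relation comes from NOTES.

```python
def tick_ms(hz: int) -> float:
    """Length of one clock tick in milliseconds, i.e. 1 s / HZ."""
    return 1000.0 / hz


# HZ values mentioned in this thread and in NOTES.
for hz in (100, 1000, 2000, 4000):
    # One clock interrupt per tick, so HZ is also the interrupt rate.
    print(f"HZ={hz}: tick = {tick_ms(hz)} ms, {hz} clock interrupts/s")
```

Each doubling of HZ halves the tick length and doubles the number of clock interrupts per second, which is the overhead NOTES warns about.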


and (b) to increase the queue capacities. The hardware queue limits likely can't be raised without new hardware, but the ifnet transmit queue sizes can be increased.
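
A rough back-of-the-envelope sketch of how the two options interact, assuming each backlogged pipe releases one tick's worth of traffic per clock tick and that, in the worst case, all pipes do so on the same tick. The pipe count, bandwidth, packet size and combined queue capacity below are hypothetical numbers chosen for illustration, not measurements from this system.

```python
def packets_per_tick(bandwidth_bps: int, packet_bytes: int, hz: int) -> float:
    """Average packets one pipe may release per clock tick."""
    return bandwidth_bps / (8 * packet_bytes * hz)


def aggregate_burst(n_pipes: int, bandwidth_bps: int, packet_bytes: int, hz: int) -> float:
    """Packets injected in one tick if every pipe is backlogged (worst case)."""
    return n_pipes * packets_per_tick(bandwidth_bps, packet_bytes, hz)


# Hypothetical workload: 400 pipes at 8 Mbit/s each, 1000-byte packets.
N_PIPES, BW_BPS, PKT_BYTES = 400, 8_000_000, 1000
# Hypothetical combined capacity of descriptor ring + software transmit queue.
QUEUE_CAPACITY = 300

for hz in (1000, 2000, 4000):
    burst = aggregate_burst(N_PIPES, BW_BPS, PKT_BYTES, hz)
    verdict = "overflows" if burst > QUEUE_CAPACITY else "fits"
    print(f"HZ={hz}: ~{burst:.0f} packets per tick, {verdict} a queue of {QUEUE_CAPACITY}")
```

Under these assumptions, raising HZ shrinks the per-tick burst in proportion, while raising the queue capacity moves the threshold at which a burst starts to drop packets.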

Can someone please say how to increase the "ifnet transmit queue sizes"?

Increasing the timer resolution is almost certainly not a bad idea in your
configuration, although it does require a reboot, as you have observed.

OK, I'll try HZ=4000, but some required services (flowtools, radius, mysql, a perl app) are also running.

On a side note: one other possible interpretation of that statistic is that you're seeing fragmentation problems. Usually in forwarding scenarios this is unlikely. However, it wouldn't hurt to make sure you have LRO turned off on the network interfaces you're using, assuming it's supported by the driver.

I don't think fragments are the problem. The numbers are too small ;-)
$ netstat -s|fgrep fragment
        5318 fragments received
        147 fragments dropped (dup or out of space)
        5157 fragments dropped after timeout
        4088 output datagrams fragmented
        8180 fragments created
        0 datagrams that can't be fragmented

No LRO option is shown, so I guess it's off:
options=1bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4>

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
