On Fri, 28 Dec 2007, Bruce Evans wrote:

In previous mail, you (Mark) wrote:

# With FreeBSD 4 I was able to run a UDP data collector with rtprio set,
# kern.ipc.maxsockbuf=20480000, then use setsockopt() with SO_RCVBUF
# in the application.  If packets were dropped they would show up
# with netstat -s as "dropped due to full socket buffers".
#
# Since the packet never makes it to ip_input() I no longer have
# any way to count drops.  There will always be corner cases where
# interrupts are lost and drops not accounted for if the adapter
# hardware can't report them, but right now I've got no way to
# estimate any loss.
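
For reference, the receive-buffer setup described above looks roughly
like this (the port and buffer size are invented for the example;
kern.ipc.maxsockbuf must be large enough for the requested size to be
granted):

/*
 * Illustrative only: a UDP receiver that raises SO_RCVBUF as described
 * above.  The port and buffer size are made up for the example; rtprio
 * would be set externally with rtprio(1).
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <err.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
        struct sockaddr_in sin;
        int s, rcvbuf = 8 * 1024 * 1024;        /* example; must be < maxsockbuf */
        socklen_t len = sizeof(rcvbuf);

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
                err(1, "socket");
        if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) == -1)
                err(1, "setsockopt(SO_RCVBUF)");
        if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &len) == -1)
                err(1, "getsockopt(SO_RCVBUF)");
        printf("SO_RCVBUF is now %d\n", rcvbuf);

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(9999);             /* example port */
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
                err(1, "bind");
        /* ... recvfrom() loop goes here ... */
        return (0);
}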

I found where drops are recorded for the net.isr.direct=0 case.  It is
in net.inet.ip.intr_queue_drops.  The netisr subsystem just calls
IF_HANDOFF(), and IF_HANDOFF() calls _IF_DROP() if the queue is full.
_IF_DROP(ifq) just increments ifq->ifq_drops.  The usual case for netisrs
is for the queue to be ipintrq for NETISR_IP (a small model of this path
is sketched after the list below).  The following details don't help:

- drops for input queues don't seem to be displayed by any utilities
  (except that the ones for ipintrq are displayed primitively by
  sysctl net.inet.ip.intr_queue_drops).  netstat and systat only
  display drops for send queues and ip fragments.
- the netisr subsystem's drop count doesn't seem to be displayed by any
  utilities except sysctl.  It only counts drops due to there not being
  a queue; other drops are counted by _IF_DROP() in the per-queue counter.
  Users have a hard time integrating all these primitively displayed drop
  counts with other error counters.
- the length of ipintrq defaults to the default ifq length of ipqmaxlen =
  IPQ_MAXLEN = 50.  This is inadequate even with just one NIC in the
  system if that NIC's rx ring size is anywhere near 50 or above.  But 1
  Gbps NICs should have an rx ring size of 256 or 512 (I think the
  size is 256 for em; it is 256 for bge due to bogus configuration of
  hardware that can handle 512).  If the larger hardware rx ring is
  actually used, then a single interrupt can deliver a burst of more
  packets than the 50-entry queue can absorb, so ipintrq drops are
  almost ensured in the direct=0 case, and using the larger h/w ring is
  worse than useless (it also increases cache misses).  This is for
  just one NIC.  The problem is often limited by handling rx packets
  in small bursts, at a cost of extra overhead.  Interrupt moderation
  makes it worse by increasing burst sizes.

  This contrasts with the handling of send queues.  Send queues are
  per-interface, and most drivers increase the default length from 50
  to their ring size (-1 for bogus reasons).  I think this is only an
  optimization, while a similar change for rx queues is important for
  avoiding packet loss.  For send queues, the ifq acts mainly as a
  primitive implementation of watermarks.  I have found that tx queue
  lengths need to be more like 5000 than 50 or 500 to provide enough
  buffering when applications are delayed by other applications or
  just by sleeping until the next clock tick.  I use tx queues of
  length ~20000; at HZ = 100 a tick is 10 ms, so that is roughly a
  couple of clock ticks' worth of packets at ~1 Mpps.  But I now think
  queue lengths should be restricted to more like 50, since long queues
  cannot fit in L2 caches (not to mention they are bad for latency).
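
To make the drop accounting concrete, here is a small userland model of
the ifqueue logic (just a sketch that mimics the _IF_QFULL()/_IF_DROP()
behaviour; it is not kernel source).  It shows how a single 256-packet
rx burst into a 50-entry ipintrq is mostly dropped:

/*
 * Userland model of the ifqueue handoff/drop accounting.  This is NOT
 * kernel source; it only mimics the behaviour: when the queue is at
 * ifq_maxlen, the packet is discarded and ifq_drops is the only record
 * of the loss.
 */
#include <stdio.h>

struct model_ifqueue {
        int     ifq_len;
        int     ifq_maxlen;
        long    ifq_drops;
};

static int
model_handoff(struct model_ifqueue *ifq)
{
        if (ifq->ifq_len >= ifq->ifq_maxlen) {  /* _IF_QFULL() */
                ifq->ifq_drops++;               /* _IF_DROP(): only accounting */
                return (0);                     /* packet would be m_freem()ed */
        }
        ifq->ifq_len++;                         /* enqueued for the netisr */
        return (1);
}

int
main(void)
{
        struct model_ifqueue ipintrq = { 0, 50, 0 };    /* default maxlen 50 */
        int i;

        /* One interrupt delivers a full 256-entry rx ring in a burst. */
        for (i = 0; i < 256; i++)
                (void)model_handoff(&ipintrq);

        printf("queued %d, dropped %ld\n", ipintrq.ifq_len, ipintrq.ifq_drops);
        /* prints: queued 50, dropped 206 */
        return (0);
}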

The length of ipintrq can be changed using sysctl
net.inet.ip.intr_queue_maxlen.  Changing it from 50 to 1024 turns most
or all ipintrq drops into "socket buffer full" drops
(640 kpps input packets and 434 kpps socket buffer fulls with direct=0;
 640 kpps input packets and 324 kpps socket buffer fulls with direct=1).
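
Since no utility displays these counters nicely, a monitoring program
can read (and, as root, raise) them with sysctlbyname().  A minimal
sketch; the sysctl names are the real ones discussed above, the rest is
illustrative:

/*
 * Minimal sketch: read the ipintrq drop counter and queue length limit
 * via sysctlbyname(), then raise the limit (needs root), equivalent to
 * "sysctl net.inet.ip.intr_queue_maxlen=1024".
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>

int
main(void)
{
        int drops, maxlen;
        size_t len;

        len = sizeof(drops);
        if (sysctlbyname("net.inet.ip.intr_queue_drops", &drops, &len,
            NULL, 0) == -1)
                err(1, "sysctl intr_queue_drops");

        len = sizeof(maxlen);
        if (sysctlbyname("net.inet.ip.intr_queue_maxlen", &maxlen, &len,
            NULL, 0) == -1)
                err(1, "sysctl intr_queue_maxlen");

        printf("ipintrq: maxlen %d, drops %d\n", maxlen, drops);

        maxlen = 1024;
        if (sysctlbyname("net.inet.ip.intr_queue_maxlen", NULL, NULL,
            &maxlen, sizeof(maxlen)) == -1)
                warn("could not set intr_queue_maxlen");

        return (0);
}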

Bruce