I've just decided to follow this thread, as it seems to be related to some issues we are seeing as well.
It appears that under heavy packet loads the kernel cannot pull packets off the NIC fast enough, and is therefore slow to free up descriptors into which the NIC can DMA packets. This causes the NIC to drop the packet after its internal queue fills up (and record the packet as missed), because the hardware does not have enough descriptors to write the packets into.

We have this issue with the ixgbe 10 Gb/s card, though the absolute packet rates at which we see a problem are higher than those reported here. In our test scenario the problem gets worse with many simultaneous TCP connections, but the issue is the same: under high packet rates the driver cannot keep up and the NIC reports missed packets. The issue is not related to data throughput, though, as turning on jumbo frames solves our problem for a fixed number of connections, and it seems here that reducing the packet rate makes the misses go away. More importantly, in our tests only the receiver sees a problem; the transmitter is fine.

There was also another thread about problems with UDP throughput that I suspect are caused by the same type of packet rate spikes.

The question is: why is the kernel stack so slow to handle these packet rates? Doing some back-of-the-envelope calculations, they don't seem too bad. Where is the time going? And are our problem, the UDP issue, and this problem all caused by the same source of slowness, or are they three unrelated issues?
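To make the back-of-the-envelope point concrete, here is a rough sketch of the arithmetic (the ~100 kpps figure is taken from Alex's flood below; the 256-entry RX descriptor ring is only an assumed, illustrative size, not a value I have checked against the em or ixgbe defaults):

/*
 * Back-of-the-envelope sketch (not driver code): given a packet rate and an
 * RX descriptor ring size, how much time does the host have per packet, and
 * how long a host stall can the ring absorb before the NIC runs out of
 * descriptors and starts counting missed packets?
 */
#include <stdio.h>

int
main(void)
{
        double pps = 100e3;     /* assumed packet rate: ~100 kpps flood */
        int rxd = 256;          /* assumed RX descriptor ring size */

        double budget_us = 1e6 / pps;           /* time per packet, microseconds */
        double absorb_us = rxd * budget_us;     /* how long a full ring lasts */

        printf("per-packet budget: %.1f us\n", budget_us);
        printf("a %d-entry ring absorbs a host stall of at most %.0f us\n",
            rxd, absorb_us);
        return (0);
}

At 100 kpps the host gets roughly 10 microseconds per packet, and a 256-entry ring only hides a stall of about 2.5 ms, so any hiccup in servicing the ring that lasts longer than that shows up as missed packets. That is why the raw per-packet budget looks comfortable on paper while the counters still climb.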
Manish

On Fri, Sep 4, 2009 at 11:14 AM, <alexpalias-bsd...@yahoo.com> wrote:
> --- On Fri, 9/4/09, Artis Caune <artis.ca...@gmail.com> wrote:
>
>> Is it still actual?
>
> Hello. Yes, this is still actual.
>
> 1> netstat -nbhI em0 ; uptime
> Name  Mtu Network   Address            Ipkts Ierrs Ibytes  Opkts Oerrs Obytes  Coll
> em0  1500 <Link#1>  00:14:22:17:80:dc    31G   93M    18T    36G     0    27T     0
> 7:50PM up 23 days, 15:40, 1 user, load averages: 0.84, 1.05, 1.16
>
> The huge number of input errors is due to an 80-100 kpps flood we received via
> that interface, which got the errors/sec numbers up in the 50k/s range for a
> few minutes.
>
>> You didn't mention if you are using pf or other firewall.
>
> Sorry if I didn't mention it. I am using pf, but have tried "kldunload pf"
> and the errors didn't disappear.
>
>> I have similar problem with two boxes replicating zfs pools, when I
>> noticed input errors.
>> After some investigation turns out it was pf overhead, even though I
>> was skipping on interfaces where zfs send/recv.
>>
>> With pf enabled (and skip) I can copy 50-80 MB/s with 50-80 kpps and
>> 0-100+ input drops per second.
>> With pf disabled I can copy constantly with 102 or 93 MB/s and
>> 110-131 kpps, few drops (because 1 CPU almost eaten).
>
> This is the kind of traffic I am seeing:
>
> Errors/second (5 minute average) per interface:
> http://www.dataxnet.ro/alex/errors.png
> Packets/second (5 minute average) per interface:
> http://www.dataxnet.ro/alex/packets.png
>
> Those graphs were saved a few minutes ago; times are EEST (GMT+3).
>
> I'm sorry I don't have the Mbits/s graphs up, I haven't been collecting that
> data per interface recently (it's collected per vlan).
>
> Alex

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"