On 06.03.2014, at 19:33, Jack Vogel <jfvo...@gmail.com> wrote:

> You did not make it explicit before, but I noticed in your dtrace info that
> you are using lagg, it's been the source of lots of problems, so take it out
> of the setup and see if this queue problem still happens please.
>
> Jack
Well, last year when upgrading another batch of servers (same hardware) to 9.2, we tried to find a solution to this network problem, and we eliminated lagg where we had used it before, which did not help at all. That's why I didn't mention it explicitly. My point is, I can confirm that 9.2 has network problems on this same hardware with or without lagg, so it's unlikely that removing it will bring immediate success. OTOH, I didn't have this tx queue theory back then, so I cannot be sure that what we saw then without lagg, and what we see now with lagg, really are the same problem.

I guess, for the sake of simplicity, I will remove lagg on these new systems. But before I do that, to save time, I wanted to ask whether I should remove the vlan interfaces too. While that didn't help either last year, my guess is that I should take them out of the picture, unless you say otherwise.

Thanks for looking into this.


Markus


> On Thu, Mar 6, 2014 at 2:24 AM, Markus Gebert
> <markus.geb...@hostpoint.ch> wrote:
>
>> (creating a new thread, because I'm no longer sure this is related to
>> Johan's thread that I originally used to discuss this)
>>
>> On 27.02.2014, at 18:02, Jack Vogel <jfvo...@gmail.com> wrote:
>>
>>> I would make SURE that you have enough mbuf resources of whatever size
>>> pool that you are using (2, 4, 9K), and I would try the code in HEAD if
>>> you had not.
>>>
>>> Jack
>>
>> Jack, we've upgraded some other systems on which I get more time to debug
>> (no impact for customers). Although those systems use the nfsclient too, I
>> no longer think that NFS is the source of the problem (hence the new
>> thread). I think it's the ixgbe driver and/or card. When our problem
>> occurs, it looks like it's a single tx queue that gets stuck somehow (its
>> buf_ring remains full).
>>
>> I tracked ping using dtrace to determine the source of the ENOBUFS it
>> returns every few packets when things get weird:
>>
>> # dtrace -n 'fbt:::return / arg1 == ENOBUFS && execname == "ping" / { stack(); }'
>> dtrace: description 'fbt:::return ' matched 25476 probes
>> CPU     ID                    FUNCTION:NAME
>>  26   7730          ixgbe_mq_start:return
>>               if_lagg.ko`lagg_transmit+0xc4
>>               kernel`ether_output_frame+0x33
>>               kernel`ether_output+0x4fe
>>               kernel`ip_output+0xd74
>>               kernel`rip_output+0x229
>>               kernel`sosend_generic+0x3f6
>>               kernel`kern_sendit+0x1a3
>>               kernel`sendit+0xdc
>>               kernel`sys_sendto+0x4d
>>               kernel`amd64_syscall+0x5ea
>>               kernel`0xffffffff80d35667
>>
>> The only way ixgbe_mq_start could return ENOBUFS would be when
>> drbr_enqueue() encounters a full tx buf_ring. Since a new ping packet
>> probably has no flow id, it should be assigned to a queue based on curcpu,
>> which made me try to pin ping to single CPUs to check whether it's always
>> the same tx buf_ring that reports being full.
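>>
>> (Aside: this is roughly the queue-selection logic I'm assuming here. It's a
>> simplified sketch from my reading of the 9.2-era driver, not verbatim code;
>> the struct adapter fields are the driver's private softc and the names are
>> from memory:)
>>
>> /*
>>  * Sketch of how ixgbe_mq_start() picks a tx queue; simplified, field
>>  * names approximate, assumes the driver's usual headers and softc.
>>  */
>> static int
>> ixgbe_mq_start_sketch(struct ifnet *ifp, struct mbuf *m)
>> {
>>         struct adapter *adapter = ifp->if_softc;
>>         struct tx_ring *txr;
>>         int i;
>>
>>         if (m->m_flags & M_FLOWID)      /* stack/hardware supplied a flow id */
>>                 i = m->m_pkthdr.flowid % adapter->num_queues;
>>         else                            /* e.g. a locally generated ping */
>>                 i = curcpu % adapter->num_queues;
>>
>>         txr = &adapter->tx_rings[i];
>>         /* ENOBUFS surfaces here when txr->br (the buf_ring) is full */
>>         return (drbr_enqueue(ifp, txr->br, m));
>> }
>>
>> So without a flow id, the tx queue is determined purely by the CPU the
>> sender happens to run on, which is why pinning ping with cpuset should
>> tell us whether a single buf_ring is the one that's always full.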
>> This turned out to be true:
>>
>> # cpuset -l 0 ping 10.0.4.5
>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>> 64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.347 ms
>> 64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.135 ms
>>
>> # cpuset -l 1 ping 10.0.4.5
>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>> 64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.184 ms
>> 64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.232 ms
>>
>> # cpuset -l 2 ping 10.0.4.5
>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>> ping: sendto: No buffer space available
>> ping: sendto: No buffer space available
>> ping: sendto: No buffer space available
>> ping: sendto: No buffer space available
>> ping: sendto: No buffer space available
>>
>> # cpuset -l 3 ping 10.0.4.5
>> PING 10.0.4.5 (10.0.4.5): 56 data bytes
>> 64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.130 ms
>> 64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.126 ms
>> [...snip...]
>>
>> The system has 32 cores. If ping runs on CPU 2, 10, 18 or 26, all of which
>> map to the third tx buf_ring, ping reliably returns ENOBUFS. If ping is run
>> on any other CPU, and therefore uses any other tx queue, it runs without
>> any packet loss.
>>
>> So, when ENOBUFS is returned, this is not due to an mbuf shortage, it's
>> because the buf_ring is full. Not surprisingly, netstat -m looks pretty
>> normal:
>>
>> # netstat -m
>> 38622/11823/50445 mbufs in use (current/cache/total)
>> 32856/11642/44498/132096 mbuf clusters in use (current/cache/total/max)
>> 32824/6344 mbuf+clusters out of packet secondary zone in use (current/cache)
>> 16/3906/3922/66048 4k (page size) jumbo clusters in use (current/cache/total/max)
>> 0/0/0/33024 9k jumbo clusters in use (current/cache/total/max)
>> 0/0/0/16512 16k jumbo clusters in use (current/cache/total/max)
>> 75431K/41863K/117295K bytes allocated to network (current/cache/total)
>> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> 0/0/0 sfbufs in use (current/peak/max)
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>> 0 requests for I/O initiated by sendfile
>> 0 calls to protocol drain routines
>>
>> In the meantime I've checked the commit log of the ixgbe driver in HEAD,
>> and besides the fact that there are only minor differences between HEAD
>> and 9.2, I don't see a commit that fixes anything related to what we're
>> seeing...
>>
>> So, what's the conclusion here? Firmware bug that's only triggered under
>> 9.2? Driver bug introduced between 9.1 and 9.2 when the new multiqueue
>> stuff was added? Jack, how should we proceed?
>>
>>
>> Markus
>>
>>
>> On Thu, Feb 27, 2014 at 8:05 AM, Markus Gebert
>> <markus.geb...@hostpoint.ch> wrote:
>>
>>> On 27.02.2014, at 02:00, Rick Macklem <rmack...@uoguelph.ca> wrote:
>>>
>>>> John Baldwin wrote:
>>>>> On Tuesday, February 25, 2014 2:19:01 am Johan Kooijman wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I have a weird situation here that I can't get my head around.
>>>>>>
>>>>>> One FreeBSD 9.2-STABLE ZFS/NFS box, multiple Linux clients. Once in
>>>>>> a while the Linux clients lose their NFS connection:
>>>>>>
>>>>>> Feb 25 06:24:09 hv3 kernel: nfs: server 10.0.24.1 not responding,
>>>>>> timed out
>>>>>>
>>>>>> Not all boxes, just one out of the cluster.
>>>>>> The weird part is that when I try to ping a Linux client from the
>>>>>> FreeBSD box, I have between 10 and 30% packet loss - all day long, no
>>>>>> specific timeframe. If I ping the Linux clients - no loss. If I ping
>>>>>> back from the Linux clients to the FBSD box - no loss.
>>>>>>
>>>>>> The error I get when pinging a Linux client is this one:
>>>>>> ping: sendto: File too large
>>>
>>> We were facing similar problems when upgrading to 9.2 and have stayed with
>>> 9.1 on affected systems for now. We've seen this on HP G8 blades with
>>> 82599EB controllers:
>>>
>>> ix0@pci0:4:0:0: class=0x020000 card=0x18d0103c chip=0x10f88086 rev=0x01 hdr=0x00
>>>     vendor     = 'Intel Corporation'
>>>     device     = '82599EB 10 Gigabit Dual Port Backplane Connection'
>>>     class      = network
>>>     subclass   = ethernet
>>>
>>> We didn't find a way to trigger the problem reliably. But when it occurs,
>>> it usually affects only one interface. Symptoms include:
>>>
>>> - socket functions return the 'File too large' error mentioned by Johan
>>> - socket functions return 'No buffer space available'
>>> - heavy to full packet loss on the affected interface
>>> - "stuck" TCP connections, i.e. ESTABLISHED TCP connections that should
>>>   have timed out stick around forever (the socket on the other side could
>>>   have been closed hours ago)
>>> - userland programs using the corresponding sockets usually got stuck too
>>>   (can't find kernel traces right now, but always in network-related
>>>   syscalls)
>>>
>>> Network is only lightly loaded on the affected systems (usually 5-20 Mbit,
>>> capped at 200 Mbit, per server), and netstat never showed any indication
>>> of resource shortage (like mbufs).
>>>
>>> What made the problem go away temporarily was to ifconfig down/up the
>>> affected interface.
>>>
>>> We tested a 9.2 kernel with the 9.1 ixgbe driver, which was not really
>>> stable. Also, we tested a few revisions between 9.1 and 9.2 to find out
>>> when the problem started. Unfortunately, the ixgbe driver turned out to be
>>> mostly unstable on our systems between these releases, worse than on 9.2.
>>> The instability was introduced shortly after 9.1 and fixed only very
>>> shortly before the 9.2 release. So no luck there. We ended up using 9.1
>>> with backports of the 9.2 features we really need.
>>>
>>> What we can't tell is whether it's the 9.2 kernel or the 9.2 ixgbe driver
>>> or a combination of both that causes these problems. Unfortunately we ran
>>> out of time (and ideas).
>>>
>>>>> EFBIG is sometimes used by drivers when a packet takes too many
>>>>> scatter/gather entries. Since you mentioned NFS, one thing you can
>>>>> try is to disable TSO on the interface you are using for NFS to see
>>>>> if that "fixes" it.
>>>>
>>>> And please email if you try it and let us know if it helps.
>>>>
>>>> I think I've figured out how 64K NFS read replies can do this,
>>>> but I'll admit "ping" is a mystery? (Doesn't it just send a single
>>>> packet that would be in a single mbuf?)
>>>>
>>>> I think the EFBIG is returned by bus_dmamap_load_mbuf_sg(), but I
>>>> don't know if it can happen for an mbuf chain with < 32 entries?
>>>
>>> We don't use the nfs server on our systems, but they're (new)nfsclients.
>>> So I don't think our problem is nfs related, unless the default
>>> rsize/wsize for client mounts is not 8K, which I thought it was. Can you
>>> confirm this, Rick?
>>>
>>> IIRC, disabling TSO did not make any difference in our case.
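>>>
>>> Regarding the EFBIG theory: for what it's worth, the pattern I'd expect in
>>> the transmit path is the usual FreeBSD one, load the mbuf chain and on
>>> EFBIG defragment once and retry. This is only a hedged sketch to show where
>>> the error can surface, not the actual ixgbe code; the helper name and its
>>> arguments are made up:
>>>
>>> #include <sys/param.h>
>>> #include <sys/systm.h>
>>> #include <sys/mbuf.h>
>>> #include <sys/bus.h>
>>> #include <machine/bus.h>
>>>
>>> /*
>>>  * Hedged sketch of the common driver pattern around EFBIG (not the real
>>>  * ixgbe_xmit()). If even the defragmented chain needs too many segments,
>>>  * the error propagates up and userland sees "File too large".
>>>  */
>>> static int
>>> xmit_dma_load_sketch(bus_dma_tag_t tag, bus_dmamap_t map,
>>>     struct mbuf **m_headp, bus_dma_segment_t *segs, int *nsegs)
>>> {
>>>         struct mbuf *m;
>>>         int error;
>>>
>>>         error = bus_dmamap_load_mbuf_sg(tag, map, *m_headp, segs, nsegs,
>>>             BUS_DMA_NOWAIT);
>>>         if (error == EFBIG) {
>>>                 /* Too many scatter/gather segments: compact and retry once. */
>>>                 m = m_defrag(*m_headp, M_NOWAIT);
>>>                 if (m == NULL)
>>>                         return (ENOBUFS);
>>>                 *m_headp = m;
>>>                 error = bus_dmamap_load_mbuf_sg(tag, map, *m_headp, segs,
>>>                     nsegs, BUS_DMA_NOWAIT);
>>>         }
>>>         return (error);
>>> }
>>>
>>> With TSO enabled, a 64K NFS reply turns into a long mbuf chain, which at
>>> least fits John's explanation; a single-packet ping hitting this path is
>>> harder to explain, as Rick says.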
>>>
>>> Markus

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"