Markus Gebert wrote:
On 06.03.2014, at 19:33, Jack Vogel <jfvo...@gmail.com> wrote:
You did not make it explicit before, but I noticed in your dtrace info that
you are using lagg. It's been the source of lots of problems, so please take
it out of the setup and see if this queue problem still happens.
Jack
Well, last year, when upgrading another batch of servers (same hardware) to 9.2,
we tried to find a solution to this network problem, and we eliminated lagg where
we had used it before, which did not help at all. That's why I didn't mention
it explicitly.
My point is, I can confirm that 9.2 has network problems on this same hardware
with or without lagg, so it’s unlikely that removing it will bring immediate
success. OTOH, I didn’t have this tx queue theory back then, so I cannot be
sure that what we saw then without lagg, and what we see now with lagg, really
are the same problem.
I guess, for the sake of simplicity I will remove lagg on these new systems.
But before I do that, to save time, I wanted to ask whether I should remove the
vlan interfaces too? While that didn't help either last year, my guess is that I
should take them out of the picture, unless you say otherwise.
Thanks for looking into this.
Markus
I don't use ixgbe, but this might be related to the discussed problem.
I too noticed network problems when I moved from 9.1 to 9.2 last
October. Occasionally I use vlc to watch TV on udp://@224.0.0.1:7792
coming from an XP system, which displayed perfectly on 9.1 but got
scrambled on 9.2. By accident I noticed that vlc worked fine again
when I had a cpu-intensive job like portupgrade -a running. So I thought
it might be a problem related to the scheduler.
In the meantime I have upgraded to 10.0-STABLE and things look better now --
though it still takes about 20 seconds for a video stream to get synchronized.
My system is:
CPU: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz (2675.02-MHz
K8-class CPU)
Origin = "GenuineIntel" Id = 0x106e5 Family = 0x6 Model = 0x1e
Stepping = 5
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x98e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x1<LAHF>
TSC: P-state invariant, performance statistics
real memory = 12884901888 (12288 MB)
avail memory = 12438151168 (11861 MB)
with this ethernet card:
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port
0xd800-0xd8ff mem 0xf6fff000-0xf6ffffff,0xf6ff8000-0xf6ffbfff irq 19 at
device 0.0 on pci2
re0: Using 1 MSI-X message
re0: Chip rev. 0x28000000
re0: MAC rev. 0x00300000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master,
1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow,
1000baseT-FDX-flow-master, auto, auto-flow
re0: Ethernet address: 90:e6:ba:bb:28:3e
Andreas
On Thu, Mar 6, 2014 at 2:24 AM, Markus Gebert <markus.geb...@hostpoint.ch> wrote:
(creating a new thread, because I'm no longer sure this is related to
Johan's thread that I originally used to discuss this)
On 27.02.2014, at 18:02, Jack Vogel <jfvo...@gmail.com> wrote:
I would make SURE that you have enough mbuf resources of whatever size pool
you are using (2K, 4K, 9K), and I would try the code in HEAD if you have not.
Jack
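(For reference, those pool limits and their current usage can be checked
quickly -- netstat -m plus the usual sysctls; names as on 9.x, meant as a
sketch rather than anything exhaustive:)
# netstat -m
# sysctl kern.ipc.nmbclusters kern.ipc.nmbjumbop kern.ipc.nmbjumbo9 kern.ipc.nmbjumbo16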
Jack, we've upgraded some other systems on which I have more time to debug
(no impact for customers). Although those systems use the nfs client too, I
no longer think that NFS is the source of the problem (hence the new
thread). I think it's the ixgbe driver and/or the card. When our problem
occurs, it looks like a single tx queue gets stuck somehow (its
buf_ring remains full).
I tracked ping using dtrace to determine the source of the ENOBUFS it
returns every few packets when things get weird:
# dtrace -n 'fbt:::return / arg1 == ENOBUFS && execname == "ping" / {
stack(); }'
dtrace: description 'fbt:::return ' matched 25476 probes
CPU ID FUNCTION:NAME
26 7730 ixgbe_mq_start:return
if_lagg.ko`lagg_transmit+0xc4
kernel`ether_output_frame+0x33
kernel`ether_output+0x4fe
kernel`ip_output+0xd74
kernel`rip_output+0x229
kernel`sosend_generic+0x3f6
kernel`kern_sendit+0x1a3
kernel`sendit+0xdc
kernel`sys_sendto+0x4d
kernel`amd64_syscall+0x5ea
kernel`0xffffffff80d35667
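(The same probe can be narrowed to the driver entry point and aggregated
per cpu, which maps directly onto the per-queue theory -- a variant sketch,
same predicate as above:)
# dtrace -n 'fbt::ixgbe_mq_start:return / arg1 == ENOBUFS / { @[cpu] = count(); }'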
The only way ixgbe_mq_start could return ENOBUFS is when
drbr_enqueue() encounters a full tx buf_ring. Since a new ping packet
probably has no flow id, it should be assigned to a queue based on curcpu,
which made me try pinning ping to single cpus to check whether it's always
the same tx buf_ring that reports being full. This turned out to be true:
# cpuset -l 0 ping 10.0.4.5
PING 10.0.4.5 (10.0.4.5): 56 data bytes
64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.347 ms
64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.135 ms
# cpuset -l 1 ping 10.0.4.5
PING 10.0.4.5 (10.0.4.5): 56 data bytes
64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.184 ms
64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.232 ms
# cpuset -l 2 ping 10.0.4.5
PING 10.0.4.5 (10.0.4.5): 56 data bytes
ping: sendto: No buffer space available
ping: sendto: No buffer space available
ping: sendto: No buffer space available
ping: sendto: No buffer space available
ping: sendto: No buffer space available
# cpuset -l 3 ping 10.0.4.5
PING 10.0.4.5 (10.0.4.5): 56 data bytes
64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.130 ms
64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.126 ms
[...snip...]
The system has 32 cores. If ping runs on cpu 2, 10, 18 or 26, which use
the third tx buf_ring, ping reliably returns ENOBUFS. If ping is run on any
other cpu, using any other tx queue, it runs without any packet loss.
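(Assuming the driver picks the ring as curcpu % num_queues, this is
consistent with 8 tx queues: cpus 2, 10, 18 and 26 all map to ring 2. A
quick /bin/sh sweep over all 32 cpus makes the stuck ring obvious -- a
sketch, same target address as above:)
# for c in $(jot 32 0); do cpuset -l $c ping -q -c 2 -t 3 10.0.4.5 >/dev/null 2>&1 && echo "cpu $c: ok" || echo "cpu $c: FAIL"; done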
So, when ENOBUFS is returned, this is not due to an mbuf shortage; it's
because the buf_ring is full. Not surprisingly, netstat -m looks pretty
normal:
# netstat -m
38622/11823/50445 mbufs in use (current/cache/total)
32856/11642/44498/132096 mbuf clusters in use (current/cache/total/max)
32824/6344 mbuf+clusters out of packet secondary zone in use
(current/cache)
16/3906/3922/66048 4k (page size) jumbo clusters in use
(current/cache/total/max)
0/0/0/33024 9k jumbo clusters in use (current/cache/total/max)
0/0/0/16512 16k jumbo clusters in use (current/cache/total/max)
75431K/41863K/117295K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
In the meantime I've checked the commit log of the ixgbe driver in HEAD,
and besides the fact that there are only small differences between HEAD and
9.2, I don't see a commit that fixes anything related to what we're seeing...
So, what's the conclusion here? A firmware bug that's only triggered under
9.2? A driver bug introduced between 9.1 and 9.2 when the new multiqueue
stuff was added? Jack, how should we proceed?
Markus
On Thu, Feb 27, 2014 at 8:05 AM, Markus Gebert
<markus.geb...@hostpoint.ch> wrote:
On 27.02.2014, at 02:00, Rick Macklem <rmack...@uoguelph.ca> wrote:
John Baldwin wrote:
On Tuesday, February 25, 2014 2:19:01 am Johan Kooijman wrote:
Hi all,
I have a weird situation here where I can't get my head around.
One FreeBSD 9.2-STABLE ZFS/NFS box, multiple Linux clients. Once in a while
the Linux clients lose their NFS connection:
Feb 25 06:24:09 hv3 kernel: nfs: server 10.0.24.1 not responding, timed out
Not all boxes, just one out of the cluster. The weird part is that when I
try to ping a Linux client from the FreeBSD box, I have between 10 and 30%
packet loss - all day long, no specific timeframe. If I ping the Linux
clients - no loss. If I ping back from the Linux clients to the FreeBSD
box - no loss.
The error I get when pinging a Linux client is this one:
ping: sendto: File too large
We were facing similar problems when upgrading to 9.2 and have stayed
with
9.1 on affected systems for now. We've seen this on HP G8 blades with
82599EB controllers:
ix0@pci0:4:0:0: class=0x020000 card=0x18d0103c chip=0x10f88086 rev=0x01
hdr=0x00
vendor = 'Intel Corporation'
device = '82599EB 10 Gigabit Dual Port Backplane Connection'
class = network
subclass = ethernet
We didn't find a way to trigger the problem reliably. But when it occurs,
it usually affects only one interface. Symptoms include:
- socket functions return the 'File too large' error mentioned by Johan
- socket functions return 'No buffer space available'
- heavy to full packet loss on the affected interface
- "stuck" TCP connections, i.e. ESTABLISHED TCP connections that should
have timed out stick around forever (the socket on the other side could
have been closed hours ago)
- userland programs using the corresponding sockets usually got stuck too
(can't find kernel traces right now, but always in network-related
syscalls); see the sketch below
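(Both of those were visible with standard tools -- netstat for the
lingering connections, procstat for the kernel stacks of blocked
processes; the pid here is a placeholder:)
# netstat -an -p tcp | grep ESTABLISHED
# procstat -kk <pid>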
Network is only lightly loaded on the affected systems (usually 5-20
mbit, capped at 200 mbit, per server), and netstat never showed any
indication of a resource shortage (like mbufs).
What made the problem go away temporarily was to ifconfig down/up the
affected interface.
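(That is, simply this, ix0 being the affected interface -- at the cost of
briefly dropping all traffic on that port:)
# ifconfig ix0 down && ifconfig ix0 up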
We tested a 9.2 kernel with the 9.1 ixgbe driver, which was not really
stable. Also, we tested a few revisions between 9.1 and 9.2 to find out
when the problem started. Unfortunately, the ixgbe driver turned out to be
mostly unstable on our systems between these releases, worse than on 9.2.
The instability was introduced shortly after 9.1 and fixed only very
shortly before the 9.2 release. So no luck there. We ended up using 9.1
with backports of the 9.2 features we really need.
What we can't tell is whether it's the 9.2 kernel or the 9.2 ixgbe driver
or a combination of both that causes these problems. Unfortunately we ran
out of time (and ideas).
EFBIG is sometimes used by drivers when a packet takes too many
scatter/gather entries. Since you mentioned NFS, one thing you can try is
to disable TSO on the interface you are using for NFS to see if that
"fixes" it.
And please email if you try it and let us know if it helps.
I think I've figured out how 64K NFS read replies can do this,
but I'll admit "ping" is a mystery. (Doesn't it just send a single
packet that would fit in a single mbuf?)
I think the EFBIG is returned by bus_dmamap_load_mbuf_sg(), but I
don't know if it can happen for an mbuf chain with < 32 entries?
We don't use the nfs server on our systems, but they're (new)nfs clients.
So I don't think our problem is NFS-related, unless the default rsize/wsize
for client mounts is not 8K, which I thought it was. Can you confirm this,
Rick?
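(If in doubt, the sizes can always be pinned explicitly at mount time;
server path and mountpoint here are placeholders:)
# mount -t nfs -o rsize=8192,wsize=8192 server:/export /mnt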
IIRC, disabling TSO did not make any difference in our case.
Markus
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"