Markus Gebert wrote:
On 06.03.2014, at 19:33, Jack Vogel <jfvo...@gmail.com> wrote:
You did not make it explicit before, but I noticed in your dtrace info that
you are using lagg. It's been the source of lots of problems, so please take
it out of the setup and see if this queue problem still happens.
Jack
Well, last year, when upgrading another batch of servers (same hardware) to 9.2,
we tried to find a solution to this network problem, and we eliminated lagg where
we had used it before, which did not help at all. That's why I didn't mention
it explicitly.
My point is, I can confirm that 9.2 has network problems on this same hardware
with or without lagg, so it’s unlikely that removing it will bring immediate
success. OTOH, I didn’t have this tx queue theory back then, so I cannot be
sure that what we saw then without lagg, and what we see now with lagg, really
are the same problem.
I guess, for the sake of simplicity I will remove lagg on these new systems.
But before I do that, to save time, I wanted to ask whether I should remove the
vlan interfaces too? While that didn't help either last year, my guess is that I
should take them out of the picture, unless you say otherwise.
Thanks for looking into this.
Markus
I don't use ixgbe, but this might be related to the discussed problem.
I too noticed network problems when I moved from 9.1 to 9.2 last
October. Occasionally I use vlc to watch TV on udp://@224.0.0.1:7792
coming from an XP system, which displayed perfectly on 9.1 but got
scrambled on 9.2. By accident I noticed that vlc worked fine again
when I had a cpu-intensive job like portupgrade -a running. So I thought
it might be a problem related to the scheduler.
In the meantime I have upgraded to 10.0-STABLE and things look better now --
though it still takes about 20 seconds for a video stream to get synchronized.
My system is:
CPU: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz (2675.02-MHz
K8-class CPU)
Origin = "GenuineIntel" Id = 0x106e5 Family = 0x6 Model = 0x1e
Stepping = 5
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x98e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,POPCNT>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x1<LAHF>
TSC: P-state invariant, performance statistics
real memory = 12884901888 (12288 MB)
avail memory = 12438151168 (11861 MB)
with this ethernet card:
re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port
0xd800-0xd8ff mem 0xf6fff000-0xf6ffffff,0xf6ff8000-0xf6ffbfff irq 19 at
device 0.0 on pci2
re0: Using 1 MSI-X message
re0: Chip rev. 0x28000000
re0: MAC rev. 0x00300000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0: none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master,
1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow,
1000baseT-FDX-flow-master, auto, auto-flow
re0: Ethernet address: 90:e6:ba:bb:28:3e
Andreas
On Thu, Mar 6, 2014 at 2:24 AM, Markus Gebert <markus.geb...@hostpoint.ch> wrote:
(creating a new thread, because I'm no longer sure this is related to
Johan's thread that I originally used to discuss this)
On 27.02.2014, at 18:02, Jack Vogel <jfvo...@gmail.com> wrote:
I would make SURE that you have enough mbuf resources of whatever size pool
you are using (2K, 4K, 9K), and I would try the code in HEAD if you have not.
Jack
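(For reference, those pool limits and their current usage can be checked
quickly -- netstat -m plus the usual sysctls; names as on 9.x, meant as a
sketch rather than anything exhaustive:)
# netstat -m
# sysctl kern.ipc.nmbclusters kern.ipc.nmbjumbop kern.ipc.nmbjumbo9 kern.ipc.nmbjumbo16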
Jack, we've upgraded some other systems on which I have more time to debug
(no impact for customers). Although those systems use the nfs client too, I
no longer think that NFS is the source of the problem (hence the new
thread). I think it's the ixgbe driver and/or the card. When our problem
occurs, it looks like a single tx queue gets stuck somehow (its
buf_ring remains full).
I tracked ping using dtrace to determine the source of the ENOBUFS it
returns every few packets when things get weird:
# dtrace -n 'fbt:::return / arg1 == ENOBUFS && execname == "ping" / {
stack(); }'
dtrace: description 'fbt:::return ' matched 25476 probes
CPU ID FUNCTION:NAME
26 7730 ixgbe_mq_start:return
if_lagg.ko`lagg_transmit+0xc4
kernel`ether_output_frame+0x33
kernel`ether_output+0x4fe
kernel`ip_output+0xd74
kernel`rip_output+0x229
kernel`sosend_generic+0x3f6
kernel`kern_sendit+0x1a3
kernel`sendit+0xdc
kernel`sys_sendto+0x4d
kernel`amd64_syscall+0x5ea
kernel`0xffffffff80d35667
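(The same probe can be narrowed to the driver entry point and aggregated
per cpu, which maps directly onto the per-queue theory -- a variant sketch,
same predicate as above:)
# dtrace -n 'fbt::ixgbe_mq_start:return / arg1 == ENOBUFS / { @[cpu] = count(); }'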
The only way ixgbe_mq_start could return ENOBUFS is when
drbr_enqueue() encounters a full tx buf_ring. Since a new ping packet
probably has no flow id, it should be assigned to a queue based on curcpu,
which made me try pinning ping to single cpus to check whether it's always
the same tx buf_ring that reports being full. This turned out to be true:
# cpuset -l 0 ping 10.0.4.5
PING 10.0.4.5 (10.0.4.5): 56 data bytes
64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.347 ms
64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.135 ms
# cpuset -l 1 ping 10.0.4.5
PING 10.0.4.5 (10.0.4.5): 56 data bytes
64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.184 ms
64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.232 ms
# cpuset -l 2 ping 10.0.4.5
PING 10.0.4.5 (10.0.4.5): 56 data bytes
ping: sendto: No buffer space available
ping: sendto: No buffer space available
ping: sendto: No buffer space available
ping: sendto: No buffer space available
ping: sendto: No buffer space available
# cpuset -l 3 ping 10.0.4.5
PING 10.0.4.5 (10.0.4.5): 56 data bytes
64 bytes from 10.0.4.5: icmp_seq=0 ttl=255 time=0.130 ms
64 bytes from 10.0.4.5: icmp_seq=1 ttl=255 time=0.126 ms
[...snip...]
The system has 32 cores. If ping runs on cpu 2, 10, 18 or 26, which use
the third tx buf_ring, ping reliably returns ENOBUFS. If ping is run on any
other cpu, using any other tx queue, it runs without any packet loss.
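(Assuming the driver picks the ring as curcpu % num_queues, this is
consistent with 8 tx queues: cpus 2, 10, 18 and 26 all map to ring 2. A
quick /bin/sh sweep over all 32 cpus makes the stuck ring obvious -- a
sketch, same target address as above:)
# for c in $(jot 32 0); do cpuset -l $c ping -q -c 2 -t 3 10.0.4.5 >/dev/null 2>&1 && echo "cpu $c: ok" || echo "cpu $c: FAIL"; done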
So, when ENOBUFS is returned, this is not due to an mbuf shortage; it's
because the buf_ring is full. Not surprisingly, netstat -m looks pretty
normal:
# netstat -m
38622/11823/50445 mbufs in use (current/cache/total)
32856/11642/44498/132096 mbuf clusters in use (current/cache/total/max)
32824/6344 mbuf+clusters out of packet secondary zone in use
(current/cache)
16/3906/3922/66048 4k (page size) jumbo clusters in use
(current/cache/total/max)
0/0/0/33024 9k jumbo clusters in use (current/cache/total/max)
0/0/0/16512 16k jumbo clusters in use (current/cache/total/max)
75431K/41863K/117295K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines
In the meantime I've checked the commit log of the ixgbe driver in HEAD,
and besides the fact that there are only small differences between HEAD and
9.2, I don't see a commit that fixes anything related to what we're seeing...
So, what's the conclusion here? A firmware bug that's only triggered under
9.2? A driver bug introduced between 9.1 and 9.2 when the new multiqueue
stuff was added? Jack, how should we proceed?
Markus
On Thu, Feb 27, 2014 at 8:05 AM, Markus Gebert
<markus.geb...@hostpoint.ch> wrote:
On 27.02.2014, at 02:00, Rick Macklem <rmack...@uoguelph.ca> wrote:
John Baldwin wrote:
On Tuesday, February 25, 2014 2:19:01 am Johan Kooijman wrote:
Hi all,
I have a weird situation here where I can't get my head around.
One FreeBSD 9.2-STABLE ZFS/NFS box, multiple Linux clients. Once in a while
the Linux clients lose their NFS connection:
Feb 25 06:24:09 hv3 kernel: nfs: server 10.0.24.1 not responding, timed out
Not all boxes, just one out of the cluster. The weird part is that when I
try to ping a Linux client from the FreeBSD box, I have between 10 and 30%
packet loss - all day long, no specific timeframe. If I ping the Linux
clients - no loss. If I ping back from the Linux clients to the FreeBSD
box - no loss.
The error I get when pinging a Linux client is this one:
ping: sendto: File too large
We were facing similar problems when upgrading to 9.2 and have stayed
with
9.1 on affected systems for now. We've seen this on HP G8 blades with
82599EB controllers:
ix0@pci0:4:0:0: class=0x020000 card=0x18d0103c chip=0x10f88086 rev=0x01
hdr=0x00
vendor = 'Intel Corporation'
device = '82599EB 10 Gigabit Dual Port Backplane Connection'
class = network
subclass = ethernet
We didn't find a way to trigger the problem reliably. But when it occurs,
it usually affects only one interface. Symptoms include:
- socket functions return the 'File too large' error mentioned by Johan
- socket functions return 'No buffer space available'
- heavy to full packet loss on the affected interface
- "stuck" TCP connections, i.e. ESTABLISHED TCP connections that should
have timed out stick around forever (the socket on the other side could
have been closed hours ago)
- userland programs using the corresponding sockets usually got stuck too
(can't find kernel traces right now, but always in network-related
syscalls); see the sketch below
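(Both of those were visible with standard tools -- netstat for the
lingering connections, procstat for the kernel stacks of blocked
processes; the pid here is a placeholder:)
# netstat -an -p tcp | grep ESTABLISHED
# procstat -kk <pid>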
Network is only lightly loaded on the affected systems (usually 5-20
mbit, capped at 200 mbit, per server), and netstat never showed any
indication of a resource shortage (like mbufs).
What made the problem go away temporarily was to ifconfig down/up the
affected interface.
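(That is, simply this, ix0 being the affected interface -- at the cost of
briefly dropping all traffic on that port:)
# ifconfig ix0 down && ifconfig ix0 up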
We tested a 9.2 kernel with the 9.1 ixgbe driver, which was not really
stable. Also, we tested a few revisions between 9.1 and 9.2 to find out
when the problem started. Unfortunately, the ixgbe driver turned out to be
mostly unstable on our systems between these releases, worse than on 9.2.
The instability was introduced shortly after 9.1 and fixed only very
shortly before the 9.2 release. So no luck there. We ended up using 9.1
with backports of the 9.2 features we really need.
What we can't tell is whether it's the 9.2 kernel or the 9.2 ixgbe driver
or a combination of both that causes these problems. Unfortunately we ran
out of time (and ideas).
EFBIG is sometimes used by drivers when a packet takes too many
scatter/gather entries. Since you mentioned NFS, one thing you can try is
to disable TSO on the interface you are using for NFS to see if that
"fixes" it.
And please email if you try it and let us know if it helps.
I think I've figured out how 64K NFS read replies can do this,
but I'll admit "ping" is a mystery. (Doesn't it just send a single
packet that would fit in a single mbuf?)
I think the EFBIG is returned by bus_dmamap_load_mbuf_sg(), but I
don't know if it can happen for an mbuf chain with < 32 entries?
We don't use the nfs server on our systems, but they're (new)nfs clients.
So I don't think our problem is NFS-related, unless the default rsize/wsize
for client mounts is not 8K, which I thought it was. Can you confirm this,
Rick?
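(If in doubt, the sizes can always be pinned explicitly at mount time;
server path and mountpoint here are placeholders:)
# mount -t nfs -o rsize=8192,wsize=8192 server:/export /mnt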
IIRC, disabling TSO did not make any difference in our case.
Markus
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"