[Bug 217606] Bridge stops working after some days

bugzilla-noreply Tue, 07 Mar 2017 01:09:08 -0800

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217606


            Bug ID: 217606
           Summary: Bridge stops working after some days
           Product: Base System
           Version: 11.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: a...@torrentkino.de

Hello,

we recently upgraded our bridging firewalls from 10.1-RELEASE-pxx to
11.0-RELEASE-p8. Since then they stop forwarding traffic after some time, in
this case after ~4 days. One of them stopped yesterday evening. (We have a
failover mechanism to reduce the impact.)

$ uptime
9:26AM  up 4 days, 19:22, 2 users, load averages: 0.12, 0.06, 0.01

bridge0 consists of ix0/ix1:

ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xecc0-0xecdf mem 0xd9e80000-0xd9efffff,0xd9ff8000-0xd9ffbfff irq 48 at device 0.0 numa-domain 0 on pci2
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> port 0xece0-0xecff mem 0xd9f00000-0xd9f7ffff,0xd9ffc000-0xd9ffffff irq 52 at device 0.1 numa-domain 0 on pci2

When the failure occurs I see the following for IPv4. The bridge carries IPv6
as well, with the same problem.

ix0: A load balancer is asking for its default GW. No reply...

$ tcpdump -i ix0 \( arp \)
09:37:47.330361 ARP, Request who-has A.A.A.A tell B.B.B.B, length 46

ix1: The default GW actually sends a reply. I can see it on ix1.

$ tcpdump -i ix1 \( arp \)
09:38:59.328956 ARP, Request who-has A.A.A.A tell B.B.B.B, length 46
09:38:59.329374 ARP, Reply A.A.A.A is-at 00:00:0a:0b:0c:0d (oui Cisco), length 46

A tcpdump on bridge0 shows the same as on ix1.

Some numbers from the system that is currently not working:

$ netstat -m
82409/6901/89310 mbufs in use (current/cache/total)
38692/4094/42786/1015426 mbuf clusters in use (current/cache/total/max)
38692/4065 mbuf+clusters out of packet secondary zone in use (current/cache)
0/192/192/507713 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/150433 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84618 16k jumbo clusters in use (current/cache/total/max)
97986K/10681K/108667K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed

$ netstat -b -d -h -i bridge0
Name    Mtu Network       Address              Ipkts Ierrs Idrop     Ibytes   Opkts Oerrs     Obytes  Coll  Drop
ix0    1.5K <Link#1>      00:00:00:00:00:0a      12G     0     0        11T    7.9G     0       1.1T     0  335k
ix1    1.5K <Link#2>      00:00:00:00:00:0b     7.9G     0     0       1.2T     12G     0        11T     0     0
bridg  1.5K <Link#8>      00:00:00:00:00:0c      20G     0     0        12T     20G  335k        12T     0     0
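
In the table above, the only nonzero error counters are Drop on ix0 and Oerrs
on bridge0, both 335k. To watch whether these grow over time, something like
the following sketch could be used. The function name drop_column is made up
for illustration, and it assumes each interface row is printed on one line with
Drop as the last column (as with -d above); rows for interfaces that carry
additional addresses may need different handling.

```shell
# Print "<interface> <Drop>" for every data row of `netstat -b -d -h -i`
# output read from stdin. Hypothetical helper, not part of any FreeBSD tool.
drop_column() {
    # Skip the header row and blank lines; the last field is the Drop column.
    awk '$1 != "Name" && NF > 0 { print $1, $NF }'
}
```

Possible usage, logged periodically from cron or a loop:
netstat -b -d -h -i bridge0 | drop_column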

What I did so far:

# Disable Ethernet Flow-Control
# https://wiki.freebsd.org/10gFreeBSD/Router
dev.ix.0.fc=0
dev.ix.1.fc=0

# Disable TSO
cloned_interfaces="bridge0"
ifconfig_bridge0="addm ix0 addm ix1 up"
ifconfig_ix0="up -tso"
ifconfig_ix1="up -tso"
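
To confirm that these settings are actually in effect on the running system, a
check along the following lines might help. This is only a sketch: the function
name tso_enabled is invented here, and it assumes ifconfig lists enabled
capabilities as TSO4/TSO6 inside the angle brackets of its options= line.

```shell
# Succeeds (exit status 0) if the `ifconfig <interface>` output on stdin still
# lists TSO4 or TSO6 in its options= line. Hypothetical helper.
tso_enabled() {
    # Match TSO4/TSO6 only as a whole capability name, so that other
    # capabilities such as VLAN_HWTSO do not count.
    grep -E '^[[:space:]]*options=.*[<,]TSO[46][,>]' >/dev/null
}
```

Possible usage:
ifconfig ix0 | tso_enabled && echo "ix0: TSO is still enabled"
sysctl -n dev.ix.0.fc    # expected to print 0 if flow control is off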

I found the following bug reports:
2004: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=185633
2016: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212749

Since this system uses PF with scrubbing, I also applied this patch manually:
https://reviews.freebsd.org/D7780

But I have had no success so far.

Shutting down ix0/ix1 and bringing them back up makes bridge0 responsive again,
but the clock is now ticking until the next failure. netstat after that procedure:

$ netstat -m
33281/56284/89565 mbufs in use (current/cache/total)
33280/9756/43036/2015426 mbuf clusters in use (current/cache/total/max)
33280/9730 mbuf+clusters out of packet secondary zone in use (current/cache)
0/192/192/507713 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/150433 9k jumbo clusters in use (current/cache/total/max)
0/0/0/84618 16k jumbo clusters in use (current/cache/total/max)
74880K/34351K/109231K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
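
For comparing the two netstat -m snapshots, the "current" mbuf cluster count
can be pulled out with a small helper like the one sketched below. The function
name mbuf_clusters_current is made up for illustration. In the output above the
value is 38692 while the bridge was stuck and 33280 after the interfaces were
bounced, and no requests were denied or delayed in either snapshot, so the
numbers by themselves do not obviously point at mbuf exhaustion.

```shell
# Print the "current" value from the "mbuf clusters in use" line of
# `netstat -m` output read from stdin. Hypothetical helper.
mbuf_clusters_current() {
    # The first field looks like current/cache/total/max; keep the first part.
    awk '/ mbuf clusters in use / { split($1, f, "/"); print f[1] }'
}
```

Possible usage, for example logged once a minute to see whether the count
climbs before the bridge stops forwarding:
netstat -m | mbuf_clusters_current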

Kind regards,
Aiko

-- 
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"
