Hi, all.

I've been having sporadic and serious problems with the Realtek gigabit
interface built into my motherboard. Periodically, it just freezes up. I've
tried several things to no avail: turning on DEVICE_POLLING, frobbing
bootloader options and sysctl settings, etc.

I had a solid week of function with the following:

hw.re.msi_disable="1"
hw.re.msix_disable="1"
dev.re.0.int_rx_mod=0     <-- this one says it can be a loader tunable, but
                              it didn't work that way - I had to set it from
                              sysctl.conf
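
One way to check whether a given setting came from the loader or was only
applied later is to compare the kernel environment with the live sysctl
value. This is a minimal sketch, not a tested diagnostic: the function name
compare_tunable is made up for illustration, and some of these names may
exist only as loader tunables, in which case the sysctl lookup simply
reports nothing.

```shell
#!/bin/sh
# compare_tunable NAME
# Print what the loader put in the kernel environment (kenv) next to the
# value the running kernel reports (sysctl). compare_tunable is a
# hypothetical helper name, not part of FreeBSD.
compare_tunable() {
    loader=$(kenv -q "$1" 2>/dev/null) || loader="(not set by loader)"
    live=$(sysctl -n "$1" 2>/dev/null) || live="(no such sysctl)"
    printf '%s: loader=%s live=%s\n' "$1" "$loader" "$live"
}

# Example use:
#   for name in hw.re.msi_disable hw.re.msix_disable dev.re.0.int_rx_mod; do
#       compare_tunable "$name"
#   done
```

If a name shows a loader value but a different live value, the tunable was
read at boot and then changed or ignored; if it shows no loader value at all,
it never made it into the kernel environment.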

And then after a reboot, I locked up again on pushing the interface a little
with an rsync. However, I've seen interactive sessions lock the thing up too.
It's not just when I'm doing big transfers.

It's not clear what's happening. I have been capturing stats periodically
with 'sysctl dev.re.0.stats=1', but that doesn't always show a problem. For
instance, during one of the lock-ups last night, after a reboot, I got this:

re0 statistics:
Tx frames : 171306
Rx frames : 20271
Tx errors : 0
Rx errors : 0
Rx missed frames : 0
Rx frame alignment errs : 0
Tx single collisions : 0
Tx multiple collisions : 0
Rx unicast frames : 20271
Rx broadcast frames : 0
Rx multicast frames : 0
Tx aborts : 0
Tx underruns : 0

After running overnight, with sporadic automated transfers:

re0 statistics:
Tx frames : 4658945
Rx frames : 1258514
Tx errors : 0
Rx errors : 33
Rx missed frames : 0
Rx frame alignment errs : 3591
Tx single collisions : 0
Tx multiple collisions : 0
Rx unicast frames : 1255880
Rx broadcast frames : 2411
Rx multicast frames : 223
Tx aborts : 0
Tx underruns : 0
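
Comparing two dumps like these by eye is error-prone, so here is a rough
sketch that prints only the counters that changed. It assumes each dump has
been saved to its own text file in the "name : value" layout shown above;
the function name stats_delta and the file names in the usage comment are
made up for illustration.

```shell
#!/bin/sh
# stats_delta BEFORE AFTER
# Print every counter whose value differs between two saved
# "re0 statistics" dumps. stats_delta is a hypothetical helper name.
stats_delta() {
    awk -F ' : ' '
        NF != 2   { next }                        # skip the header line
        FNR == NR { before[$1] = $2 + 0; next }   # first file: remember values
        ($1 in before) && ($2 + 0) != before[$1] {
            printf "%s: %d -> %d (delta %d)\n", $1, before[$1], $2, $2 - before[$1]
        }
    ' "$1" "$2"
}

# Example use (file names are placeholders):
#   stats_delta stats-before.txt stats-after.txt
```

Counters that stayed the same are left out, so anything that moves around
the time of a freeze stands out.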

I was seeing the "Rx multicast frames" creep up each time I saw a freeze last
night, which was confusing in that I'm not sure why there'd be any multicast
traffic.

Here's the card from dmesg, with MSI/X turned off:

re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xe800-0xe8ff mem 0xfbfff000-0xfbffffff,0xfbff8000-0xfbffbfff irq 18 at device 0.0 on pci2
re0: Chip rev. 0x2c000000
re0: MAC rev. 0x00200000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX,
100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master,
1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow,
1000baseT-FDX-flow-master, auto, auto-flow
re0: Ethernet address: bc:ae:c5:bd:44:e7

The motherboard that this interface is built into:

Base Board Information
        Manufacturer: ASUSTeK Computer INC.
        Product Name: M4A88T-M
        Version: Rev X.0x
        Serial Number: MF70B1G04201588
        Asset Tag: To Be Filled By O.E.M.
        Features:
                Board is a hosting board
                Board is replaceable
        Location In Chassis: To Be Filled By O.E.M.
        Chassis Handle: 0x0003
        Type: Motherboard
        Contained Object Handles: 0

In general I've been saying "ifconfig re0 down ; ifconfig re0 up" to kick the
interface, but last night a friendly person from IRC mentioned that I could
work around this by running a steady ping and frobbing mediatype when I see
the pings fail. So, I've got this running:

while true
do
ping -c 1 -t 1 firewall > /dev/null 2>&1
if [ $? -ne 0 ]; then
    date
    echo "toggling re0"
    echo
    ifconfig re0 media 1000baseT mediaopt full-duplex,flowcontrol,master
    ifconfig re0 media autoselect mediaopt flowcontrol              
    sleep 3
fi
sleep 1
done
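
A variant of the loop above that saves some interface state before toggling
the media might make it easier to see what the card looked like at the
moment of a freeze. This is only a sketch: the variable and function names
(IFACE, PEER, LOG, probe, snapshot, kick, check_once) and the log path are
assumptions for illustration, and the ping and ifconfig commands are the
same ones used above.

```shell
#!/bin/sh
# Hypothetical names and defaults; adjust to taste.
IFACE=${IFACE:-re0}
PEER=${PEER:-firewall}
LOG=${LOG:-/var/tmp/re0-watch.log}

# One ping with a one-second timeout, as in the loop above.
probe() {
    ping -c 1 -t 1 "$PEER" > /dev/null 2>&1
}

# Record interface state before touching anything, so the evidence
# is not wiped out by the media toggle.
snapshot() {
    {
        echo "=== $(date) probe to $PEER failed ==="
        ifconfig "$IFACE"
        netstat -n -I "$IFACE"
    } >> "$LOG" 2>&1
}

# Same media toggle as above.
kick() {
    ifconfig "$IFACE" media 1000baseT mediaopt full-duplex,flowcontrol,master
    ifconfig "$IFACE" media autoselect mediaopt flowcontrol
}

# Returns 0 if the peer answered, 1 if the interface had to be kicked.
check_once() {
    probe && return 0
    snapshot
    kick
    return 1
}

# To run it continuously:
#   while true; do check_once || sleep 3; sleep 1; done
```

Keeping the check in a function means the timing of the outer loop can be
changed without touching the logging or the toggle.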

This has been noting failures sporadically throughout the day, but it's
allowing traffic to continue moving, albeit with the occasional hiccough.

This hardware has been running Debian for a couple years, and it's never had
so much as a short hiccough, so I have confidence that the hardware is fine.
It suggests that there's something the Linux driver is doing to handle this
hardware that FreeBSD isn't doing. For a while I was dual-booting and I'd see
errors with FreeBSD running that weren't there under Debian.

I'd started diving into the source, both Linux and FreeBSD, but I lack
sufficient exposure to ethernet driver code to be able to get a high-level
picture of what they're doing, and as such I haven't yet noticed any special-
case or hardware glitch handling that we're missing, although I might find
something eventually.

I'm struggling with finding a way to see what's actually happening with this.
I've toggled MSI and MSI-X handling, I've turned down interrupt handling
delays, I've tried both I/O and memory register transfers, although I'm not
actually clear on what's happening differently there. I've had polling variously
enabled and disabled.

One thing to note is that last night's horror while I was trying to move some
back-up data was after rebooting from Windows. (Installed on a partition for
gaming...) It made me wonder if we're not fully setting up some state on the
card. I'd have what felt like a solid, glitchless week before that.

FWIW, I'm running 10.1-RC3 on this box and I've seen issues from early on
while I was still running 10.0-RELEASE.

Thanks in advance for clues. This is a showstopper for further deployment for
me, as I've got these Realtek on-board cards in several boxes, and while the
media frobbing largely works, it's not something I can inflict on my users.

-- 
Mason Loring Bliss  ((   If I have not seen as far as others, it is because
 ma...@blisses.org   ))   giants were standing on my shoulders. - Hal Abelson
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"