For a while now with e1000, we've been trying to optimize our jumbo frame
memory utilization by using multiple 2k buffers chained together (see
rx_skb_top in e1000_main.c) using frag_list. Up until 2.6.14, this
worked.
A recent commit in 2.6.14 broke this; see this git commit:
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bc8dfcb93970ad7139c976356bfc99d7e251deaf
Or for a shorter version http://tinyurl.com/drpu8
Basically, I believe ip_frag_reasm creates a frag_list in order to
reassemble, and since we've already got skbs chained together, we end up
with a frag_list within a frag_list.
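
The suspected nesting can be sketched in userspace as follows. This is only
a toy model under stated assumptions: struct toy_skb and the toy_* helpers
are hypothetical names, not kernel APIs, and the reassembly step is reduced
to "hang the received fragments off one head", which is a simplification of
whatever ip_frag_reasm really does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an skb: only the two links that
 * matter for frag_list chaining are modeled. */
struct toy_skb {
    struct toy_skb *next;      /* next buffer in the same chain */
    struct toy_skb *frag_list; /* head of a chain hanging off this buffer */
};

/* Chain n buffers under head->frag_list, roughly the way a driver might
 * chain several 2k receive buffers for one jumbo frame. */
static void toy_chain(struct toy_skb *head, struct toy_skb *bufs, size_t n)
{
    size_t i;

    assert(head != NULL && bufs != NULL && n > 0);
    for (i = 0; i + 1 < n; i++)
        bufs[i].next = &bufs[i + 1];
    bufs[n - 1].next = NULL;
    head->frag_list = &bufs[0];
}

/* Return 1 if any buffer on head's frag_list carries a frag_list of its
 * own, i.e. the nesting that a one-level walker does not expect. */
static int toy_has_nested_frag_list(const struct toy_skb *head)
{
    const struct toy_skb *frag;

    for (frag = head->frag_list; frag != NULL; frag = frag->next)
        if (frag->frag_list != NULL)
            return 1;
    return 0;
}

/* Build the suspected scenario: two received IP fragments, each already a
 * driver-built chain, are linked under one reassembled head. */
static int toy_reassembly_demo(void)
{
    struct toy_skb reasm = {0}, frag1 = {0}, frag2 = {0};
    struct toy_skb bufs1[2] = {{0}}, bufs2[2] = {{0}};

    toy_chain(&frag1, bufs1, 2); /* driver chained extra buffers */
    toy_chain(&frag2, bufs2, 2);

    frag1.next = &frag2;         /* simplified reassembly step */
    frag2.next = NULL;
    reasm.frag_list = &frag1;

    return toy_has_nested_frag_list(&reasm);
}
```

In this model toy_reassembly_demo() reports nesting, while a head that only
carries a driver-built chain does not, which matches the theory that the
problem appears only once reassembly is layered on top of chained buffers.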
To reproduce, use the 6.3.9-k2 driver in 2.6.15-rc2, enable jumbo frames,
and ping with a size larger than the MTU on a PCI/PCI-X adapter (or set the
disable packet split option in .config):
# ifconfig eth0 mtu 5000
# ping -s 10000 <eth0 connected host>
The BUG_ON(skb_shinfo(skb)->frag_list); hits us. Is this code correct?
I believe e1000 is using the (admittedly not often used) interface
correctly and that everything should work. I think the s2io driver will
also suffer from this bug.
History:
Back in September I had missed this thread.
http://marc.theaimsgroup.com/?t=112495029500004&r=1&w=2
I have confirmed that backing out the change noted above does fix the
problem.
Here is the bug output, but it's not very useful IMO besides the source
reference.
Feb 2 15:19:47 lh kernel: Kernel BUG at net/core/datagram.c:253
Feb 2 15:19:47 lh kernel: invalid operand: 0000 [1] SMP
Feb 2 15:19:47 lh kernel: CPU 0
Feb 2 15:19:47 lh kernel: Modules linked in: iptable_filter ip_tables e1000 ipv6 autofs4 sunrpc video button battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core shpchp ixgb e100 mii floppy ext3 jbd dm_mod ata_piix libata sd_mod scsi_mod
Feb 2 15:19:47 lh kernel: Pid: 4928, comm: ping Not tainted 2.6.15-jesse #1
Feb 2 15:19:47 lh kernel: RIP: 0010:[<ffffffff802ad479>] <ffffffff802ad479>{skb_copy_datagram_iovec+360}
Feb 2 15:19:47 lh kernel: RSP: 0018:ffff810055869b58 EFLAGS: 00010282
Feb 2 15:19:47 lh kernel: RAX: ffff810058855c00 RBX: ffff8100596a3780 RCX: 0000000000000000
Feb 2 15:19:47 lh kernel: RDX: ffff810054d98880 RSI: ffff810054d98812 RDI: 0000555555670802
Feb 2 15:19:47 lh kernel: RBP: ffff810055869e88 R08: d3d2d1d0cfcecdcc R09: 9b9a999897969594
Feb 2 15:19:47 lh kernel: R10: a3a2a1a09f9e9d9c R11: 8b8a898887868584 R12: 00000000000007f2
Feb 2 15:19:47 lh kernel: R13: ffff810054d98020 R14: ffff81005a386480 R15: 0000000000000bb2
Feb 2 15:19:47 lh kernel: FS: 00002aaaaae00e20(0000) GS:ffffffff804e7800(0000) knlGS:0000000000000000
Feb 2 15:19:47 lh kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 2 15:19:47 lh kernel: CR2: 00002aaaaac3e680 CR3: 000000005671c000 CR4: 00000000000006e0
Feb 2 15:19:47 lh kernel: Process ping (pid: 4928, threadinfo ffff810055868000, task ffff81005f9257c0)
Feb 2 15:19:47 lh kernel: Stack: ffffffff80145116 ffff81005a386480 00000000000007f2 ffff810055869ec8
Feb 2 15:19:47 lh kernel: 000007f200000000 ffff8100596a3780 ffff810055869e88 ffff81005fa1d480
Feb 2 15:19:47 lh kernel: 0000000000001410 ffff810055869e08
Feb 2 15:19:47 lh kernel: Call Trace:<ffffffff80145116>{autoremove_wake_function+0} <ffffffff802e8fd8>{raw_recvmsg+175}
Feb 2 15:19:47 lh kernel: <ffffffff802a9fff>{sock_common_recvmsg+45} <ffffffff802a6894>{sock_recvmsg+271}
Feb 2 15:19:47 lh kernel: <ffffffff8020379e>{bit_clear+117} <ffffffff80232ef9>{tty_ldisc_try+59}
Feb 2 15:19:47 lh kernel: <ffffffff80145116>{autoremove_wake_function+0} <ffffffff80151aea>{audit_sockaddr+53}
Feb 2 15:19:47 lh kernel: <ffffffff80145116>{autoremove_wake_function+0} <ffffffff802a7f7e>{sys_sendmsg+534}
Feb 2 15:19:47 lh kernel: <ffffffff802a80f4>{sys_recvmsg+348} <ffffffff8030f753>{_spin_lock_irqsave+11}
Feb 2 15:19:47 lh kernel: <ffffffff8011e304>{do_page_fault+1109} <ffffffff8011133b>{syscall_trace_enter+195}
Feb 2 15:19:47 lh kernel: <ffffffff8010d9c0>{tracesys+209}
Feb 2 15:19:47 lh kernel:
Feb 2 15:19:48 lh kernel: Code: 0f 0b 68 07 cb 34 80 c2 fd 00 48 8b 54 24 08 48 8b 12 48 89
Feb 2 15:19:48 lh kernel: RIP <ffffffff802ad479>{skb_copy_datagram_iovec+360} RSP <ffff810055869b58>
Feb 2 15:19:48 lh kernel: <7>Losing some ticks... checking if CPU frequency changed.