For a while now in e1000, we've been trying to optimize our jumbo frame memory utilization by chaining multiple 2k buffers together with frag_list (see rx_skb_top in e1000_main.c). Up until 2.6.14, this worked.

A recent commit in 2.6.14 broke this; see this git commit:
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bc8dfcb93970ad7139c976356bfc99d7e251deaf
or, for a shorter version:
http://tinyurl.com/drpu8

Basically, I believe ip_frag_reasm creates a frag_list in order to reassemble, and since we've already got skbs chained together, we end up with a frag_list within a frag_list.
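To make the suspected nesting concrete, here is a minimal userspace sketch. It is not kernel code: struct toy_skb and every toy_* function are hypothetical stand-ins invented for illustration, and the sketch assumes that the receive copy path walks only one level of frag_list, which is what the BUG_ON suggests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only what is needed to show chaining. */
struct toy_skb {
    struct toy_skb *next;      /* next sibling in a chain */
    struct toy_skb *frag_list; /* head of a chain of child buffers */
    size_t len;                /* bytes held directly in this buffer */
};

/* Hang an array of `count` buffers off head->frag_list, linked via ->next. */
static void toy_chain(struct toy_skb *head, struct toy_skb *bufs, size_t count)
{
    assert(head != NULL && (bufs != NULL || count == 0));
    head->frag_list = count ? &bufs[0] : NULL;
    for (size_t i = 0; i < count; i++)
        bufs[i].next = (i + 1 < count) ? &bufs[i + 1] : NULL;
}

/* Return 1 if any buffer on head's frag_list carries a frag_list of its own. */
static int toy_has_nested_frag_list(const struct toy_skb *head)
{
    for (const struct toy_skb *f = head->frag_list; f; f = f->next)
        if (f->frag_list)
            return 1;
    return 0;
}

/* Count bytes walking a single level of frag_list (assumed copy behaviour). */
static size_t toy_len_one_level(const struct toy_skb *head)
{
    size_t total = head->len;
    for (const struct toy_skb *f = head->frag_list; f; f = f->next)
        total += f->len;
    return total;
}

/* Count bytes following every nested frag_list recursively. */
static size_t toy_len_recursive(const struct toy_skb *skb)
{
    size_t total = skb->len;
    for (const struct toy_skb *f = skb->frag_list; f; f = f->next)
        total += toy_len_recursive(f);
    return total;
}
```

If a driver-chained buffer (one that already has its own frag_list) is then placed on a reassembly head's frag_list, toy_has_nested_frag_list() reports the nesting, and the one-level walk undercounts relative to the recursive walk. In this toy model, that is the situation a non-recursive walker would have to either handle or reject.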

To reproduce: with the 6.3.9-k2 driver in 2.6.15-rc2, enable jumbo frames and ping with a size larger than the MTU, on a PCI/PCI-X adapter (or set the disable packet split option in .config).

# ifconfig eth0 mtu 5000
# ping -s 10000 <eth0 connected host>

The BUG_ON(skb_shinfo(skb)->frag_list); hits us. Is this code correct? I believe e1000 is using the (admittedly not often used) interface correctly and that everything should work. I think the s2io driver will also suffer from this bug.

History: back in September I missed this thread:
http://marc.theaimsgroup.com/?t=112495029500004&r=1&w=2

I have confirmed that backing out the change noted above does fix the problem.

Here is the bug output, but it's not very useful IMO, apart from the source reference.

Feb  2 15:19:47 lh kernel: Kernel BUG at net/core/datagram.c:253
Feb  2 15:19:47 lh kernel: invalid operand: 0000 [1] SMP
Feb  2 15:19:47 lh kernel: CPU 0
Feb  2 15:19:47 lh kernel: Modules linked in: iptable_filter ip_tables e1000 ipv6 autofs4 sunrpc video button battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core shpchp ixgb e100 mii floppy ext3 jbd dm_mod ata_piix libata sd_mod scsi_mod
Feb  2 15:19:47 lh kernel: Pid: 4928, comm: ping Not tainted 2.6.15-jesse #1
Feb  2 15:19:47 lh kernel: RIP: 0010:[<ffffffff802ad479>] <ffffffff802ad479>{skb_copy_datagram_iovec+360}
Feb  2 15:19:47 lh kernel: RSP: 0018:ffff810055869b58  EFLAGS: 00010282
Feb  2 15:19:47 lh kernel: RAX: ffff810058855c00 RBX: ffff8100596a3780 RCX: 0000000000000000
Feb  2 15:19:47 lh kernel: RDX: ffff810054d98880 RSI: ffff810054d98812 RDI: 0000555555670802
Feb  2 15:19:47 lh kernel: RBP: ffff810055869e88 R08: d3d2d1d0cfcecdcc R09: 9b9a999897969594
Feb  2 15:19:47 lh kernel: R10: a3a2a1a09f9e9d9c R11: 8b8a898887868584 R12: 00000000000007f2
Feb  2 15:19:47 lh kernel: R13: ffff810054d98020 R14: ffff81005a386480 R15: 0000000000000bb2
Feb  2 15:19:47 lh kernel: FS: 00002aaaaae00e20(0000) GS:ffffffff804e7800(0000) knlGS:0000000000000000
Feb  2 15:19:47 lh kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb  2 15:19:47 lh kernel: CR2: 00002aaaaac3e680 CR3: 000000005671c000 CR4: 00000000000006e0
Feb  2 15:19:47 lh kernel: Process ping (pid: 4928, threadinfo ffff810055868000, task ffff81005f9257c0)
Feb  2 15:19:47 lh kernel: Stack: ffffffff80145116 ffff81005a386480 00000000000007f2 ffff810055869ec8
Feb  2 15:19:47 lh kernel:        000007f200000000 ffff8100596a3780 ffff810055869e88 ffff81005fa1d480
Feb  2 15:19:47 lh kernel:        0000000000001410 ffff810055869e08
Feb  2 15:19:47 lh kernel: Call Trace:<ffffffff80145116>{autoremove_wake_function+0} <ffffffff802e8fd8>{raw_recvmsg+175}
Feb  2 15:19:47 lh kernel:        <ffffffff802a9fff>{sock_common_recvmsg+45} <ffffffff802a6894>{sock_recvmsg+271}
Feb  2 15:19:47 lh kernel:        <ffffffff8020379e>{bit_clear+117} <ffffffff80232ef9>{tty_ldisc_try+59}
Feb  2 15:19:47 lh kernel:        <ffffffff80145116>{autoremove_wake_function+0} <ffffffff80151aea>{audit_sockaddr+53}
Feb  2 15:19:47 lh kernel:        <ffffffff80145116>{autoremove_wake_function+0} <ffffffff802a7f7e>{sys_sendmsg+534}
Feb  2 15:19:47 lh kernel:        <ffffffff802a80f4>{sys_recvmsg+348} <ffffffff8030f753>{_spin_lock_irqsave+11}
Feb  2 15:19:47 lh kernel:        <ffffffff8011e304>{do_page_fault+1109} <ffffffff8011133b>{syscall_trace_enter+195}
Feb  2 15:19:47 lh kernel:        <ffffffff8010d9c0>{tracesys+209}
Feb  2 15:19:47 lh kernel:
Feb  2 15:19:48 lh kernel: Code: 0f 0b 68 07 cb 34 80 c2 fd 00 48 8b 54 24 08 48 8b 12 48 89
Feb  2 15:19:48 lh kernel: RIP <ffffffff802ad479>{skb_copy_datagram_iovec+360} RSP <ffff810055869b58>
Feb  2 15:19:48 lh kernel: <7>Losing some ticks... checking if CPU frequency changed.
