skb_pull_rcsum - Fatal exception in interrupt

Alan J. Wylie Wed, 15 Aug 2007 08:10:37 -0700

We have been shipping Linux-based servers to customers for several
years now, with few problems. Recently, however, a single customer has
been seeing kernel panics. Unfortunately, the customer is about 200
miles away, so physical access is limited. There are two ethernet
interfaces: one should be plugged into a local RFC1918 network, the
other is connected to the internet. If eth0 is plugged into the local
network, the system panics a short time later.


Hardware: Intel S5000VSA server

Network cards: Intel e1000
   Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) 

We shipped a second system, and this displayed identical symptoms.  We
have tested with several recent 2.6 kernels, including

2.6.22
2.6.17.14
2.6.20.15

all of which crash.

We have a couple of photographs showing the tail end of the messages
on the screen.

The last two lines are:

EIP: [<c02b6fb2>] skb_pull_rcsum+0x6d/0x71 SS:ESP 09068:c03e1ea4
Kernel panic - not syncing: Fatal exception in interrupt

The photos, along with the following information, are available at
http://wylie.me.uk/skb_pull_rcsum/

lspci
lspci -n
lspci -v
ethtool -d
/proc/interrupts
kernel config

There are no related messages in the syslog files.

The code for skb_pull_rcsum is short, but contains two calls
to BUG_ON, checking for invalid lengths.

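unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len)
{
        /* cannot pull more bytes than the skb holds */
        BUG_ON(len > skb->len);
        skb->len -= len;
        /* remaining length must still cover the paged (non-linear) data */
        BUG_ON(skb->len < skb->data_len);
        /* adjust the receive checksum for the bytes pulled off */
        skb_postpull_rcsum(skb, skb->data, len);
        return skb->data += len;
}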

I wonder whether this problem bears any resemblance to 

http://bugzilla.kernel.org/show_bug.cgi?id=2979

| We were overreacting to invalid incoming AppleTalk frames. Better
| just drop invalid frames than crash the kernel ;)

<http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=75559c167bddc1254db5bcff032ad5eed8bd6f4a>

| [APPLETALK]: Fix a remotely triggerable crash

| When we receive an AppleTalk frame shorter than what its header
| says, we still attempt to verify its checksum, and trip on the
| BUG_ON() at the end of function atalk_sum_skb() because of the
| length mismatch.

| This has security implications because this can be triggered by
| simply sending a specially crafted ethernet frame to a target
| victim, effectively crashing that host. Thus this qualifies, I
| think, as a remote DoS.

Our system is also installed in a school. We have remote access to the
box, and can, with some inconvenience, arrange for the box to be
rebooted. We are currently arranging for two different network cards
(RealTek RTL8139) to be installed.

I am pretty certain that the problem is to do with network traffic,
rather than hardware or software configuration. This box is almost
identical to tens of other boxes that are working successfully; the
only difference is that the on-board ethernet recently changed from
8086:1079 (rev 03) to 8086:1096 (rev 01), which requires an updated
e1000 driver.

What is the best way to track this bug down, remembering that we have
little more than ssh access and a remote finger to press the reboot
button?

Could we modify the code to log and drop the packet, rather than
panicking the kernel?
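
As a rough illustration of the log-and-drop idea, here is a user-space
model of the two length checks. It is a sketch only, not kernel code:
struct fake_skb and fake_skb_pull_checked() are hypothetical names made
up for this example, the checksum adjustment is left out, and whether
every caller of skb_pull_rcsum could cope with a NULL return is an open
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct sk_buff: only the fields that
 * skb_pull_rcsum() looks at. This is NOT the kernel structure. */
struct fake_skb {
        unsigned char *data;    /* start of the linear data */
        unsigned int len;       /* total bytes, linear + paged */
        unsigned int data_len;  /* bytes held in paged fragments */
};

/* Hypothetical variant of skb_pull_rcsum(): on an invalid length it
 * logs and returns NULL instead of hitting BUG_ON(). The caller is
 * expected to drop the packet when it sees NULL. Both checks run
 * before anything is modified, so a rejected skb is left untouched.
 * The skb_postpull_rcsum() step is omitted from this model. */
unsigned char *fake_skb_pull_checked(struct fake_skb *skb, unsigned int len)
{
        assert(skb != NULL);

        if (len > skb->len) {
                fprintf(stderr, "pull of %u exceeds skb len %u, dropping\n",
                        len, skb->len);
                return NULL;
        }
        if (skb->len - len < skb->data_len) {
                fprintf(stderr, "pull of %u leaves %u < data_len %u, dropping\n",
                        len, skb->len - len, skb->data_len);
                return NULL;
        }

        skb->len -= len;
        skb->data += len;
        return skb->data;
}
```

In the kernel itself the fprintf calls would presumably become a
printk guarded by net_ratelimit(), so that a stream of bad frames
cannot flood the log.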

-- 
Alan J. Wylie                                          http://www.wylie.me.uk/
"Perfection [in design] is achieved not when there is nothing left to add,
but rather when there is nothing left to take away."
  -- Antoine de Saint-Exupery