Hello, We are developing an advanced networking services loadable module and are having problems porting it to work on 2.4.x kernels. The driver is supposed to provide services such as fault tolerance, load balancing and link aggregation over a team of network adapters. It works OK on 2.2.x kernels but hangs on 2.4.x kernels. In order to debug it, we stripped it down to become a mere "intermediate" or "filter" driver that binds to a base driver and passes everything through in both directions (Rx, Tx, IOCTL, stats, etc.). After going through the basics of modifying the driver to compile on 2.4.x kernels and fighting some nasty dead locks due to the new nature of the networking layer, we managed to get it to run. The driver will receive and transmit a few hundreds of thousands of packets (while having a periodic timer expire 10 times a second and running continuous IOCTLs), and then it causes an oops about not being able to handle a page fault. The function looks something like: int iansHardStartXmit(struct sk_buff *skb, struct net_device *dev) { int res; struct net_device *base; spin_lock(&lock); base = get_base_driver_by_name(name); if(base != NULL) { res = base->hard_start_xmit(skb, base); } spin_unlock(&lock); return res; } We used kdb in order to track down the problem and found out the following stack trace: EBP EIP function(args) 0xc4cd1c54 0xd081e3e7 [e100]__kallsyms+0xb (0xc4b595a0, 0xc840f200) e100 __kallsyms 0xd081e3dc 0xd081e3dc 0xd0820dsc 0xd08244ba [ians]iansHardStartXmit+0xa6 (0xc4b595a0, 0xc4d9bc00) ians .text 0xd0824060 0xd0824414 0xd082452c 0xc01f9d1f qdisc_restart+0xcf (0xc4d9bc00) kernel .text 0xc0100000 0xc01f9c50 0xc01f9f14 * * * This goes on and shows that this is an ICMP echo reply packet going down through the IP stack to the filter driver (apparently 0xc4b595a0 is the skb, 0xc4d9bc00 is the *dev of the filter driver and 0xc840f200 is the *dev of the base driver). The filter driver is supposed to call the dev->hard_start_xmit of the base driver, but strangely it lands somewhere in the data segment of the base driver (__kallsyms is a part of the symbol table of the module according to insmod -m). Figuring the dev->hard_start_xmit pointer got trashed somehow, we added a check to make sure the same pointer is always called, and indeed this was the case. Looking at the assembly code with kdb, we could see that the call to the base driver is done by a 'call *%eax' command. kdb reports that eax=0xffffffff after the page fault (origeax). How is it possible that the pointer to the function keeps it's value, but the jump to that function falls somewhere else ? The entire function is protected by a spinlock, so there is no worry about the other threads messing my data. We are using: RedHat 6.2 gcc v2.91.66 modutils v2.3.11-1 kernel linux-2.4.0-test9 kdb v1.5-2.4.0-test9-pre9 Compaq ap500 dual p-III Xeon Thanks, Shmulik Hen Software Engineer Linux Advanced Networking Services Network Communications Group, Israel (NCGj) Intel Corporation Ltd. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/