On Sep 10, 2014, at 3:42 AM, Or Gerlitz <ogerl...@mellanox.com> wrote:
> > On 9/9/2014 10:30 PM, Chuck Lever wrote: >> This crash happens when booting v3.17-rcN on any of my IB-enabled >> systems. I have both ConnectX-2 and mthca systems, all are affected. >> >> I bisected this to: >> >> commit e0f31d8498676fda36289603a054d0d490aa2679 >> Author: Govindarajulu Varadarajan <_gov...@gmx.com> >> AuthorDate: Mon Jun 23 16:07:58 2014 +0530 >> Commit: David S. Miller <da...@davemloft.net> >> CommitDate: Mon Jun 23 14:32:19 2014 -0700 >> >> flow_keys: Record IP layer protocol in skb_flow_dissect() >> >> skb_flow_dissect() dissects only transport header type in ip_proto. It >> dose not >> give any information about IPv4 or IPv6. >> This patch adds new member, n_proto, to struct flow_keys. Which records >> the >> IP layer type. i.e IPv4 or IPv6. >> This can be used in netdev->ndo_rx_flow_steer driver function to dissect >> flow. >> Adding new member to flow_keys increases the struct size by around 4 >> bytes. >> This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in >> qdisc_cb_private_validate() >> So increase data size by 4 >> >> Signed-off-by: Govindarajulu Varadarajan <_gov...@gmx.com> >> Signed-off-by: David S. Miller <da...@davemloft.net> >> >> >> This commit includes a hunk that increases the size of struct qdisc_skb_cb >> by at least 4 bytes: >> >>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h >>> index 624f985..a3cfb8e 100644 >>> --- a/include/net/sch_generic.h >>> +++ b/include/net/sch_generic.h >>> @@ -231,7 +231,7 @@ struct qdisc_skb_cb { >>> unsigned int pkt_len; >>> u16 slave_dev_queue_mapping; >>> u16 _pad; >>> - unsigned char data[20]; >>> + unsigned char data[24]; >>> }; >>> static inline void qdisc_cb_private_validate(const struct sk_buff *skb, >>> int sz) >> >> >> IPoIB defines the following structure in >> drivers/infiniband/ulp/ipoib/ipoib.h: >> >>> struct ipoib_cb { >>> struct qdisc_skb_cb qdisc_cb; >>> u8 hwaddr[INFINIBAND_ALEN]; >>> }; >> IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes. >> After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes >> 52 bytes. >> >> Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst >> field, which contains a pointer. By the time we get into >> __dev_queue_xmit() and try to use the result of skb_dst(), that pointer >> is garbage, and we oops. >> >> Obviously, cb[] could be increased to 56 bytes to accommodate struct >> ipoib_cb. I tried this, and it is effective in preventing the oops on >> one of my systems. >> >> But I suspect there is an historical reason I’m not aware of that it >> has remained 48 bytes for years. > > Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 commit > 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL > addresses" we are using the skb->cb field to enable proper work under GRO and > avoid another historical quirk we had there... so I think we can definetly > consider commit e0f31d849 to introduce a severe regression... Govindarajulu, > Dave - what's your thinking here? any quick idea on how to fix? Hi Or- Is there a bugzilla report filed for this issue? Has there been any progress towards a fix? > Also, I was thinking we have the mechanics in the kernel, e.g commit > a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to catch > such over-flows? > > Or. > >>>> BUG: unable to handle kernel paging request at ffff88090000007e >>>> IP: __dev_queue_xmit+0x519 >>>> Call Trace: >>>> ? __dev_queue_xmit+0x49 >>>> dev_queue_xmit+0x10 >>>> neigh_connected_output >>>> ? ip_finish_output >>>> ip_finish_output >>>> ? ip_finish_output >>>> ? netif_rx_ni >>>> ip_mc_output >>>> ip_local_out_sk >>>> ip_send_skb >>>> udp_send_skb >>>> udp_sendmsg >>>> ? ip_reply_glue_bits >>>> ? __lock_is_held >>>> inet_sendmsg >>>> ? inet_sendmsg >>>> sock_sendmsg >>>> ? might_fault >>>> ? might_fault >>>> ? move_addr_to_kernel.part.38 >>>> SYSC_sendto >>>> ? sysret_check >>>> ? trace_hardirqs_on_caller >>>> ? trace_hardirqs_on_thunk >>>> SyS_sendto >>>> system_call_fastpath >>>> >>>> Kernel panic - not syncing: Fatal exception in interrupt >>>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: >>>> 0xffffffff80000000-0xffffffff9fffffff) >>>> drm_kms_helper: panic occurred, switching back to text console >>>> >>>> A screenshot of this kernel oops can be found here: >>>> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/ >>>> >>>> gdb translates the crash address into the following (not sure this makes >>>> sense >>>> since offset 0x519 is past the end of __dev_queue_xmit()): >>>> >>>> (gdb) list *(__dev_queue_xmit+0x519) >>>> 0xffffffff8136bc89 is in netdev_adjacent_rename_links >>>> (net/core/dev.c:5167). >>>> 5162 void netdev_adjacent_rename_links(struct net_device *dev, char >>>> *oldname) >>>> 5163 { >>>> 5164 struct netdev_adjacent *iter; >>>> 5165 >>>> 5166 list_for_each_entry(iter, &dev->adj_list.upper, list) { >>>> 5167 netdev_adjacent_sysfs_del(iter->dev, oldname, >>>> 5168 >>>> &iter->dev->adj_list.lower); >>>> 5169 netdev_adjacent_sysfs_add(iter->dev, dev, >>>> 5170 >>>> &iter->dev->adj_list.lower); >>>> 5171 } >>>> >>>> And the address __dev_queue_xmit+0x49 is translated by gdb into: >>>> >>>> (gdb) list *(__dev_queue_xmit+0x49) >>>> 0xffffffff8136b7b9 is in __dev_queue_xmit >>>> (./arch/x86/include/asm/preempt.h:75). >>>> 70 * The various preempt_count add/sub methods >>>> 71 */ >>>> 72 >>>> 73 static __always_inline void __preempt_count_add(int val) >>>> 74 { >>>> 75 raw_cpu_add_4(__preempt_count, val); >>>> 76 } >>>> 77 >>>> 78 static __always_inline void __preempt_count_sub(int val) >>>> 79 { > -- Chuck Lever chuck[dot]lever[at]oracle[dot]com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/