Re: 3.17-rc1 oops during network interface configuration

Chuck Lever Mon, 29 Sep 2014 11:53:24 -0700

On Sep 10, 2014, at 3:42 AM, Or Gerlitz <[email protected]> wrote:


> 
> On 9/9/2014 10:30 PM, Chuck Lever wrote:
>> This crash happens when booting v3.17-rcN on any of my IB-enabled
>> systems. I have both ConnectX-2 and mthca systems, all are affected.
>> 
>> I bisected this to:
>> 
>> commit e0f31d8498676fda36289603a054d0d490aa2679
>> Author:     Govindarajulu Varadarajan <[email protected]>
>> AuthorDate: Mon Jun 23 16:07:58 2014 +0530
>> Commit:     David S. Miller <[email protected]>
>> CommitDate: Mon Jun 23 14:32:19 2014 -0700
>> 
>>     flow_keys: Record IP layer protocol in skb_flow_dissect()
>> 
>>     skb_flow_dissect() dissects only transport header type in ip_proto. It 
>> dose not
>>     give any information about IPv4 or IPv6.
>>     This patch adds new member, n_proto, to struct flow_keys. Which records 
>> the
>>     IP layer type. i.e IPv4 or IPv6.
>>     This can be used in netdev->ndo_rx_flow_steer driver function to dissect 
>> flow.
>>     Adding new member to flow_keys increases the struct size by around 4 
>> bytes.
>>     This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
>>     qdisc_cb_private_validate()
>>     So increase data size by 4
>> 
>>     Signed-off-by: Govindarajulu Varadarajan <[email protected]>
>>     Signed-off-by: David S. Miller <[email protected]>
>> 
>> 
>> This commit includes a hunk that increases the size of struct qdisc_skb_cb
>> by at least 4 bytes:
>> 
>>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>>> index 624f985..a3cfb8e 100644
>>> --- a/include/net/sch_generic.h
>>> +++ b/include/net/sch_generic.h
>>> @@ -231,7 +231,7 @@ struct qdisc_skb_cb {
>>>         unsigned int            pkt_len;
>>>         u16                     slave_dev_queue_mapping;
>>>         u16                     _pad;
>>> -       unsigned char           data[20];
>>> +       unsigned char           data[24];
>>>  };
>>>    static inline void qdisc_cb_private_validate(const struct sk_buff *skb, 
>>> int sz)
>> 
>> 
>> IPoIB defines the following structure in 
>> drivers/infiniband/ulp/ipoib/ipoib.h:
>> 
>>> struct ipoib_cb {
>>>         struct qdisc_skb_cb     qdisc_cb;
>>>         u8                      hwaddr[INFINIBAND_ALEN];
>>> };
>> IPoIB keeps this in the sk_buff:cb field, which is exactly 48 bytes.
>> After commit e0f31d84, the size of struct ipoib_cb on x86_64 becomes
>> 52 bytes.
>> 
>> Thus IPoIB overruns sk_buff:cb, and trashes the sk_buff::_skb_refdst
>> field, which contains a pointer. By the time we get into
>> __dev_queue_xmit() and try to use the result of skb_dst(), that pointer
>> is garbage, and we oops.
>> 
>> Obviously, cb[] could be increased to 56 bytes to accommodate struct
>> ipoib_cb. I tried this, and it is effective in preventing the oops on
>> one of my systems.
>> 
>> But I suspect there is an historical reason I’m not aware of that it
>> has remained 48 bytes for years.
> 
> Hi Chuck, thanks for bisecting this out. Indeed, as of this kernel 3.2 commit 
> 936d7de "IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL 
> addresses" we are using the skb->cb field to enable proper work under GRO and 
> avoid another historical quirk we had there... so I think we can definetly 
> consider commit e0f31d849 to introduce a severe regression... Govindarajulu, 
> Dave - what's your thinking here? any quick idea on how to fix?

Hi Or-

Is there a bugzilla report filed for this issue? Has there been any progress
towards a fix?


> Also, I was thinking we have the mechanics in the kernel, e.g commit 
> a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") to catch 
> such over-flows?
> 
> Or.
> 
>>>> BUG: unable to handle kernel paging request at ffff88090000007e
>>>> IP: __dev_queue_xmit+0x519
>>>> Call Trace:
>>>> ? __dev_queue_xmit+0x49
>>>> dev_queue_xmit+0x10
>>>> neigh_connected_output
>>>> ? ip_finish_output
>>>> ip_finish_output
>>>> ? ip_finish_output
>>>> ? netif_rx_ni
>>>> ip_mc_output
>>>> ip_local_out_sk
>>>> ip_send_skb
>>>> udp_send_skb
>>>> udp_sendmsg
>>>> ? ip_reply_glue_bits
>>>> ? __lock_is_held
>>>> inet_sendmsg
>>>> ? inet_sendmsg
>>>> sock_sendmsg
>>>> ? might_fault
>>>> ? might_fault
>>>> ? move_addr_to_kernel.part.38
>>>> SYSC_sendto
>>>> ? sysret_check
>>>> ? trace_hardirqs_on_caller
>>>> ? trace_hardirqs_on_thunk
>>>> SyS_sendto
>>>> system_call_fastpath
>>>> 
>>>> Kernel panic - not syncing: Fatal exception in interrupt
>>>> Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 
>>>> 0xffffffff80000000-0xffffffff9fffffff)
>>>> drm_kms_helper: panic occurred, switching back to text console
>>>> 
>>>> A screenshot of this kernel oops can be found here:
>>>> https://drive.google.com/file/d/0B1YQOreL3_FxVDB5UTNwekF6LVU/
>>>> 
>>>> gdb translates the crash address into the following (not sure this makes 
>>>> sense
>>>> since offset 0x519 is past the end of __dev_queue_xmit()):
>>>> 
>>>> (gdb) list *(__dev_queue_xmit+0x519)
>>>> 0xffffffff8136bc89 is in netdev_adjacent_rename_links 
>>>> (net/core/dev.c:5167).
>>>> 5162    void netdev_adjacent_rename_links(struct net_device *dev, char 
>>>> *oldname)
>>>> 5163    {
>>>> 5164            struct netdev_adjacent *iter;
>>>> 5165
>>>> 5166            list_for_each_entry(iter, &dev->adj_list.upper, list) {
>>>> 5167                    netdev_adjacent_sysfs_del(iter->dev, oldname,
>>>> 5168                                              
>>>> &iter->dev->adj_list.lower);
>>>> 5169                    netdev_adjacent_sysfs_add(iter->dev, dev,
>>>> 5170                                              
>>>> &iter->dev->adj_list.lower);
>>>> 5171            }
>>>> 
>>>> And the address __dev_queue_xmit+0x49 is translated by gdb into:
>>>> 
>>>> (gdb) list *(__dev_queue_xmit+0x49)
>>>> 0xffffffff8136b7b9 is in __dev_queue_xmit 
>>>> (./arch/x86/include/asm/preempt.h:75).
>>>> 70       * The various preempt_count add/sub methods
>>>> 71       */
>>>> 72
>>>> 73      static __always_inline void __preempt_count_add(int val)
>>>> 74      {
>>>> 75              raw_cpu_add_4(__preempt_count, val);
>>>> 76      }
>>>> 77
>>>> 78      static __always_inline void __preempt_count_sub(int val)
>>>> 79      {
> 

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.17-rc1 oops during network interface configuration

Reply via email to