On Mon, Jun 16, 2014 at 07:42:54PM -0400, Dave Jones wrote:
 > On Mon, Jun 16, 2014 at 07:04:50PM -0400, Dave Jones wrote:
 >  > On Sun, Jun 15, 2014 at 07:33:12PM -0700, David Miller wrote:
 >  > 
 >  >  > 1) Fix checksumming regressions, from Tom Herbert.
 >  > 
 >  > Something still not right for me here.
 >  > After about 5 minutes, I get an oops and then instant reboot/lock up.
 >  > 
 >  > I haven't managed to get a trace over usb-serial because the machine
 >  > seems to crash before the trace completes. A hand-transcribed one looks like:
 >  > 
 >  > rbp: ffff880236403970 r08: 0000000000000000 r09: 0000000000000000
 >  > r10: 000000000000005a r11: 00000000000002d7 r12: ffff880233000d80
 >  > r13: ffff8800aa1a6fc2 r14: ffff880233001d40 r15: 00000000ffffac82
 >  > fs: 0 gs: ffff880236400000 knlGS: 0
 >  > CS: 10 DS: 0 ES: 0 CR0: 80050033
 >  > CR2: ffff8800aa1a8000 CR3: 1a0d000 CR4: 407f0
 >  > Stack:
 >  >  ffff880236403988 ffffffff81298bbc 00000000000016c0 ffff8802364039e8
 >  >  ffffffff814ca05a ffff880233001d40 000005a80000e397 ffff880233001680
 >  >  0000000000000000 0d420685ffffac82 000000000000012a 000000000000004e
 >  > Call Trace:
 >  > <IRQ>
 >  > csum_partial
 >  > tcp_gso_segment
 >  > inet_gso_segment
 >  > ? update_dl_migration
 >  > skb_mac_gso_segment
 >  > __skb_gso_segment
 >  > dev_hard_start_xmit
 >  > sch_direct_xmit
 >  > __dev_queue_xmit
 >  > ? dev_hard_start_xmit
 >  > dev_queue_xmit
 >  > ip_finish_output
 >  > ? ip_output
 >  > ip_output
 >  > ip_forward_finish
 >  > ip_forward
 >  > ip_rcv_finish
 >  > ip_rcv
 >  > __netif_receive_skb_core
 >  > ? __netif_receive_skb_core
 >  > ? trace_hardirqs_on
 >  > __netif_receive_skb
 >  > netif_receive_skb_internal
 >  > napi_gro_complete
 >  > ? napi_gro_complete
 >  > dev_gro_receive
 >  > ? dev_gro_receive
 >  > napi_gro_receive
 >  > rtl8169_poll
 >  > net_rx_action
 >  > __do_softirq
 >  > irq_exit
 >  > do_IRQ
 >  > common_interrupt
 >  > <EOI>
 >  > cpuidle_enter_state
 >  > cpuidle_enter
 >  > cpu_startup_entry
 >  > rest_init
 >  > ? csum_partial_copy_generic
 >  > start_kernel
 >  > RIP: do_csum+0x83/0x180
 >  > 
 >  > Code: 41 89 d2 74 45 89 d1 45 31 c0 48 89 fa 0f 1f 00 48 03 02 48 13 42
 >  > 08 48 13 42 10 48 13 42 20 48 13 42 28 48 13 42 30 <48> 13 42 38 4c 11
 >  > c0 48 83 c2 40 83 e9 01 75 d5 41 83 ea 01 49
 >  > 
 >  > All code
 >  > ========
 >  >    0:     41 89 d2                mov    %edx,%r10d
 >  >    3:     74 45                   je     0x4a
 >  >    5:     89 d1                   mov    %edx,%ecx
 >  >    7:     45 31 c0                xor    %r8d,%r8d
 >  >    a:     48 89 fa                mov    %rdi,%rdx
 >  >    d:     0f 1f 00                nopl   (%rax)
 >  >   10:     48 03 02                add    (%rdx),%rax
 >  >   13:     48 13 42 08             adc    0x8(%rdx),%rax
 >  >   17:     48 13 42 10             adc    0x10(%rdx),%rax
 >  >   1b:     48 13 42 20             adc    0x20(%rdx),%rax
 >  >   1f:     48 13 42 28             adc    0x28(%rdx),%rax
 >  >   23:     48 13 42 30             adc    0x30(%rdx),%rax
 >  >   27:*    48 13 42 38             adc    0x38(%rdx),%rax     <-- trapping instruction
 >  >   2b:     4c 11 c0                adc    %r8,%rax
 >  >   2e:     48 83 c2 40             add    $0x40,%rdx
 >  >   32:     83 e9 01                sub    $0x1,%ecx
 >  >   35:     75 d5                   jne    0xc
 >  >   37:     41 83 ea 01             sub    $0x1,%r10d
 >  >   3b:     49                      rex.WB
 >  > 
 >  > Typical, rdx and rax had scrolled off the screen.
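
For anyone following the disassembly: the add/adc run is the unrolled inner
loop of a ones'-complement sum, pulling 64-bit words from (%rdx) into %rax
with the carry fed back in. A rough Python sketch of that arithmetic, not the
kernel's code: the names csum64 and fold16 are made up for illustration, and
it assumes the length is a multiple of 8 bytes.

```python
import struct

MASK64 = (1 << 64) - 1


def csum64(buf: bytes) -> int:
    """Ones'-complement sum of little-endian 64-bit words (hypothetical helper)."""
    if len(buf) % 8:
        raise ValueError("sketch only handles multiples of 8 bytes")
    acc = 0
    for (word,) in struct.iter_unpack("<Q", buf):
        acc += word
        # End-around carry: add the bit that overflowed past 64 bits back in,
        # which is the job of the adc chain in the loop above.
        acc = (acc & MASK64) + (acc >> 64)
    return acc


def fold16(acc: int) -> int:
    """Fold a 64-bit ones'-complement sum down to 16 bits (hypothetical helper)."""
    for bits in (32, 16):
        mask = (1 << bits) - 1
        acc = (acc & mask) + (acc >> bits)  # may carry out once
        acc = (acc & mask) + (acc >> bits)  # absorb that carry
    return acc
```

The point is only that each loop iteration reads 64 consecutive bytes, so the
loop faults on the first load that crosses into an unmapped page.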
 > 
 > After removing the dump_stack invocations, I noticed that the reason
 > this is rebooting is probably that, right after the initial oops,
 > we hit the WARN_ON at arch/x86/kernel/smp.c:124:
 > 
 >         if (unlikely(cpu_is_offline(cpu))) {
 >                 WARN_ON(1);
 >                 return;
 >         }
 > 
 > lol.
 > 
 > Anyway, before all that nonsense, I now have the top of the oops:
 > 
 > BUG: unable to handle kernel paging request at ffff880218c18000
 > IP: do_csum+0x68
 > PGD: 2c6a067 PUD: 2c6d067 PMD 23fd1c067 PTE: 80000000218c18060
 > RAX: 2090539bbf7b28f2 RBX: 00000000acb23d4e RCX: 000000000000000b
 > RDX: ffff880218c18000 RSI: 0000000000001c62 RDI: ffff880218c16680
 > 
 > Maybe also notable here is that the kernel is built with DEBUG_PAGEALLOC on.
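
A quick sanity check on those registers, as a sketch only. It assumes RDI and
RSI still hold the buffer pointer and byte count passed to do_csum (the usual
x86-64 argument registers), which the dump itself does not confirm.

```python
PAGE_SIZE = 4096  # x86-64 base page size

rdi = 0xFFFF880218C16680  # assumed: buffer pointer passed to do_csum
rsi = 0x1C62              # assumed: byte count passed to do_csum
rdx = 0xFFFF880218C18000  # faulting address from the BUG line

bytes_before_fault = rdx - rdi  # data summed before the fault
claimed_end = rdi + rsi         # where the buffer would end
overrun = claimed_end - rdx     # bytes requested at or past the fault address

print(f"fault offset in page : {rdx % PAGE_SIZE:#x}")
print(f"bytes before fault   : {bytes_before_fault}")
print(f"bytes requested      : {rsi}")
print(f"bytes past the fault : {overrun}")
```

Under that assumption the fault lands exactly on a page boundary, 6528 bytes
into a 7266-byte request, with 738 bytes still to go. RCX = 0xb would then be
the remaining 64-byte loop iterations (11 * 64 = 704), which fits inside those
738 bytes. That looks like a length running past the end of the mapped data,
which DEBUG_PAGEALLOC would turn into a fault instead of a silent overread.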

This is still a problem in -rc2.
Lasts about 5 minutes, then reboots.

        Dave
