This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1574697

Title:
  WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595 [travis3EN]

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Wily:
  Fix Committed
Status in linux source package in Xenial:
  Fix Committed
Status in linux source package in Yakkety:
  Fix Released

Bug description:
  ---Problem Description---
  WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595 [travis3EN]

  ---uname output---
  Linux ltciofvtr-s822l2-lp3 4.4.0-4-generic #19-Ubuntu SMP Fri Feb 5 17:36:21 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = s822l

  ---Steps to Reproduce---
  Triggering EEH causes warning messages in syslog.
  Note: it's just the warning messages; the card recovers after EEH.

  1. From the peer, run some load:
     linux-xqxs:~ # ping -f 22.22.22.22

  2. From the pKVM host, trigger EEH for the travis3EN card:
     [root@ltciofvtr-s822l2-lp1 ~]# echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0003/err_injct_inboundA; sleep 1; echo 0x0 > /sys/kernel/debug/powerpc/PCI0003/err_injct_inboundA

  3.
On the client, syslog shows the warning message "WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595"

[ 940.382507] EEH: Frozen PHB#0-PE#1 detected
[ 940.382594] EEH: PE location: N/A, PHB location: N/A
[ 940.382828] mlx4_core 0000:00:04.0: mlx4_pci_err_detected was called
[ 940.382891] mlx4_core 0000:00:04.0: device is going to be reset
[ 940.382953] mlx4_core 0000:00:04.0: device was reset successfully
[ 940.383014] mlx4_en 0000:00:04.0: Internal error detected, restarting device
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382507] EEH: Frozen PHB#0-PE#1 detected
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382594] EEH: PE location: N/A, PHB location: N/A
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382647] CPU: 1 PID: 176 Comm: kworker/u16:2 Not tainted 4.4.0-4-generic #19-Ubuntu
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382671] Workqueue: mlx4_en mlx4_en_do_get_stats [mlx4_en]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382673] Call Trace:
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382714] [c00000000487b7c0] [c000000000ad8aa0] dump_stack+0x90/0xbc (unreliable)
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382725] [c00000000487b7f0] [c0000000000378f4] eeh_dev_check_failure+0x534/0x580
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382728] [c00000000487b890] [c0000000000379c4] eeh_check_failure+0x84/0xd0
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382743] [c00000000487b8d0] [d000000002112fc0] cmd_pending+0xb0/0xe0 [mlx4_core]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382749] [c00000000487b900] [d0000000021130b0] mlx4_cmd_post+0xc0/0x250 [mlx4_core]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382756] [c00000000487b9b0] [d00000000211592c] __mlx4_cmd+0x1dc/0x9b0 [mlx4_core]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382766] [c00000000487ba70] [d0000000024eb030] mlx4_en_DUMP_ETH_STATS+0xc0/0x830 [mlx4_en]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382770]
[c00000000487bb70] [d0000000024ef150] mlx4_en_do_get_stats+0x160/0x340 [mlx4_en]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382780] [c00000000487bc50] [c0000000000dc920] process_one_work+0x1e0/0x560
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382783] [c00000000487bce0] [c0000000000dce34] worker_thread+0x194/0x680
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382785] [c00000000487bd80] [c0000000000e58d0] kthread+0x110/0x130
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382788] [c00000000487be30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382814] mlx4_core 0000:00:04.0: Could not post command 0x49: ret=-5, in_param=0x0, in_mod=0x1, op_mod=0x0
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382821] EEH: Detected PCI bus error on PHB#0-PE#1
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382823] EEH: This PCI device has failed 1 times in the last hour
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382824] EEH: Notify device drivers to shutdown
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382828] mlx4_core 0000:00:04.0: mlx4_pci_err_detected was called
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382891] mlx4_core 0000:00:04.0: device is going to be reset
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.382953] mlx4_core 0000:00:04.0: device was reset successfully
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.383014] mlx4_en 0000:00:04.0: Internal error detected, restarting device
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.383320] mlx4_en: enp0s4: Close port called
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 systemd[1]: Starting Cleanup of Temporary Directories...
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 systemd-tmpfiles[2473]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 systemd[1]: Started Cleanup of Temporary Directories.
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 940.801690] mlx4_en 0000:00:04.0: removed PHC
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 941.079593] EEH: Collect temporary log
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 941.079631] eeh_pci_enable: Unexpected state change 2 on PHB#0-PE#1, err=-3
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 941.081329] EEH: of node=0000:00:04:0
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 941.081348] EEH: PCI device/vendor: 100315b3
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 941.081582] EEH: PCI cmd/status register: 00100142
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [ 941.081584] EEH: PCI-E capabilities and status follow:
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.081725] EEH: PCI-E 00: 0002c010 11d08e02 0020202e 0843f483
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.081849] EEH: PCI-E 10: 10830000 00000000 00000000 00000000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.081851] EEH: PCI-E 20: 00000000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.081886] EEH: Reset without hotplug activity
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.081935] mlx4_core 0000:00:04.0: mlx4_remove_one: interface is down
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082003] mlx4_core 0000:00:04.0: disabling already-disabled device
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082046] ------------[ cut here ]------------
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082049] WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082051] Modules linked in: ib_ipoib mlx5_ib mlx5_core rdma_ucm rdma_cm iw_cm ib_umad ib_ucm ib_cm ib_sa ib_mad ib_uverbs ib_core ib_addr pseries_rng rtc_generic nfsd auth_rpcgss nfs_acl lockd grace sunrpc autofs4 mlx4_en vxlan ip6_udp_tunnel udp_tunnel ibmvscsi mlx4_core
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082257] CPU: 1 PID: 49 Comm: eehd Not tainted 4.4.0-4-generic #19-Ubuntu
Feb 19 02:18:24
ltciofvtr-s822l2-lp3 kernel: [ 941.082260] task: c0000003f91e9370 ti: c0000003f9060000 task.ti: c0000003f9060000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082263] NIP: c0000000005cdf0c LR: c0000000005cdf08 CTR: c00000000057ae00
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082265] REGS: c0000003f9063560 TRAP: 0700 Not tainted (4.4.0-4-generic)
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082267] MSR: 8000000100029033 <SF,EE,ME,IR,DR,RI,LE> CR: 28002422 XER: 20000000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] CFAR: c000000000ad578c SOFTE: 1
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR00: c0000000005cdf08 c0000003f90637e0 c000000001593900 0000000000000039
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR04: 0000000000000001 0000000000000000 0000000000000048 0000000000000175
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR08: c000000001733900 0000000000000000 0000000000000000 0000000000000005
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR12: 0000000028002428 c00000000fb40980 c0000000000e57c8 c0000003fe165980
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000d1f500
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR24: c000000000d1f4d8 0000000000000100 c0000003fe058580 0000000000000000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082275] GPR28: c0000003fe144000 c000000004fd0300 c0000003fe144758 c0000003fe144000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082328] NIP [c0000000005cdf0c] pci_disable_device+0x11c/0x140
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082332] LR [c0000000005cdf08] pci_disable_device+0x118/0x140
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082333] Call Trace:
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [
941.082337] [c0000003f90637e0] [c0000000005cdf08] pci_disable_device+0x118/0x140 (unreliable)
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082350] [c0000003f9063850] [d00000000212b0d4] mlx4_remove_one+0xc4/0x250 [mlx4_core]
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082353] [c0000003f90638e0] [c0000000005d2fc0] pci_device_remove+0x70/0x110
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082358] [c0000003f9063920] [c0000000006be740] __device_release_driver+0xc0/0x190
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082362] [c0000003f9063950] [c0000000006be850] device_release_driver+0x40/0x70
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082365] [c0000003f9063980] [c0000000005c7e30] pci_stop_bus_device+0xf0/0x110
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082368] [c0000003f90639c0] [c0000000005c7fbc] pci_stop_and_remove_bus_device+0x2c/0x50
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082372] [c0000003f90639f0] [c00000000003c100] eeh_rmv_device+0x140/0x1a0
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082375] [c0000003f9063a70] [c00000000003a294] eeh_pe_dev_traverse+0x94/0x160
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082380] [c0000003f9063b00] [c000000000ad39d0] eeh_reset_device+0xbc/0x218
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082383] [c0000003f9063ba0] [c00000000003c454] eeh_handle_normal_event+0x2f4/0x430
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082386] [c0000003f9063c20] [c00000000003c764] eeh_handle_event+0x54/0x360
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082389] [c0000003f9063cd0] [c00000000003cb8c] eeh_event_handler+0x11c/0x1e0
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082393] [c0000003f9063d80] [c0000000000e58d0] kthread+0x110/0x130
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082397] [c0000003f9063e30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082399] Instruction dump:
Feb 19 02:18:24
ltciofvtr-s822l2-lp3 kernel: [ 941.082401] 409eff64 387f0098 480eab45 60000000 e8bf00e8 2fa50000 7c641b78 419e0028
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082407] 3c62ff7f 38633aa8 48507821 60000000 <0fe00000> 39200001 3d42fff8 992a1acb
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082413] ---[ end trace 1cce98b956e06602 ]---
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [ 941.082431] iommu: Removing device 0000:00:04.0 from group 0
Feb 19 02:18:28 ltciofvtr-s822l2-lp3 kernel: [ 945.197931] EEH: Sleep 5s ahead of partial hotplug
Feb 19 02:18:33 ltciofvtr-s822l2-lp3 kernel: [ 950.204919] iommu: Adding device 0000:00:04.0 to group 0
Feb 19 02:18:33 ltciofvtr-s822l2-lp3 kernel: [ 950.205129] mlx4_core: Initializing 0000:00:04.0
Feb 19 02:18:33 ltciofvtr-s822l2-lp3 kernel: [ 950.207395] mlx4_core 0000:00:04.0: Using 64-bit direct DMA at offset 800000000000000
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.254212] mlx4_core 0000:00:04.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.254215] mlx4_core 0000:00:04.0: PCIe link width is x8, device supports x8
[ 955.356803] mlx4_en: 0000:00:04.0: Port 1: frag:0 - size:1522 prefix:0 stride:1536
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.353773] mlx4_en 0000:00:04.0: Activating port:1
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.356795] mlx4_en: 0000:00:04.0: Port 1: Using 64 TX rings
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.356800] mlx4_en: 0000:00:04.0: Port 1: Using 8 RX rings
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.356803] mlx4_en: 0000:00:04.0: Port 1: frag:0 - size:1522 prefix:0 stride:1536
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.359817] mlx4_en: 0000:00:04.0: Port 1: Initializing port
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.360278] mlx4_en 0000:00:04.0: registered PHC clock
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.361113] mlx4_en 0000:00:04.0: Activating port:2
[ 955.365352]
mlx4_en: 0000:00:04.0: Port 2: frag:0 - size:1522 prefix:0 stride:1536
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.363940] mlx4_core 0000:00:04.0 enp0s4: renamed from eth0
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.365347] mlx4_en: 0000:00:04.0: Port 2: Using 64 TX rings
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.365350] mlx4_en: 0000:00:04.0: Port 2: Using 8 RX rings
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.365352] mlx4_en: 0000:00:04.0: Port 2: frag:0 - size:1522 prefix:0 stride:1536
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.380726] mlx4_en: 0000:00:04.0: Port 2: Initializing port
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.386733] EEH: Notify device drivers the completion of reset
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.386737] EEH: Notify device driver to resume
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.408991] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.410735] mlx4_core 0000:00:04.0 enp0s4d1: renamed from eth0
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.411687] <mlx4_ib> mlx4_ib_add: counter index 2 for port 1 allocated 1
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [ 955.411690] <mlx4_ib> mlx4_ib_add: counter index 3 for port 2 allocated 1
Feb 19 02:18:40 ltciofvtr-s822l2-lp3 kernel: [ 957.608097] mlx4_en: enp0s4d1: Link Up
Feb 19 02:18:40 ltciofvtr-s822l2-lp3 kernel: [ 957.662997] mlx4_en: enp0s4: Link Up

pKVM syslog:
Feb 19 18:16:47 ltciofvtr-s822l2-lp1 kernel: vfio-pci 0003:0b:00.0: enabling device (0140 -> 0142)
Feb 19 18:20:01 ltciofvtr-s822l2-lp1 systemd: Starting Session 1302 of user root.
Feb 19 18:20:01 ltciofvtr-s822l2-lp1 systemd: Started Session 1302 of user root.
Feb 19 18:20:01 ltciofvtr-s822l2-lp1 systemd: Failed to reset devices.list on /machine.slice: Invalid argument

The patches are finally upstream:

https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/patch/drivers/net/ethernet/mellanox/mlx4?id=c12833acff62cff83a8b728253e7ebbc1264d75e
From c12833acff62cff83a8b728253e7ebbc1264d75e Mon Sep 17 00:00:00 2001
From: Daniel Jurgens <dani...@mellanox.com>
Date: Wed, 20 Apr 2016 16:01:15 +0300
Subject: net/mlx4_core: Implement pci_resume callback

https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/patch/drivers/net/ethernet/mellanox/mlx4?id=4bfd2e6e53435a214888fd35e230157a38ffc6a0
From 4bfd2e6e53435a214888fd35e230157a38ffc6a0 Mon Sep 17 00:00:00 2001
From: Daniel Jurgens <dani...@mellanox.com>
Date: Wed, 20 Apr 2016 16:01:16 +0300
Subject: net/mlx4_core: Avoid repeated calls to pci enable/disable

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1574697/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp