------- Comment From cls...@us.ibm.com 2017-07-10 15:54 EDT-------
(In reply to comment #5)
> I built a test kernel with a pick of commit 377aa6b0efba. The test kernel
> can be downloaded from:
>
> http://kernel.ubuntu.com/~jsalisbury/lp1702768/
>
> Can you test this kernel and see if it resolves this bug?

I tried this kernel and it is ok.

[ 128.224664] (0004:01:00.0): E-Switch: E-Switch enable SRIOV: nvfs(1) mode (1)
[ 128.234634] (0004:01:00.0): E-Switch: SRIOV enabled: active vports(2)
[ 128.234818] mlx5_core 0004:01:00.0: VF BAR0: [mem 0x240000000000-0x2401ffffffff 64bit pref] shifted to [mem 0x240000000000-0x2401ffffffff 64bit pref] (Disabling 1 VFs shifted by 0)
[ 128.234836] pci 0004:01: 0.2: [PE# 00] VF 0004:01:00.2 associated with PE#0
[ 128.235086] pci 0004:01: 0.2: [PE# 00] Setting up 32-bit TCE table at 0..80000000
[ 128.238861] pci 0004:01: 0.2: [PE# 00] Setting up window#0 0..7fffffff pg=1000
[ 128.238972] pci 0004:01: 0.2: [PE# 00] Enabling 64-bit DMA bypass
[ 128.344614] pci 0004:01:00.2: [15b3:1014] type 00 class 0x020000
[ 128.344942] pci 0004:01:00.2: Max Payload Size set to 512 (was 128, max 512)
[ 128.345403] iommu: Adding device 0004:01:00.2 to group 6
[ 128.345871] mlx5_core 0004:01:00.2: enabling device (0000 -> 0002)
[ 128.345907] mlx5_core 0004:01:00.2: Using 64-bit DMA iommu bypass
[ 128.346076] mlx5_core 0004:01:00.2: firmware version: 12.20.1010
[ 128.902589] mlx5_core 0004:01:00.2: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[ 128.903017] mlx5_core 0004:01:00.2: Assigned random MAC address 2a:fc:b7:49:03:1b
[ 129.007113] mlx5_core 0004:01:00.2 enP4p1s0f2: renamed from eth0
[ 129.015731] mlx5_ib: Mellanox Connect-IB Infiniband driver v2.2-1 (Feb 2014)

uname -a
4.10.0-26-generic #30~lp1702768 SMP Mon Jul 10 18:37:50 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1702768

Title: Ubuntu 17.04 KVM: stack trace generated when enabling SRIOV in power
Status in The Ubuntu-power-systems project: In Progress
Status in linux package in Ubuntu: In Progress

Bug description:

---Problem Description---
Enabling SRIOV with kernel 4.10.0-26-generic on a Power system produces this stack trace:

[ 2084.079575] ------------[ cut here ]------------
[ 2084.079583] WARNING: CPU: 120 PID: 734 at /build/linux-TAhFXm/linux-4.10.0/arch/powerpc/platforms/powernv/npu-dma.c:78 pnv_pci_get_npu_dev+0x40/0xb0
[ 2084.079584] Modules linked in: mst_pciconf(OE) mst_pci(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp kvm_hv kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rdma_ucm(OE) ib_ucm(OE) ib_ipoib(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx4_ib(OE) binfmt_misc bridge stp llc ipmi_powernv ipmi_devintf ipmi_msghandler powernv_rng powernv_op_panel uio_pdrv_genirq leds_powernv uio ibmpowernv vmx_crypto sunrpc ib_iser(OE) rdma_cm(OE) iw_cm(OE) ib_cm(OE) ib_core(OE) configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi knem(OE) ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
[ 2084.079640] xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx4_en(OE) ses enclosure scsi_transport_sas crc32c_vpmsum tg3 mlx5_core(OE) mlx4_core(OE) ipr devlink mlx_compat(OE)
[ 2084.079658] CPU: 120 PID: 734 Comm: kworker/120:0 Tainted: G W OE 4.10.0-26-generic #30-Ubuntu
[ 2084.079663] Workqueue: events work_for_cpu_fn
[ 2084.079665] task: c000000fee60dc00 task.stack: c000000fee534000
[ 2084.079666] NIP: c00000000009c210 LR: c00000000009d404 CTR: 0000000000000000
[ 2084.079668] REGS: c000000fee537700 TRAP: 0700 Tainted: G W OE (4.10.0-26-generic)
[ 2084.079669] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[ 2084.079677] CR: 42004428 XER: 20000000
[ 2084.079678] CFAR: c00000000009d400 SOFTE: 1
GPR00: c00000000009d404 c000000fee537980 c00000000145d100 0000000000000000
GPR04: 0000000000000000 0000000000000aa6 c000001fff700000 0000000000049188
GPR08: 0000000000000007 0000000000000001 0000000000000001 0000000000000000
GPR12: 0000000000002200 c00000000fbc3800 c00000000010ef48 c000000ff70ec540
GPR16: c000000ffa622c58 c000000ffa622a10 c000000ffa6229a0 0000000000000001
GPR20: 0000000000000000 c000000001318de8 c000000000d700e8 0000000000000001
GPR24: c000000000d6f070 c000000000d6f050 c000000003d02000 c000000003d02098
GPR28: c000000e92680060 0800001fffffffff ffffffffffffffff 0000000000000000
[ 2084.079702] NIP [c00000000009c210] pnv_pci_get_npu_dev+0x40/0xb0
[ 2084.079704] LR [c00000000009d404] pnv_npu_try_dma_set_bypass+0x144/0x250
[ 2084.079705] Call Trace:
[ 2084.079708] [c000000fee5379b0] [c00000000009d404] pnv_npu_try_dma_set_bypass+0x144/0x250
[ 2084.079710] [c000000fee537a80] [c000000000096c74] pnv_pci_ioda_dma_set_mask+0xa4/0x150
[ 2084.079714] [c000000fee537b00] [c0000000000291a0] dma_set_mask+0x40/0xc0
[ 2084.079728] [c000000fee537b20] [d0000000143531e4] init_one+0x33c/0x6a0 [mlx5_core]
[ 2084.079732] [c000000fee537bd0] [c00000000066ba9c] local_pci_probe+0x6c/0x140
[ 2084.079734] [c000000fee537c60] [c0000000001016b8] work_for_cpu_fn+0x38/0x60
[ 2084.079737] [c000000fee537c90] [c0000000001061a0] process_one_work+0x2b0/0x5a0
[ 2084.079740] [c000000fee537d20] [c000000000106780] worker_thread+0x2f0/0x650
[ 2084.079742] [c000000fee537dc0] [c00000000010f0a4] kthread+0x164/0x1b0
[ 2084.079746] [c000000fee537e30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[ 2084.079747] Instruction dump:
[ 2084.079748] 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c690074 7929d182 0b090000 2fa30000
[ 2084.079753] 419e0060 e8630330 7c690074 7929d182 <0b090000> 2fa30000 419e0048 7c852378
[ 2084.079759] ---[ end trace 7bf01a937efd69d8 ]---

This issue was
introduced by this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4c3b89effc281704d5395282c800c45e453235f6
(Subject: powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev)

and the fix is to add this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=377aa6b0efbaa29cfeecd8b9244641217f9544ca
which reads: "powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no of_node"

Requesting fix inclusion in 17.04 and probably 16.04.3.

---uname output---
4.10.0-26-generic #30-Ubuntu SMP Tue Jun 27 09:29:34 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

---Additional Hardware Info---
Need a Mellanox card that supports SRIOV.

Machine Type = P8

---Steps to Reproduce---
Enable SRIOV on a Power system with a Mellanox CX4 or CX5 card, for example:

echo 1 > /sys/class/infiniband/mlx5_0/device/sriov_numvfs

Stack trace output:
[ 2084.079567] mlx5_core 0004:01:04.0: Using 64-bit DMA iommu bypass
[ 2084.079575] ------------[ cut here ]------------
[ 2084.079583] WARNING: CPU: 120 PID: 734 at /build/linux-TAhFXm/linux-4.10.0/arch/powerpc/platforms/powernv/npu-dma.c:78 pnv_pci_get_npu_dev+0x40/0xb0
[ 2084.079584] Modules linked in: mst_pciconf(OE) mst_pci(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp kvm_hv kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rdma_ucm(OE) ib_ucm(OE) ib_ipoib(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx4_ib(OE) binfmt_misc bridge stp llc ipmi_powernv ipmi_devintf ipmi_msghandler powernv_rng powernv_op_panel uio_pdrv_genirq leds_powernv uio ibmpowernv vmx_crypto sunrpc ib_iser(OE) rdma_cm(OE) iw_cm(OE) ib_cm(OE) ib_core(OE) configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi knem(OE) ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
[ 2084.079640] xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx4_en(OE) ses enclosure scsi_transport_sas crc32c_vpmsum tg3 mlx5_core(OE) mlx4_core(OE) ipr devlink mlx_compat(OE)
[ 2084.079658] CPU: 120 PID: 734 Comm: kworker/120:0 Tainted: G W OE 4.10.0-26-generic #30-Ubuntu
[ 2084.079663] Workqueue: events work_for_cpu_fn
[ 2084.079665] task: c000000fee60dc00 task.stack: c000000fee534000
[ 2084.079666] NIP: c00000000009c210 LR: c00000000009d404 CTR: 0000000000000000
[ 2084.079668] REGS: c000000fee537700 TRAP: 0700 Tainted: G W OE (4.10.0-26-generic)
[ 2084.079669] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[ 2084.079677] CR: 42004428 XER: 20000000
[ 2084.079678] CFAR: c00000000009d400 SOFTE: 1
GPR00: c00000000009d404 c000000fee537980 c00000000145d100 0000000000000000
GPR04: 0000000000000000 0000000000000aa6 c000001fff700000 0000000000049188
GPR08: 0000000000000007 0000000000000001 0000000000000001 0000000000000000
GPR12: 0000000000002200 c00000000fbc3800 c00000000010ef48 c000000ff70ec540
GPR16: c000000ffa622c58 c000000ffa622a10 c000000ffa6229a0 0000000000000001
GPR20: 0000000000000000 c000000001318de8 c000000000d700e8 0000000000000001
GPR24: c000000000d6f070 c000000000d6f050 c000000003d02000 c000000003d02098
GPR28: c000000e92680060 0800001fffffffff ffffffffffffffff 0000000000000000
[ 2084.079702] NIP [c00000000009c210] pnv_pci_get_npu_dev+0x40/0xb0
[ 2084.079704] LR [c00000000009d404] pnv_npu_try_dma_set_bypass+0x144/0x250
[ 2084.079705] Call Trace:
[ 2084.079708] [c000000fee5379b0] [c00000000009d404] pnv_npu_try_dma_set_bypass+0x144/0x250
[ 2084.079710] [c000000fee537a80] [c000000000096c74] pnv_pci_ioda_dma_set_mask+0xa4/0x150
[ 2084.079714] [c000000fee537b00] [c0000000000291a0] dma_set_mask+0x40/0xc0
[ 2084.079728] [c000000fee537b20] [d0000000143531e4] init_one+0x33c/0x6a0 [mlx5_core]
[ 2084.079732] [c000000fee537bd0] [c00000000066ba9c] local_pci_probe+0x6c/0x140
[ 2084.079734] [c000000fee537c60] [c0000000001016b8] work_for_cpu_fn+0x38/0x60
[ 2084.079737] [c000000fee537c90] [c0000000001061a0] process_one_work+0x2b0/0x5a0
[ 2084.079740] [c000000fee537d20] [c000000000106780] worker_thread+0x2f0/0x650
[ 2084.079742] [c000000fee537dc0] [c00000000010f0a4] kthread+0x164/0x1b0
[ 2084.079746] [c000000fee537e30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[ 2084.079747] Instruction dump:
[ 2084.079748] 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c690074 7929d182 0b090000 2fa30000
[ 2084.079753] 419e0060 e8630330 7c690074 7929d182 <0b090000> 2fa30000 419e0048 7c852378
[ 2084.079759] ---[ end trace 7bf01a937efd69d8 ]---
[ 2084.080096] mlx5_core 0004:01:04.0: firmware version: 12.20.1010

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1702768/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
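
For anyone re-testing a kernel against the reproduction step above, a small helper can confirm whether the warning from pnv_pci_get_npu_dev is present in a saved dmesg capture. This is only a sketch: the names `WARNING_RE` and `find_warnings` are hypothetical, and the pattern assumes the WARNING header has the same shape as the one quoted in this report.

```python
import re

# Assumed header shape, taken from the trace quoted in this report:
# "WARNING: CPU: 120 PID: 734 at <file>:<line> <func>+0x40/0xb0"
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def find_warnings(log_text, func="pnv_pci_get_npu_dev"):
    """Return (file, line, cpu, pid) for every WARNING raised in `func`.

    `log_text` is the full text of a dmesg capture. It may be one entry per
    line or flattened onto a single line; the search does not rely on
    newlines.
    """
    hits = []
    for match in WARNING_RE.finditer(log_text):
        if match.group("func") == func:
            hits.append(
                (
                    match.group("file"),
                    int(match.group("line")),
                    int(match.group("cpu")),
                    int(match.group("pid")),
                )
            )
    return hits
```

One possible use is to save `dmesg` output to a file after writing to sriov_numvfs, read it into a string, and pass it to `find_warnings`. An empty list means no matching warning was found in that capture, which is what the test-kernel log in the reply would be expected to give.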