> Hi Thomas,
>
> Below is my conclusion for this bug.
> An expert of x86 is required to follow-up.
>
> Summary:
> - CPU: Intel Skylake
> - Linux environment: Ubuntu 18.04
> - Compiler: GCC 7 or 8
> - Scenario: testpmd crashes when it starts forwarding
> - Behaviour: AVX2 version of rte_memcpy() fails if optimized for AVX512
> - Context: inline rte_memcpy() is called from
>   inline rte_mempool_put_bulk(), called from
>   mlx5_tx_complete() (inline or not)
> - Analysis: AVX512 optimization changes vmovdqu to vmovdqu8
>
> Latest status can be found in Bugzilla:
> https://bugs.dpdk.org/show_bug.cgi?id=97#c35

Looking at the disassembled output from the bug report, it seems that the problem is not in the vmovdqu8 instruction itself, but in the wrong offsets generated by the compiler:

    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
    vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
    vmovups XMMWORD PTR [rsi+0x20],xmm0
    vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
    vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
    vmovups XMMWORD PTR [rsi+0x40],xmm0
    vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]

I think the first load should be:

    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x20]

The same applies to the next two offsets: 0x4 and 0x6 should be 0x40 and 0x60, respectively. I'm not sure what causes the compiler to behave that way.

BTW, looking through the testpmd objdump output, it seems that only the mlx5 driver exhibits this problem (I didn't actually enable mlx4, so the same problem may well be there too). That looks a bit weird to me.

Konstantin
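
A minimal sketch of the access pattern, written in plain C rather than AVX intrinsics so it builds on any compiler. The dump above appears to copy each 32-byte block at offset off as two 16-byte halves: a load at off and a vinserti128 load at off+0x10, then stores to the same two offsets in the destination. The helper names copy_block32_split(), copy_split_blocks() and scale_factor() are hypothetical; this is a model of the instruction sequence, not the rte_memcpy() source. The sketch also checks the arithmetic behind the wrong offsets: each observed displacement (0x2, 0x4, 0x6) is exactly the expected one (0x20, 0x40, 0x60) divided by 16, the size of an xmm operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model (not DPDK code) of one 32-byte block copy as it
 * appears in the dump: two 16-byte loads, two 16-byte stores. */
void copy_block32_split(uint8_t *dst, const uint8_t *src, size_t off)
{
    uint8_t lo[16], hi[16];

    memcpy(lo, src + off, 16);        /* stands in for vmovdqu8 xmm0,[src+off]         */
    memcpy(hi, src + off + 0x10, 16); /* stands in for vinserti128 ...,[src+off+0x10]  */
    memcpy(dst + off, lo, 16);        /* stands in for vmovups [dst+off],xmm0          */
    memcpy(dst + off + 0x10, hi, 16); /* stands in for vextracti128 [dst+off+0x10],... */
}

/* Copies n bytes in 32-byte blocks; n is assumed to be a multiple of 32. */
void copy_split_blocks(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t off = 0; off + 0x20 <= n; off += 0x20)
        copy_block32_split(dst, src, off);
}

/* Returns expected / observed when it divides evenly, otherwise 0. */
unsigned scale_factor(unsigned observed, unsigned expected)
{
    if (observed == 0 || expected % observed != 0)
        return 0;
    return expected / observed;
}
```

For example, copy_split_blocks(dst, src, 0x80) reads and writes offsets 0x00 through 0x70 in 16-byte steps, which is the pattern the dump would show with correct displacements, while scale_factor(0x2, 0x20) returns 16.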