> 11/11/2018 15:15, Ananyev, Konstantin:
> > Hi Thomas,
> >
> > > Below is my conclusion for this bug.
> > > An x86 expert is required to follow up.
> > >
> > > Summary:
> > > - CPU: Intel Skylake
> > > - Linux environment: Ubuntu 18.04
> > > - Compiler: GCC 7 or 8
> > > - Scenario: testpmd crashes when it starts forwarding
> > > - Behaviour: AVX2 version of rte_memcpy() fails if optimized for AVX512
> > > - Context: inline rte_memcpy() is called from
> > >   inline rte_mempool_put_bulk(), called from
> > >   mlx5_tx_complete() (inline or not)
> > > - Analysis: AVX512 optimization changes vmovdqu to vmovdqu8
> > >
> > > Latest status can be found in Bugzilla:
> > > https://bugs.dpdk.org/show_bug.cgi?id=97#c35
> >
> > Looking at the disassembled output from the bug report, it seems that the
> > problem is not in the vmovdqu8 instruction itself, but in the wrong offsets
> > generated by the compiler:
> >
> > vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
> > vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
> > vmovups XMMWORD PTR [rsi+0x20],xmm0
> > vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
> > vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
> > vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
> > vmovups XMMWORD PTR [rsi+0x40],xmm0
> > vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
> > vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]
> >
> > It should be:
> > vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x20]
> > I think.
> >
> > The same goes for the next two offsets: 0x4 and 0x6 should be 0x40 and 0x60 respectively.
>
> Yes, you're right, I missed it, thank you!
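The offset pattern can be checked with a little arithmetic. The sketch below assumes the disassembly is one 128-byte copy done as four 32-byte chunks, each read as a 16-byte low half (vmovdqu or vmovdqu8) plus a 16-byte high half (vinserti128). All names (`expected_low_disp`, `expected_high_disp`, `count_bad_low_disps`) are hypothetical. It shows that the high-half offsets (0x10, 0x30, 0x50, 0x70) are right in the bad build, while each wrong low-half offset is exactly the correct one divided by 16.

```c
#include <assert.h>

/* Hypothetical model of the 128-byte copy seen in the disassembly:
 * four 32-byte chunks, each loaded as a 16-byte low half plus a
 * 16-byte high half inserted with vinserti128. */
enum { CHUNK = 0x20, HALF = 0x10, NCHUNKS = 4 };

/* Displacement the low-half load should use for chunk i: 0x00, 0x20, 0x40, 0x60. */
static unsigned expected_low_disp(unsigned i) { return i * CHUNK; }

/* Displacement of the high-half vinserti128 load: 0x10, 0x30, 0x50, 0x70.
 * These match in both the AVX512-enabled and AVX512-disabled builds. */
static unsigned expected_high_disp(unsigned i) { return i * CHUNK + HALF; }

/* Low-half displacements disassembled from the AVX512-enabled (bad) build. */
static const unsigned observed_low_disp[NCHUNKS] = { 0x0, 0x2, 0x4, 0x6 };

/* Counts the low-half loads whose displacement is wrong, asserting that
 * every wrong one equals the expected displacement divided by 16. */
static unsigned count_bad_low_disps(void)
{
    unsigned bad = 0;
    for (unsigned i = 0; i < NCHUNKS; i++) {
        if (observed_low_disp[i] != expected_low_disp(i)) {
            assert(observed_low_disp[i] * 16 == expected_low_disp(i));
            bad++;
        }
    }
    return bad;
}
```

Chunk 0 is unaffected only because 0x0 divided by 16 is still 0x0; the other three chunks read from the wrong address.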
>
> The full diff is below:
>
> --- bad-avx512-enabled
> +++ good-avx512-disabled
> - vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x0]
> + vmovdqu xmm0,XMMWORD PTR [rax*8+0x0]
>   vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x10],0x1
>   vmovups XMMWORD PTR [rsi],xmm0
>   vextracti128 XMMWORD PTR [rsi+0x10],ymm0,0x1
> - vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
> + vmovdqu xmm0,XMMWORD PTR [rax*8+0x20]
>   vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
>   vmovups XMMWORD PTR [rsi+0x20],xmm0
>   vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
> - vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
> + vmovdqu xmm0,XMMWORD PTR [rax*8+0x40]
>   vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
>   vmovups XMMWORD PTR [rsi+0x40],xmm0
>   vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
> - vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]
> + vmovdqu xmm0,XMMWORD PTR [rax*8+0x60]
>   vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x70],0x1
>   vmovups XMMWORD PTR [rsi+0x60],xmm0
>   vextracti128 XMMWORD PTR [rsi+0x70],ymm0,0x1
>
> > Not sure what causes the compiler to behave that way.
> > BTW, looking through the testpmd objdump output, it seems that only the mlx5 driver
> > exhibits such a problem (I didn't enable mlx4 actually, probably the same problem
> > there).
> > Which looks a bit weird to me.
>
> Yes it's weird. I don't see how the mlx5 code could influence
> the compiler to generate this bad code in AVX512 mode.
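For reference, here is a C-level sketch of the copy pattern the diff corresponds to: 128 bytes moved as four unaligned 32-byte loads and stores at offsets 0x00, 0x20, 0x40 and 0x60. It is loosely modeled on the AVX2 path of rte_memcpy(), but `copy32()` and `copy128()` are hypothetical names, not DPDK functions, and the memcpy() fallback exists only so the sketch builds without -mavx2. In both builds in the diff, each 32-byte load appears as a 16-byte load plus vinserti128 and each store as vmovups plus vextracti128; only the first instruction of each group and its displacement differ.

```c
#include <stdint.h>
#include <string.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical helper: copy one 32-byte block with an unaligned
 * 256-bit load and store when AVX2 is enabled at build time. */
static inline void copy32(uint8_t *dst, const uint8_t *src)
{
#ifdef __AVX2__
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    _mm256_storeu_si256((__m256i *)dst, v);
#else
    memcpy(dst, src, 32); /* portable fallback when built without -mavx2 */
#endif
}

/* Hypothetical helper: copy 128 bytes as four 32-byte blocks.
 * Both source and destination offsets step by 0x20. */
static inline void copy128(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < 4; i++)
        copy32(dst + 0x20 * i, src + 0x20 * i);
}
```

Usage: fill a 128-byte source buffer, call copy128(dst, src), and compare the two buffers with memcmp(). With the correct displacements every destination byte matches the source.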

Same here; I looked through the mlx5_rxtx code and it is unclear to me what triggers the issue. So far it looks like a GCC bug to me.
Konstantin