Hi!

As the vgather* insns are designed to support both unconditional and conditional gather loads, the current patterns consume the previous content of the destination register, so we end up with code like:

	vmovaps	.LC0(%rip), %ymm0
	vmovdqa	.LC1(%rip), %ymm5
	vmovdqa	.LC2(%rip), %ymm4
	.p2align 4,,10
	.p2align 3
.L6:
	vmovdqa	k(%rax,%rax), %ymm1
	vmovaps	%ymm0, %ymm6
	vmovaps	%ymm0, %ymm2
	vmovdqa	k+32(%rax,%rax), %ymm3
	vgatherdps	%ymm6, vf1(,%ymm1,4), %ymm2
	vmovaps	%ymm0, %ymm1
	vmovaps	%ymm0, %ymm6
	vcvttps2dq	%ymm2, %ymm2
	vpshufb	%ymm5, %ymm2, %ymm2
	vgatherdps	%ymm6, vf1(,%ymm3,4), %ymm1
	...

Note that each vgather* is usually preceded by two vmovaps: one copies the all-ones mask, which is usually computed/loaded before the loop, and the other initializes the destination register. But with a mask of all ones the whole destination register is overwritten unless there is a segfault, so IMNSHO at least for autovectorization it would be nice to just leave the content of the destination register undefined in case of a segfault. The only way users can see a difference is if a segfault happens and, in the segfault handler, they inspect the destination register or transfer control to the next insn.

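To make the all-ones-mask argument concrete, here is a minimal scalar sketch in C of what an 8-lane vgatherdps does when it completes without a fault. It is only a model for illustration: the names model_gatherdps, demo_all_ones_mask and LANES are made up here and are not GCC or intrinsic names, the scale is assumed to be 4 (plain float indexing), and faulting behaviour is not modelled at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8	/* hypothetical name: number of 32-bit lanes in a ymm register */

/* Scalar model of a fault-free vgatherdps.  A lane is loaded only if the
   high bit of its mask element is set; otherwise it keeps the previous
   content of dest.  The mask is cleared as the insn completes.  */
static void
model_gatherdps (float dest[LANES], const float *base,
		 const int32_t idx[LANES], uint32_t mask[LANES])
{
  for (int i = 0; i < LANES; i++)
    {
      if (mask[i] & 0x80000000u)
	dest[i] = base[idx[i]];
      mask[i] = 0;
    }
}

/* Returns 1 after checking that, with an all-ones mask, the result does
   not depend on what dest held before, while with a partial mask it does.  */
static int
demo_all_ones_mask (void)
{
  static const float table[16] = { 0, 1, 2, 3, 4, 5, 6, 7,
				   8, 9, 10, 11, 12, 13, 14, 15 };
  const int32_t idx[LANES] = { 15, 3, 8, 0, 7, 7, 12, 1 };
  float a[LANES], b[LANES];
  uint32_t ma[LANES], mb[LANES];

  /* Two different initial dest values, same all-ones mask.  */
  for (int i = 0; i < LANES; i++)
    {
      a[i] = 0.0f;
      b[i] = -1234.5f;
      ma[i] = mb[i] = 0xffffffffu;
    }
  model_gatherdps (a, table, idx, ma);
  model_gatherdps (b, table, idx, mb);
  assert (memcmp (a, b, sizeof a) == 0);

  /* With lane 0 masked off, the previous dest content is observable.  */
  for (int i = 0; i < LANES; i++)
    {
      b[i] = -1234.5f;
      mb[i] = 0xffffffffu;
    }
  mb[0] = 0;
  model_gatherdps (b, table, idx, mb);
  assert (b[0] == -1234.5f);
  assert (b[1] == table[3]);
  return 1;
}
```

In this model the initial dest value only matters when some mask lane has its high bit clear, which is why the extra vmovaps initializing the destination looks redundant for the all-ones case as long as no fault occurs.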
My question is about the avx2intrin.h intrinsics. In the AVX2 manual the insns are well documented, but there are no details about the intrinsics. There are two kinds of intrinsics for gather: one without mask/src operands and one with them.

So, for the intrinsics without mask/src operands, is it supposed to be well defined what the dest register will contain after a segfault? Currently we load zeros into src, but would it be a valid optimization to just leave that register undefined in case of a segfault?

And what about the other intrinsics if the mask is known to be all ones? Can the compiler optimize this and assume the destination is just overwritten rather than being an in/out operand?

What could be done is to check during expansion whether the mask has all high bits set and, if so, use different insn patterns that wouldn't consume the register with the "0" constraint. Or we could have a second set of compiler builtins that wouldn't have src/mask arguments. On large testcases (like Toon's weather forecast routine, which has over 260 vgather* insns) this would allow us to get rid of one extra insn per vgather* insn.

	Jakub