On Tue, May 19, 2020 at 10:48 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Tue, 19 May 2020, Uros Bizjak wrote:
>
> > Hello!
> >
> > Attached patch adds missing vector zero/sign_extend expanders to allow
> > vectorization of operations between different vector sizes.
> >
> > The patch regresses (progresses?):
> >
> > FAIL: gcc.target/i386/pr92645-4.c scan-tree-dump-times optimized
> > "vec_unpack_lo" 3
> >
> > but eyeballing the asm code before/after the patch, we get much better:
> >
> > .L3:
> > -       vmovdqu (%rsi,%rax), %xmm6
> > -       vpxor   %xmm5, %xmm5, %xmm5
> > -       vmovdqa %ymm5, -32(%rsp)
> > -       vmovdqa %xmm6, -32(%rsp)
> > -       vpmovzxbw       -32(%rsp), %ymm0
> > +       vpmovzxbw       (%rsi,%rax), %ymm0
> >         vpmullw %ymm4, %ymm0, %ymm0
> >         vpaddw  %ymm2, %ymm0, %ymm0
> >         vpsrlw  $8, %ymm0, %ymm0
> >
> > and even more differences to a much better code in the loop prologue.
> >
> > (Please note a strange double-save to a stack slot in the old code).
> >
> > Richi, I guess that the testcase you introduced needs some adjustment.
>
> I will deal with the FAIL once you commit the patch, the testcase
> is for forwprop code which indeed also knows how to exercise those
> missing patterns.  IIRC I filed the PR when working on those
> (and may in turn remove the VEC_UNPACK_* support from forwprop again!)
>
> > As discussed in the PR, there are a couple of XFAILs, where the
> > compiler is not able to vectorize the code. The named expanders are
> > there, but for the reason, explained in PR comment #8, middle-end
> > doesn't exercise them.
>
> OK, so we should track this in a separate PR?  Can you point to
> the specific expander and the XFAILed testcases there?

Yes, I'll open a new PR and document the current limitation. I will CC
you on the PR.

Thanks,
Uros.
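
For illustration only, here is a minimal sketch of the kind of scalar loop
whose vectorized body could match the quoted asm (zero-extend bytes to
words, multiply, add, shift right by 8). This is NOT the pr92645-4.c
testcase; the function name scale_u8 and its parameters are hypothetical.
Whether GCC actually emits the vpmovzxbw form for it depends on the
compiler version and on options such as -O3 -mavx2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel (assumption, not taken from the patch or the PR):
   widen each 8-bit input to 16 bits, scale it, add a bias, and keep the
   high byte of the 16-bit result.  */
void
scale_u8 (uint8_t *dst, const uint8_t *src, size_t n,
          uint16_t mul, uint16_t bias)
{
  for (size_t i = 0; i < n; i++)
    {
      /* Zero extension from 8 to 16 bits; a vectorizer may map this
         to a zero_extend pattern such as vpmovzxbw.  */
      uint16_t wide = src[i];

      /* The arithmetic is done in int after promotion, so truncate to
         16 bits explicitly before shifting (cf. vpmullw, vpaddw,
         vpsrlw $8 in the quoted loop).  */
      uint16_t scaled = (uint16_t) (wide * mul + bias);

      dst[i] = (uint8_t) (scaled >> 8);
    }
}
```

Comparing the output of "gcc -O3 -mavx2 -S" before and after the patch on
a loop of this shape is one way to check for the extra stack round-trip.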