On Wed, Oct 12, 2016 at 9:50 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Wed, Oct 12, 2016 at 10:29 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Wed, Oct 12, 2016 at 9:12 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Tue, Oct 11, 2016 at 5:03 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>> Hi,
>>>> Given below test case,
>>>> int foo (unsigned short a[], unsigned int x)
>>>> {
>>>>   unsigned int i;
>>>>   for (i = 0; i < 1000; i++)
>>>>     {
>>>>       x = a[i];
>>>>       a[i] = (unsigned short)(x >= 32768 ? x - 32768 : 0);
>>>>     }
>>>>   return x;
>>>> }
>>>>
>>>> it now can be vectorized on AArch64, but generated assembly is way from
>>>> optimal:
>>>> .L4:
>>>>         ldr     q4, [x3, x1]
>>>>         add     w2, w2, 1
>>>>         cmp     w2, w0
>>>>         ushll   v1.4s, v4.4h, 0
>>>>         ushll2  v0.4s, v4.8h, 0
>>>>         add     v3.4s, v1.4s, v6.4s
>>>>         add     v2.4s, v0.4s, v6.4s
>>>>         cmhi    v1.4s, v1.4s, v5.4s
>>>>         cmhi    v0.4s, v0.4s, v5.4s
>>>>         and     v1.16b, v3.16b, v1.16b
>>>>         and     v0.16b, v2.16b, v0.16b
>>>>         xtn     v2.4h, v1.4s
>>>>         xtn2    v2.8h, v0.4s
>>>>         str     q2, [x3, x1]
>>>>         add     x1, x1, 16
>>>>         bcc     .L4
>>>>
>>>> The vectorized loop has 15 instructions, which can be greatly simplified
>>>> by turning cond_expr into max_expr, as below:
>>>> .L4:
>>>>         ldr     q1, [x3, x1]
>>>>         add     w2, w2, 1
>>>>         cmp     w2, w0
>>>>         umax    v0.8h, v1.8h, v2.8h
>>>>         add     v0.8h, v0.8h, v2.8h
>>>>         str     q0, [x3, x1]
>>>>         add     x1, x1, 16
>>>>         bcc     .L4
>>>>
>>>> This patch addresses the issue by adding new vectorization pattern.
>>>> Bootstrap and test on x86_64 and AArch64. Is it OK?
>>>
>>> So the COND_EXPRs are generated this way by if-conversion, right? I
>> Though ?: is used in source code, yes, it is if-conv regenerated COND_EXPR.
>>> believe that the MAX/MIN_EXPR form is always preferrable and thus it
>>> looks like if-conversion might want to either directly generate it or
>>> make sure to fold the introduced stmts (and have a match.pd pattern
>>> catching this).
>> Hmm, I also noticed saturation cases which should be better
>> transformed before vectorization in scalar optimizers. But this case
>> is a bit different because there is additional computation involved
>> other than type conversion. We need to prove the computation can be
>> done in either large or small types. It is quite specific case and I
>> don't see good (general) solution in if-conv. Vect-pattern looks like
>> a natural place doing this. I am also looking at general saturation
>> cases, but this one is different?
>
> (vect-patterns should go away ...)
>
> But as if-conversion results may also prevail for scalar code doing the
> pattern in match.pd would be better - that is, "apply" the pattern
> already during if-conversion.
>
> Yes, if-conversion fails to fold the stmts it generates (it only uses
> generic folding on the trees it builds - it can need some TLC here).

Hi,
Sorry for the slow reply. I looked into match.pd and can successfully
transform simpler cond_exprs into min/max expressions, but this one is
more complicated. It transforms three gimple statements into two result
statements, but the result of a match-and-simplify pattern is a single
expression. How should I write a pattern that outputs two gimple
statements as its result? Hmm, now I see that the transform looks more
like a gimple combine...
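
For concreteness, the equivalence the transform relies on can be sketched
in plain C. This is only an illustration, under the assumption that the
value being tested was loaded from an unsigned short (as a[i] is in the
test case), so the whole computation can be done in 16 bits. The function
names cond_form, max_form and check_equivalence are made up for this
sketch and are not part of the patch or of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Original form from the test case: compare and subtract in unsigned int,
   then truncate the result to unsigned short. */
uint16_t cond_form(uint16_t a)
{
    unsigned int x = a;
    return (uint16_t)(x >= 32768 ? x - 32768 : 0);
}

/* Candidate form: a max followed by an add, both in 16 bits.
   Adding 32768 is the same as subtracting it modulo 65536, which is
   why one constant can serve both the max and the add. */
uint16_t max_form(uint16_t a)
{
    uint16_t m = a > 32768 ? a : 32768;   /* MAX_EXPR <a, 32768> */
    return (uint16_t)(m + 32768u);        /* wraps modulo 2^16 */
}

/* Exhaustively compare both forms over every unsigned short value.
   Returns the number of values checked. */
unsigned int check_equivalence(void)
{
    unsigned int checked = 0;
    for (unsigned int v = 0; v <= UINT16_MAX; v++) {
        assert(cond_form((uint16_t)v) == max_form((uint16_t)v));
        checked++;
    }
    return checked;
}
```

The two statements in max_form correspond to the two result statements
mentioned above (one max, one add), which is what a single-expression
match.pd result would have to express.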

Thanks,
bin