On Wed, Oct 12, 2016 at 9:12 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Tue, Oct 11, 2016 at 5:03 PM, Bin Cheng <bin.ch...@arm.com> wrote: >> Hi, >> Given below test case, >> int foo (unsigned short a[], unsigned int x) >> { >> unsigned int i; >> for (i = 0; i < 1000; i++) >> { >> x = a[i]; >> a[i] = (unsigned short)(x >= 32768 ? x - 32768 : 0); >> } >> return x; >> } >> >> it now can be vectorized on AArch64, but generated assembly is way from >> optimal: >> .L4: >> ldr q4, [x3, x1] >> add w2, w2, 1 >> cmp w2, w0 >> ushll v1.4s, v4.4h, 0 >> ushll2 v0.4s, v4.8h, 0 >> add v3.4s, v1.4s, v6.4s >> add v2.4s, v0.4s, v6.4s >> cmhi v1.4s, v1.4s, v5.4s >> cmhi v0.4s, v0.4s, v5.4s >> and v1.16b, v3.16b, v1.16b >> and v0.16b, v2.16b, v0.16b >> xtn v2.4h, v1.4s >> xtn2 v2.8h, v0.4s >> str q2, [x3, x1] >> add x1, x1, 16 >> bcc .L4 >> >> The vectorized loop has 15 instructions, which can be greatly simplified by >> turning cond_expr into max_expr, as below: >> .L4: >> ldr q1, [x3, x1] >> add w2, w2, 1 >> cmp w2, w0 >> umax v0.8h, v1.8h, v2.8h >> add v0.8h, v0.8h, v2.8h >> str q0, [x3, x1] >> add x1, x1, 16 >> bcc .L4 >> >> This patch addresses the issue by adding new vectorization pattern. >> Bootstrap and test on x86_64 and AArch64. Is it OK? > > So the COND_EXPRs are generated this way by if-conversion, right? I Though ?: is used in source code, yes, it is if-conv regenerated COND_EXPR. > believe that > the MAX/MIN_EXPR form is always preferrable and thus it looks like > if-conversion > might want to either directly generate it or make sure to fold the > introduced stmts > (and have a match.pd pattern catching this). Hmm, I also noticed saturation cases which should be better transformed before vectorization in scalar optimizers. But this case is a bit different because there is additional computation involved other than type conversion. We need to prove the computation can be done in either large or small types. It is quite specific case and I don't see good (general) solution in if-conv. Vect-pattern looks like a natural place doing this. I am also looking at general saturation cases, but this one is different?
Thanks, bin > > Richard. > >> Thanks, >> bin >> >> 2016-10-11 Bin Cheng <bin.ch...@arm.com> >> >> * tree-vect-patterns.c (vect_recog_min_max_modify_pattern): New. >> (vect_vect_recog_func_ptrs): New element for above pattern. >> * tree-vectorizer.h (NUM_PATTERNS): Increase by 1. >> >> gcc/testsuite/ChangeLog >> 2016-10-11 Bin Cheng <bin.ch...@arm.com> >> >> * gcc.dg/vect/vect-umax-modify-pattern.c: New test. >> * gcc.dg/vect/vect-umin-modify-pattern.c: New test.