Richard Sandiford <richard.sandif...@arm.com> writes:
> Tamar Christina <tamar.christ...@arm.com> writes:
>> Hi All,
>>
>> This adds an RTL pattern for when two NARROWB instructions are being combined
>> with a PACK. The second NARROWB is then transformed into a NARROWT.
>>
>> For the example:
>>
>> void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
>> {
>>   for (int i = 0; i < (n & -16); i+=1)
>>     pixel[i] += (pixel[i] * level) / 0xff;
>> }
>>
>> we generate:
>>
>>         addhnb  z6.b, z0.h, z4.h
>>         addhnb  z5.b, z1.h, z4.h
>>         addhnb  z0.b, z0.h, z6.h
>>         addhnt  z0.b, z1.h, z5.h
>>         add     z0.b, z0.b, z2.b
>>
>> instead of:
>>
>>         addhnb  z6.b, z1.h, z4.h
>>         addhnb  z5.b, z0.h, z4.h
>>         addhnb  z1.b, z1.h, z6.h
>>         addhnb  z0.b, z0.h, z5.h
>>         uzp1    z0.b, z0.b, z1.b
>>         add     z0.b, z0.b, z2.b
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>>         * config/aarch64/aarch64-sve2.md (*aarch64_sve_pack_<sve_int_op><mode>):
>>         New.
>>         * config/aarch64/iterators.md (binary_top): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.dg/vect/vect-div-bitmask-4.c: New test.
>>         * gcc.target/aarch64/sve2/div-by-bitmask_2.c: New test.
>>
>> --- inline copy of patch --
>> diff --git a/gcc/config/aarch64/aarch64-sve2.md
>> b/gcc/config/aarch64/aarch64-sve2.md
>> index
>> ab5dcc369481311e5bd68a1581265e1ce99b4b0f..0ee46c8b0d43467da4a6b98ad3c41e5d05d8cf38
>> 100644
>> --- a/gcc/config/aarch64/aarch64-sve2.md
>> +++ b/gcc/config/aarch64/aarch64-sve2.md
>> @@ -1600,6 +1600,25 @@ (define_insn "@aarch64_sve_<sve_int_op><mode>"
>>    "<sve_int_op>\t%0.<Ventype>, %2.<Vetype>, %3.<Vetype>"
>>  )
>>
>> +(define_insn_and_split "*aarch64_sve_pack_<sve_int_op><mode>"
>> +  [(set (match_operand:<VNARROW> 0 "register_operand" "=w")
>> +        (unspec:<VNARROW>
>> +          [(match_operand:SVE_FULL_HSDI 1 "register_operand" "w")
>
> "0" would be safer, in case the instruction is only split after RA.
>
>> +           (subreg:SVE_FULL_HSDI (unspec:<VNARROW>
>> +             [(match_operand:SVE_FULL_HSDI 2 "register_operand" "w")
>> +              (match_operand:SVE_FULL_HSDI 3 "register_operand" "w")]
>> +             SVE2_INT_BINARY_NARROWB) 0)]
>> +          UNSPEC_PACK))]
>
> I think ideally this would be the canonical pattern, so that we can
> drop the separate top unspecs. That's more work though, and would
> probably make sense to do once we have a generic way of representing
> the pack.
>
> So OK with the "0" change above.

Hmm, actually, I take that back. Is this transform really correct?
I think the blend corresponds to a TRN1 rather than a UZP1.
The bottom operations populate the lower half of each wider element
and the top operations populate the upper half.

Thanks,
Richard
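
For illustration only, the lane layout behind the TRN1-versus-UZP1 question can be modelled with a small scalar C sketch. It assumes a fixed vector of N = 4 halfword lanes (real SVE vectors are length-agnostic), and every function name here is made up for the sketch rather than taken from GCC or ACLE. Under those assumptions it shows that an ADDHNB followed by an ADDHNT into the same destination matches a TRN1 of two ADDHNB results, not a UZP1.

```c
#include <stdint.h>
#include <string.h>

#define N 4 /* hypothetical number of 16-bit lanes per vector */

/* Model of ADDHNB: high byte of each 16-bit sum goes to the even byte
   lane of the destination; odd byte lanes are set to zero.  */
static void addhnb(uint8_t dst[2 * N], const uint16_t a[N], const uint16_t b[N])
{
  for (int i = 0; i < N; i++) {
    dst[2 * i] = (uint8_t)((uint16_t)(a[i] + b[i]) >> 8);
    dst[2 * i + 1] = 0;
  }
}

/* Model of ADDHNT: high byte of each 16-bit sum goes to the odd byte
   lane of the destination; even byte lanes are left unchanged.  */
static void addhnt(uint8_t dst[2 * N], const uint16_t a[N], const uint16_t b[N])
{
  for (int i = 0; i < N; i++)
    dst[2 * i + 1] = (uint8_t)((uint16_t)(a[i] + b[i]) >> 8);
}

/* Model of UZP1: even lanes of a, followed by even lanes of b.  */
static void uzp1(uint8_t dst[2 * N], const uint8_t a[2 * N], const uint8_t b[2 * N])
{
  for (int i = 0; i < N; i++) {
    dst[i] = a[2 * i];
    dst[N + i] = b[2 * i];
  }
}

/* Model of TRN1: even lanes of a and b, interleaved.  */
static void trn1(uint8_t dst[2 * N], const uint8_t a[2 * N], const uint8_t b[2 * N])
{
  for (int i = 0; i < N; i++) {
    dst[2 * i] = a[2 * i];
    dst[2 * i + 1] = b[2 * i];
  }
}

/* Compute the three forms for inputs x and y, each added to c:
   bt       = ADDHNB (x, c) then ADDHNT (y, c) into the same register
   via_uzp1 = UZP1 (ADDHNB (x, c), ADDHNB (y, c))
   via_trn1 = TRN1 (ADDHNB (x, c), ADDHNB (y, c))  */
void three_forms(const uint16_t x[N], const uint16_t y[N], const uint16_t c[N],
                 uint8_t bt[2 * N], uint8_t via_uzp1[2 * N],
                 uint8_t via_trn1[2 * N])
{
  uint8_t nb_x[2 * N], nb_y[2 * N];

  addhnb(bt, x, c);
  addhnt(bt, y, c);

  addhnb(nb_x, x, c);
  addhnb(nb_y, y, c);
  uzp1(via_uzp1, nb_x, nb_y);
  trn1(via_trn1, nb_x, nb_y);
}
```

Calling `three_forms` with distinct values in `x` and `y` and comparing the outputs with `memcmp` shows `bt` equal to `via_trn1` and different from `via_uzp1`: the bottom/top pair interleaves the two narrowed results, whereas UZP1 concatenates them.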