[Bug target/101846] Improve __builtin_shufflevector emitted code

rguenth at gcc dot gnu.org via Gcc-bugs Tue, 10 Aug 2021 07:30:58 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101846


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-08-10
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  I've pondered (in another context) how to represent "paradoxical
subregs" on GIMPLE.  We expand from

v32hi foo (v16hi x)
{
  vector(32) short int _1;
  v32hi _3;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _1 = {x_2(D), { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }};
  _3 = VEC_PERM_EXPR <_1, { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, { 0, 32, 1, 33, 2, 34, 3, 35, 4,
36, 5, 37, 6, 38, 7, 39, 8, 40, 9, 41, 10, 42, 11, 43, 12, 44, 13, 45, 14, 46,
15, 47 }>;
  return _3;

and

v16hi bar (v32hi x)
{
  vector(32) short int _1;
  v16hi _3;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _1 = VEC_PERM_EXPR <x_2(D), x_2(D), { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20,
22, 24, 26, 28, 30, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31 }>;
  _3 = BIT_FIELD_REF <_1, 256, 0>;
  return _3;
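To make the two dumps concrete, here is a minimal scalar C model of them. It assumes the VEC_PERM_EXPR semantics result[i] = (a ++ b)[sel[i]] for operands and result with the same lane count, so selector values 0..n-1 pick from the first operand and n..2n-1 from the second. The helper names vec_perm, model_foo and model_bar are hypothetical and are not GCC internals. The model shows that foo interleaves x[0..15] with zeros, and bar keeps the even lanes of x.

```c
#include <assert.h>

/* Hypothetical reference model of VEC_PERM_EXPR <a, b, sel> for n-lane
   operands: result[i] = (a ++ b)[sel[i]]. */
static void vec_perm (const short *a, const short *b, const int *sel,
                      int n, short *out)
{
  for (int i = 0; i < n; i++)
    {
      assert (sel[i] >= 0 && sel[i] < 2 * n);
      out[i] = sel[i] < n ? a[sel[i]] : b[sel[i] - n];
    }
}

/* foo: _1 = {x, {0, ...}} widens x to 32 lanes, then
   VEC_PERM_EXPR <_1, {0, ...}, {0, 32, 1, 33, ...}>. */
static void model_foo (const short x[16], short out[32])
{
  short w[32] = {0}, zeros[32] = {0};
  int sel[32];
  for (int i = 0; i < 16; i++)
    {
      w[i] = x[i];              /* CONSTRUCTOR: low half x, high half 0 */
      sel[2 * i] = i;           /* lane i of _1 */
      sel[2 * i + 1] = 32 + i;  /* lane i of the all-zero operand */
    }
  vec_perm (w, zeros, sel, 32, out);
}

/* bar: VEC_PERM_EXPR <x, x, {0, 2, ..., 30, 16, ..., 31}>, then
   BIT_FIELD_REF <_1, 256, 0> keeps lanes 0..15 (16 x 16 bits). */
static void model_bar (const short x[32], short out[16])
{
  short tmp[32];
  int sel[32];
  for (int i = 0; i < 16; i++)
    {
      sel[i] = 2 * i;        /* even lanes of x */
      sel[16 + i] = 16 + i;  /* lanes the BIT_FIELD_REF discards */
    }
  vec_perm (x, x, sel, 32, tmp);
  for (int i = 0; i < 16; i++)
    out[i] = tmp[i];
}
```

Note that the upper 16 lanes of the 32-lane permute in bar are computed only to be thrown away.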

I think bar() is reasonable from the GIMPLE side; moving the BIT_FIELD_REF
across the permute would be a 1:1 canonicalization choice (and one that is
only "profitable" for single-operand permutes).
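One possible reading of "moving the BIT_FIELD_REF across the permute" for bar() is sketched below in C, as an assumption rather than GCC's actual transform: extract the two 256-bit halves of x first, then permute 16-lane vectors using only the first 16 selector entries. Because both permute operands are x (a single-operand permute), each selector index can be reduced modulo 32 before reuse; with two distinct operands that shortcut would not hold. The functions bar_as_dumped and bar_moved are hypothetical names, and vec_perm is the same scalar model as above, repeated so the block stands alone.

```c
#include <assert.h>
#include <string.h>

enum { N = 32, H = N / 2 };

/* Scalar model of VEC_PERM_EXPR: result[i] = (a ++ b)[sel[i]]. */
static void vec_perm (const short *a, const short *b, const int *sel,
                      int n, short *out)
{
  for (int i = 0; i < n; i++)
    {
      assert (sel[i] >= 0 && sel[i] < 2 * n);
      out[i] = sel[i] < n ? a[sel[i]] : b[sel[i] - n];
    }
}

/* Form in the dump: permute all 32 lanes, then BIT_FIELD_REF <_1, 256, 0>. */
static void bar_as_dumped (const short x[N], const int sel[N], short out[H])
{
  short tmp[N];
  vec_perm (x, x, sel, N, tmp);
  memcpy (out, tmp, H * sizeof (short));
}

/* Hypothetical moved form: take the halves of x first, then permute
   only the 16 lanes that survive. */
static void bar_moved (const short x[N], const int sel[N], short out[H])
{
  short lo[H], hi[H];
  int sel_lo[H];
  memcpy (lo, x, sizeof lo);       /* BIT_FIELD_REF <x, 256, 0>   */
  memcpy (hi, x + H, sizeof hi);   /* BIT_FIELD_REF <x, 256, 256> */
  for (int i = 0; i < H; i++)
    sel_lo[i] = sel[i] % N;        /* single operand: lane sel[i] % N of x */
  vec_perm (lo, hi, sel_lo, H, out);
}
```

Both functions compute the same 16 lanes; the moved form never materializes the 16 discarded lanes.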

For foo() I thought of doing

 _1 = BIT_INSERT_EXPR <tem_3(D), x_2(D), 0>;

with tem_3(D) left uninitialized so as to represent a paradoxical subreg.
I've tested and discarded the idea of simply using VIEW_CONVERT_EXPRs
here, but I'm considering it for the case where we need the lowpart
of a vector and the highpart doesn't matter (i.e. %xmm0 vs. %ymm0),
since the current representation using a BIT_FIELD_REF doesn't
seem to optimize well (though that was in the context of AVX512 mask
registers).
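The reason an uninitialized tem_3(D) is acceptable in the BIT_INSERT_EXPR idea for foo() is that foo's mask never selects lanes 16..31 of _1, so those lanes are dead. A minimal C sketch of that argument follows, assuming the same lane-selection semantics as the dump; bit_insert_low, lanes_unused, foo_mask and foo_perm are hypothetical helpers for illustration only. It checks that the upper half is never read and that filling it with arbitrary junk leaves the result unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { NX = 16, NW = 32 };

/* Model of BIT_INSERT_EXPR <tem, x, 0>: lanes 0..15 receive x and the
   upper lanes keep whatever tem held (undefined for tem_3(D)). */
static void bit_insert_low (const short tem[NW], const short x[NX],
                            short out[NW])
{
  memcpy (out, tem, NW * sizeof (short));
  memcpy (out, x, NX * sizeof (short));
}

/* True if no selector entry reads lanes lo..hi-1 of the first operand. */
static bool lanes_unused (const int *sel, int n, int lo, int hi)
{
  for (int i = 0; i < n; i++)
    if (sel[i] >= lo && sel[i] < hi)
      return false;
  return true;
}

/* foo's mask {0, 32, 1, 33, ..., 15, 47}. */
static void foo_mask (int sel[NW])
{
  for (int i = 0; i < NX; i++)
    {
      sel[2 * i] = i;
      sel[2 * i + 1] = NW + i;
    }
}

/* VEC_PERM_EXPR <w, {0, ...}, mask>: indices >= NW read the zero operand. */
static void foo_perm (const short w[NW], short out[NW])
{
  int sel[NW];
  foo_mask (sel);
  for (int i = 0; i < NW; i++)
    out[i] = sel[i] < NW ? w[sel[i]] : 0;
}
```

Since the result does not depend on lanes 16..31 of _1, whether they hold zeros (as in the current CONSTRUCTOR) or garbage (as with tem_3(D)) makes no difference.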

I suppose the testcases can be optimized at the RTL level as well.
