https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98254

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, we have simplify_vector_constructor, but that handles only extractions from
some other vector, not reads from memory.

E.g. on -mavx2 -O2 -mtune=intel:
typedef int __attribute__((vector_size(32))) V;
typedef short __attribute__((vector_size(16))) W;

V
foo (short *a)
{
  return (V){a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]};
}

V
bar (int *a)
{
  return (V){a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]};
}

V
baz (short *b)
{
  W a = *(W *)b;
  return (V){a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]};
}

V
qux (short *b)
{
  W a = *(W *)b;
  return __builtin_convertvector (a, V);
}

it triggers on baz, but emits worse code than for qux. After forwprop1 and
dce, baz is:
  W a;
  vector(8) short int _1;
  vector(16) short int _20;
  V _21;

  <bb 2> :
  _1 = MEM[(W *)b_19(D)];
  _20 = BIT_INSERT_EXPR <{ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, _1, 0>;
  _21 = [vec_unpack_lo_expr] _20;
  return _21;
and we don't really emit the code we want for the BIT_INSERT_EXPR.
I guess it might be better to canonicalize on .VEC_CONVERT for this.
And then for the memory reads, see if the target supports unaligned vector
loads and in that case also support MEM_REFs.
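
To illustrate at the source level what the suggested canonicalization would
have to preserve, here is a rough sketch, not anything from the GCC sources.
It compares the element-wise constructor read from memory (as in foo) with an
unaligned vector load followed by __builtin_convertvector (as in qux). The
helper names widen_elementwise, widen_convert and widen_forms_agree are
hypothetical. The memcpy stands in for an unaligned vector load, and results
are stored through a pointer only to avoid passing 32-byte vectors by value.

```c
#include <string.h>

typedef int __attribute__((vector_size(32))) V;
typedef short __attribute__((vector_size(16))) W;

/* Element-wise constructor reading from memory, as in foo.  */
static void
widen_elementwise (const short *a, V *out)
{
  *out = (V){a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]};
}

/* Hypothetical equivalent: one unaligned load of 8 shorts (expressed
   with memcpy), then a single sign-extending vector conversion.  */
static void
widen_convert (const short *a, V *out)
{
  W tmp;
  memcpy (&tmp, a, sizeof tmp);
  *out = __builtin_convertvector (tmp, V);
}

/* Returns 1 if both forms produce identical lanes.  */
int
widen_forms_agree (void)
{
  short buf[9] = {0, -1, 2, -3, 4, -5, 32767, -32768, 9};
  V x, y;
  /* buf + 1 need not be 16-byte aligned.  */
  widen_elementwise (buf + 1, &x);
  widen_convert (buf + 1, &y);
  return memcmp (&x, &y, sizeof x) == 0;
}
```

Both helpers should yield the same eight sign-extended ints, so a transform
from the first form to the second is only a question of whether the target
can do the unaligned load cheaply.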
