[Bug tree-optimization/88492] SLP optimization generates ugly code

rguenth at gcc dot gnu.org Fri, 12 Jul 2019 04:13:43 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88492


--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Yeah.  Again, on x86 with -mavx2 we now have right after late FRE:

  <bb 2> [local count: 214748371]:
  vect__1.6_64 = MEM <vector(16) unsigned char> [(unsigned char *)b_24(D)];
  vect__1.7_63 = VEC_PERM_EXPR <vect__1.6_64, vect__1.6_64, { 0, 2, 1, 3, 4, 6, 5, 7, 8, 10, 9, 11, 12, 14, 13, 15 }>;
  vect__2.9_62 = [vec_unpack_lo_expr] vect__1.7_63;
  vect__2.9_59 = [vec_unpack_hi_expr] vect__1.7_63;
  vect__2.8_57 = [vec_unpack_lo_expr] vect__2.9_62;
  vect__2.8_56 = [vec_unpack_hi_expr] vect__2.9_62;
  vect__2.8_55 = [vec_unpack_lo_expr] vect__2.9_59;
  vect__2.8_54 = [vec_unpack_hi_expr] vect__2.9_59;
  MEM <vector(4) unsigned int> [(unsigned int *)&tmp] = vect__2.8_57;
  MEM <vector(4) unsigned int> [(unsigned int *)&tmp + 16B] = vect__2.8_56;
  MEM <vector(4) unsigned int> [(unsigned int *)&tmp + 32B] = vect__2.8_55;
  MEM <vector(4) unsigned int> [(unsigned int *)&tmp + 48B] = vect__2.8_54;
  vectp_b.4_65 = b_24(D) + 16;
  _8 = BIT_FIELD_REF <vect__2.8_57, 32, 0>;
  _22 = BIT_FIELD_REF <vect__2.8_56, 32, 0>;
  _30 = _8 + _22;
  _14 = BIT_FIELD_REF <vect__2.8_55, 32, 0>;
  _81 = BIT_FIELD_REF <vect__2.8_54, 32, 0>;
  _43 = _14 + _30;
  _45 = _43 + _81;
  sum_34 = (int) _45;
  _58 = BIT_FIELD_REF <vect__2.8_57, 32, 32>;
  _38 = BIT_FIELD_REF <vect__2.8_56, 32, 32>;
  _72 = _38 + _58;
  _68 = BIT_FIELD_REF <vect__2.8_55, 32, 32>;
  _53 = BIT_FIELD_REF <vect__2.8_54, 32, 32>;
  _29 = _68 + _72;
  _31 = _29 + _53;
  _7 = _31 + _45;
  sum_61 = (int) _7;
  _47 = BIT_FIELD_REF <vect__2.8_57, 32, 64>;
  _88 = BIT_FIELD_REF <vect__2.8_56, 32, 64>;
  _44 = _47 + _88;
  _74 = _44 + _90;
  _73 = _74 + _92;
  _83 = _7 + _73;
  sum_84 = (int) _83;
  _94 = BIT_FIELD_REF <vect__2.8_57, 32, 96>;
  _96 = BIT_FIELD_REF <vect__2.8_56, 32, 96>;
  _71 = _94 + _96;
  _98 = BIT_FIELD_REF <vect__2.8_55, 32, 96>;
  _100 = BIT_FIELD_REF <vect__2.8_54, 32, 96>;
  _70 = _71 + _98;
  _46 = _70 + _100;
  _18 = _46 + _83;
  sum_27 = (int) _18;
  tmp ={v} {CLOBBER};
  return sum_27;

I know the "real" code this testcase is derived from has actual operations
in place of the b[N] reads; for the above, the vectorization looks somewhat
pointless given that we end up decomposing the result again.

So the appropriate fix would of course be to vectorize the reduction
loop (but that hits the sign-changing reduction issue).
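
For illustration, a minimal C sketch of source that would have the shape
seen in the dump above.  It is reconstructed from the GIMPLE only and is
not the testcase attached to the PR; the function name sum_permuted is
hypothetical.  The first loop corresponds to the widening, permuted
stores into tmp, the second is the reduction loop, where the adds happen
in unsigned int while sum is an int.

```c
/* Hypothetical reconstruction inferred from the GIMPLE dump; not the
   testcase attached to the PR.  */
int
sum_permuted (unsigned char *b)
{
  unsigned int tmp[4][4];
  int sum = 0;

  /* Widening copy with elements 1 and 2 of each group of four swapped,
     matching the { 0, 2, 1, 3, ... } permute in the dump.  */
  for (int i = 0; i < 4; i++, b += 4)
    {
      tmp[i][0] = b[0];
      tmp[i][1] = b[2];
      tmp[i][2] = b[1];
      tmp[i][3] = b[3];
    }

  /* The additions are carried out in unsigned int and the result is
     converted back to int on every iteration: a sign-changing
     reduction.  */
  for (int j = 0; j < 4; j++)
    sum += tmp[0][j] + tmp[1][j] + tmp[2][j] + tmp[3][j];

  return sum;
}
```

Since the permute only reorders elements inside each group of four, the
result is just the sum of the 16 input bytes, which is why vectorizing
the stores alone buys little here.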
