Hi folks.

I have a pending patch for the path solver that pulls in global ranges when available (the stuff in SSA_NAME_RANGE_INFO). In doing so, I have run into a regression I was hoping the loop experts could pontificate on.

The regression comes from the simple_reduc_1() function in tree-ssa/loop-interchange-9.c, and it involves the following path:

=========== BB 5 ============
Imports: n_10(D)  j_19
Exports: n_10(D)  j_13  j_19
         j_13 : j_19(I)
n_10(D) int [1, 257]
j_19    int [0, 256]
Relational : (j_13 > j_19)
    <bb 5> [local count: 118111600]:
    # sum_21 = PHI <sum_14(4), sum_11(3)>
    c[j_19] = sum_21;
    j_13 = j_19 + 1;
    if (n_10(D) > j_13)
      goto <bb 9>; [89.00%]
    else
      goto <bb 6>; [11.00%]

j_13 : int [1, 257]
5->9  (T) n_10(D) :  int [2, 257]
5->9  (T) j_13 :     int [1, 256]
5->9  (T) j_19 :     int [0, 255]
5->6  (F) n_10(D) :  int [1, 257]
5->6  (F) j_13 :     int [1, 257]
5->6  (F) j_19 :     int [0, 256]

=========== BB 9 ============
n_10(D) int [2, 257]
j_13    int [1, 256]
Relational : (n_10(D) > j_19)
Relational : (n_10(D) > j_13)
    <bb 9> [local count: 105119324]:
    goto <bb 3>; [100.00%]


=========== BB 3 ============
Imports: n_10(D)
Exports: n_10(D)
n_10(D) int [1, +INF]
    <bb 3> [local count: 118111600]:
    # j_19 = PHI <j_13(9), 0(7)>
    sum_11 = c[j_19];
    if (n_10(D) > 0)
      goto <bb 8>; [89.00%]
    else
      goto <bb 5>; [11.00%]

With knowledge from previous passes (SSA_NAME_RANGE_INFO), we know that the loop cannot legally iterate outside the size of c[256]. So, j_13 lies within [1, 257] and n_10 is [2, +INF] at the end of the path. This allows us to thread BB3 to BB8. Doing so, causes the test to fail here:

/* { dg-final { scan-tree-dump-times "Loop_pair<outer:., inner:.> is interchanged" 1 "linterchange" } } */

All the blocks lie within the same loop, and the path passes all the tests in path_profitable_p().

Is there something about the shape of this path that should make it invalid in the backward threader, or should the test be marked with -fdisable-tree-threadN (or something else entirely)? Note that the backward threader is allowed to peel through loop headers.

I am attaching the gimple dump as well as the graphviz output for easier analysis.

Thanks.
Aldy
__attribute__((noinline))
void simple_reduc_1 (int n)
{
  int i;
  int sum;
  int j;
  int _1;
  int _2;
  int _3;

  <bb 2> [local count: 14598063]:
  if (n_10(D) > 0)
    goto <bb 7>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 7> [local count: 12992276]:

  <bb 3> [local count: 118111600]:
  # j_19 = PHI <j_13(9), 0(7)>
  sum_11 = c[j_19];
  if (n_10(D) > 0)
    goto <bb 8>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 8> [local count: 105119324]:

  <bb 4> [local count: 955630225]:
  # sum_20 = PHI <sum_14(10), sum_11(8)>
  # i_22 = PHI <i_15(10), 0(8)>
  _1 = a[i_22][j_19];
  _2 = b[i_22][j_19];
  _3 = _1 * _2;
  sum_14 = _3 + sum_20;
  i_15 = i_22 + 1;
  if (n_10(D) > i_15)
    goto <bb 10>; [89.00%]
  else
    goto <bb 5>; [11.00%]

  <bb 10> [local count: 850510901]:
  goto <bb 4>; [100.00%]

  <bb 5> [local count: 118111600]:
  # sum_21 = PHI <sum_14(4), sum_11(3)>
  c[j_19] = sum_21;
  j_13 = j_19 + 1;
  if (n_10(D) > j_13)
    goto <bb 9>; [89.00%]
  else
    goto <bb 6>; [11.00%]

  <bb 9> [local count: 105119324]:
  goto <bb 3>; [100.00%]

  <bb 6> [local count: 14598063]:
  return;

}

Attachment: graph.ps
Description: PostScript document

Reply via email to