On Fri, 26 Jun 2020, Jeff Law via Gcc-patches wrote:

> In theory yes, but there are cases where paths converge (like you've
> shown) where you may have evaluated to a constant on the paths, but
> it's not a constant at the convergence point.  You have to be very
> careful using b_c_p like this and it's been a regular source of
> kernel bugs.
>
> I'd recommend looking at the .ssa dump and walking forward from there
> if the .ssa dump looks correct.

Here is the last dump before thread1 (105t.mergephi2). I don't see anything incorrect in it.

ledtrig_cpu (_Bool is_active)
{
  int old;
  int iftmp.0_1;
  int _5;

  <bb 2> [local count: 1073741824]:
  if (is_active_2(D) != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 3>; [50.00%]

  <bb 3> [local count: 536870913]:

  <bb 4> [local count: 1073741824]:
  # iftmp.0_1 = PHI <1(2), -1(3)>
  _5 = __builtin_constant_p (iftmp.0_1);
  if (_5 != 0)
    goto <bb 5>; [50.00%]
  else
    goto <bb 8>; [50.00%]

  <bb 5> [local count: 536870913]:
  if (iftmp.0_1 >= -128)
    goto <bb 6>; [50.00%]
  else
    goto <bb 8>; [50.00%]

  <bb 6> [local count: 268435456]:
  if (iftmp.0_1 <= 127)
    goto <bb 7>; [34.00%]
  else
    goto <bb 8>; [66.00%]

  <bb 7> [local count: 91268056]:
  __asm__ __volatile__("asi %0,%1
" : "ptr" "=Q" MEM[(int *)&num_active_cpus] : "val" "i" iftmp.0_1, "Q" MEM[(int *)&num_active_cpus] 
: "memory", "cc");
  goto <bb 9>; [100.00%]

  <bb 8> [local count: 982473769]:
  __asm__ __volatile__("laa %0,%2,%1
" : "old" "=d" old_8, "ptr" "=Q" MEM[(int *)&num_active_cpus] : "val" "d" iftmp.0_1, "Q" MEM[(int 
*)&num_active_cpus] : "memory", "cc");

  <bb 9> [local count: 1073741824]:
  return;

}

There is a single b_c_p, the asm's immediate argument is exactly the argument of b_c_p, and the asm is in the branch protected by b_c_p.
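
For reference, the testcase boils down to something like the following C. This is reconstructed from the dump, not the actual kernel source, so take the names and exact shape as approximations (and the asm is s390-only):

static int num_active_cpus;

/* Sketch: use the add-immediate instruction when the value is a
   compile-time constant fitting in a signed byte, the register form
   otherwise.  The b_c_p guard is what keeps the "i" constraint
   satisfiable: the "asi" arm must be folded away whenever val is not
   a constant. */
static inline void atomic_add(int val, int *ptr)
{
    if (__builtin_constant_p(val) && val >= -128 && val <= 127)
        __asm__ __volatile__("asi %0,%1\n"
                             : [ptr] "=Q" (*ptr)
                             : [val] "i" (val), "Q" (*ptr)
                             : "memory", "cc");
    else {
        int old;
        __asm__ __volatile__("laa %0,%2,%1\n"
                             : [old] "=d" (old), [ptr] "=Q" (*ptr)
                             : [val] "d" (val), "Q" (*ptr)
                             : "memory", "cc");
    }
}

void ledtrig_cpu(_Bool is_active)
{
    atomic_add(is_active ? 1 : -1, &num_active_cpus);
}

The PHI <1(2), -1(3)> in bb 4 is the ternary, and bbs 5 and 6 are the two range checks.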

Now the thread1 dump, for comparison:

ledtrig_cpu (_Bool is_active)
{
  int old;
  int iftmp.0_4;
  int iftmp.0_6;
  int _7;
  int _12;
  int iftmp.0_13;
  int iftmp.0_14;

  <bb 2> [local count: 1073741824]:
  if (is_active_2(D) != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870912]:
  # iftmp.0_6 = PHI <1(2)>
  _7 = __builtin_constant_p (iftmp.0_6);
  if (_7 != 0)
    goto <bb 6>; [50.00%]
  else
    goto <bb 8>; [50.00%]

  <bb 4> [local count: 536870912]:
  # iftmp.0_4 = PHI <-1(2)>
  _12 = __builtin_constant_p (iftmp.0_4);
  if (_12 != 0)
    goto <bb 5>; [50.00%]
  else
    goto <bb 8>; [50.00%]

  <bb 5> [local count: 268435456]:
  if (iftmp.0_4 >= -128)
    goto <bb 7>; [20.00%]
  else
    goto <bb 8>; [80.00%]

  <bb 6> [local count: 214748364]:
  if (iftmp.0_6 <= 127)
    goto <bb 7>; [12.00%]
  else
    goto <bb 8>; [88.00%]

  <bb 7> [local count: 91268056]:
  # iftmp.0_13 = PHI <iftmp.0_6(6), iftmp.0_4(5)>
  __asm__ __volatile__("asi %0,%1
" : "ptr" "=Q" MEM[(int *)&num_active_cpus] : "val" "i" iftmp.0_13, "Q" MEM[(int 
*)&num_active_cpus] : "memory", "cc");
  goto <bb 9>; [100.00%]

  <bb 8> [local count: 982473769]:
  # iftmp.0_14 = PHI <iftmp.0_4(5), iftmp.0_4(4), iftmp.0_6(6), iftmp.0_6(3)>
  __asm__ __volatile__("laa %0,%2,%1
" : "old" "=d" old_8, "ptr" "=Q" MEM[(int *)&num_active_cpus] : "val" "d" iftmp.0_14, "Q" MEM[(int 
*)&num_active_cpus] : "memory", "cc");

  <bb 9> [local count: 1073741824]:
  return;

}

Thread1 decides to separate the is_active and !is_active paths (surprisingly, on one path it optimizes out the <= 127 comparison and on the other the >= -128 comparison, while it could have optimized both comparisons on both paths). And it decides to make the paths converge after the comparisons, but before the asm.

What the pass did does hurt here. It looks like if we duplicate b_c_p, we may need to duplicate far enough to include all the blocks dominated by b_c_p==true (including the asm, here). Otherwise, any b_c_p can be optimized to true, because for a boolean b,

  b is the same as b ? true : false

and __builtin_constant_p(b ? true : false) would be the same as b ? __builtin_constant_p(true) : __builtin_constant_p(false), i.e. true.
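
A minimal, hypothetical illustration of that identity (not the testcase; whether the folding actually happens is the optimizer's choice, since b_c_p may legitimately return either value for a non-literal argument):

/* Before threading: v is a PHI of two constants, not a constant,
   so __builtin_constant_p typically folds to 0. */
int merged(int b)
{
    int v = b ? 1 : -1;
    return __builtin_constant_p(v);
}

/* After threading, each path sees a literal, so each copy of
   __builtin_constant_p folds to 1.  That is only safe if every block
   dominated by the "true" result -- in the testcase, the asm with the
   "i" constraint -- was duplicated along with it. */
int threaded(int b)
{
    if (b)
        return __builtin_constant_p(1);
    return __builtin_constant_p(-1);
}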

It is too bad we don't have any optimization pass using ranges between IPA and thread1; that would have gotten rid of the comparisons, and hence of the temptation to thread. Adding always_inline on atomic_add (or flatten on the caller) does help, as sketched below: EVRP then removes the comparisons.
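
Relative to the sketch above (again an assumption about the real helper's shape), the workaround is just:

-static inline void atomic_add(int val, int *ptr)
+static inline __attribute__((always_inline))
+void atomic_add(int val, int *ptr)

With the helper forcibly inlined early, EVRP sees the constant argument and deletes the range checks before thread1 ever runs.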

Do you see a way forward without changing what thread1 does or declaring the testcase as unsupported?

--
Marc Glisse
