On Monday 31 October 2005 18:49, Jeffrey A Law wrote:

> Thoughts?
>
I'm not sure this would buy you much better precision.  I was tinkering 
with PR 18501 a few days ago.  This is one of those cases where the 
optimizers (CCP in this case) remove the code that we were supposed to 
warn about.  It goes something like this (all variables are locals):

bitmap_print_value_set ()
{
  # first_1 = PHI <first_2(0), first_4(3)>;
<L3>:;
  D.1286_3 = bmp_iter_set ();
  if (D.1286_3 != 0) goto <L0>; else goto <L4>;

<L0>:;
  if (first_1 == 0) goto <L1>; else goto <L2>;

<L1>:;
  something ();

<L2>:;
  first_4 = 0;
  goto <L3>;

<L4>:;
  return;
}

To prevent losing location information for the warning, I had modified the 
propagation engine to warn as it folded the expression away.  In this 
case, we were missing the warning because constant propagation was merging 
an UNDEFINED value (first_2) with a CONSTANT (first_4 == 0), which yields 
the constant because we explicitly ignore undefined values on locals.

So, as we fold 'if (first_1 == 0)' into 'if (1)', the patch emits the 
warning about the possibly uninitialized value of 'first'.  My approach 
improved precision of the warning because we are able to indicate exactly 
what statement is doing the uninitialized reference.

However, there are two huge problems with the approach: (1) it is 
intrusive.  Passes are required to keep track of their actions, in this 
case, CCP would inform the propagation engine that it had called the merge 
operator with an uninitialized value.

(2) it introduces false positives:

sub ()
{
  i_3 = 0;
  j_4 = 0;

  # k_2 = PHI <k_5(0), k_9(1)>;
  # i_1 = PHI <i_3(0), i_10(1)>;
<L1>:;
  D.1285_6 = i_1 | j_4;
  if (D.1285_6 == 0) goto <L0>; else goto <L2>;

<L0>:;
  k_9 = 10;
  i_10 = sub ();
  goto <L1>;

<L2>:;
  return k_2;
}

Again, when CCP evaluates k_2 = PHI <k_5, k_9>, it will merge the undefined 
name k_5 with the constant value k_9 == 10.  So, when we replace 'return 
k_2' with 'return 10' in <L2>, we will warn that 'k' may be used 
uninitialized.  However, in this case the loop is guaranteed to execue at 
least once, so the undefined value k_5 is never actually used by the 
program.

Unless I'm misunderstanding your approach, it will have the same problem.  
The early pass will have marked the PHI node 'k_2 = PHI <k_5, k_9>' as 
problematic.  Depending on when the second pass runs, it will not have a 
chance of marking it as problematic, and we will emit the warning.

The thing here is that both test cases are structurally similar.  In both 
cases we are disregarding an undefined value during optimization.  We 
don't actually know whether that undefined value will be executed or not.  
In the first test case, it will be.  In the second test case, it will not 
(k_2 is guaranteed to get its value from k_9 the first time).

A more robust solution would have to involve some kind of symbolic 
processing on our part.  Mechanically noticing whether we see undefined 
names in PHI nodes will always have this problem.

We could implement some variant of Gated SSA to record predicates in PHI 
nodes.  Uses of potentially uninitialized names would have to be analyzed 
for two things:

(1) if the use is protected by a predicate with the same value number as 
the predicate protecting a real definition, then you don't need to warn.  
Assume x_0 is undefined and that pred_5 and pred_9 have the same value 
number.

        if (pred_5)
          x_5 = 3;
        x_6 = PHI <x_0, x_5>
        ...
        if (pred_9)
          ... = x_6;

(2) if we can guarantee that at least one of the PHI arguments with real 
definitions will execute at least once, then we don't need to warn.  This 
is the case in the second test case I described above.

Note, however, that this approach is not foolproof either and I suspect it 
would not be trivial to implement.  I think it would allow us to not be at 
the mercy of the order in which optimizations are executed, but it can be 
easily confused in the presence of aliasing and other nasties that prevent 
you from computing good value numbers or execution paths.  I suspect that 
in the extreme, this problem can be reduced to the stopping problem.

Reply via email to