On Monday 31 October 2005 18:49, Jeffrey A Law wrote: > Thoughts? > I'm not sure this would buy you much better precision. I was tinkering with PR 18501 a few days ago. This is one of those cases where the optimizers (CCP in this case) remove the code that we were supposed to warn about. It goes something like this (all variables are locals):
bitmap_print_value_set () { # first_1 = PHI <first_2(0), first_4(3)>; <L3>:; D.1286_3 = bmp_iter_set (); if (D.1286_3 != 0) goto <L0>; else goto <L4>; <L0>:; if (first_1 == 0) goto <L1>; else goto <L2>; <L1>:; something (); <L2>:; first_4 = 0; goto <L3>; <L4>:; return; } To prevent losing location information for the warning, I had modified the propagation engine to warn as it folded the expression away. In this case, we were missing the warning because constant propagation was merging an UNDEFINED value (first_2) with a CONSTANT (first_4 == 0), which yields the constant because we explicitly ignore undefined values on locals. So, as we fold 'if (first_1 == 0)' into 'if (1)', the patch emits the warning about the possibly uninitialized value of 'first'. My approach improved precision of the warning because we are able to indicate exactly what statement is doing the uninitialized reference. However, there are two huge problems with the approach: (1) it is intrusive. Passes are required to keep track of their actions, in this case, CCP would inform the propagation engine that it had called the merge operator with an uninitialized value. (2) it introduces false positives: sub () { i_3 = 0; j_4 = 0; # k_2 = PHI <k_5(0), k_9(1)>; # i_1 = PHI <i_3(0), i_10(1)>; <L1>:; D.1285_6 = i_1 | j_4; if (D.1285_6 == 0) goto <L0>; else goto <L2>; <L0>:; k_9 = 10; i_10 = sub (); goto <L1>; <L2>:; return k_2; } Again, when CCP evaluates k_2 = PHI <k_5, k_9>, it will merge the undefined name k_5 with the constant value k_9 == 10. So, when we replace 'return k_2' with 'return 10' in <L2>, we will warn that 'k' may be used uninitialized. However, in this case the loop is guaranteed to execue at least once, so the undefined value k_5 is never actually used by the program. Unless I'm misunderstanding your approach, it will have the same problem. The early pass will have marked the PHI node 'k_2 = PHI <k_5, k_9>' as problematic. Depending on when the second pass runs, it will not have a chance of marking it as problematic, and we will emit the warning. The thing here is that both test cases are structurally similar. In both cases we are disregarding an undefined value during optimization. We don't actually know whether that undefined value will be executed or not. In the first test case, it will be. In the second test case, it will not (k_2 is guaranteed to get its value from k_9 the first time). A more robust solution would have to involve some kind of symbolic processing on our part. Mechanically noticing whether we see undefined names in PHI nodes will always have this problem. We could implement some variant of Gated SSA to record predicates in PHI nodes. Uses of potentially uninitialized names would have to be analyzed for two things: (1) if the use is protected by a predicate with the same value number as the predicate protecting a real definition, then you don't need to warn. Assume x_0 is undefined and that pred_5 and pred_9 have the same value number. if (pred_5) x_5 = 3; x_6 = PHI <x_0, x_5> ... if (pred_9) ... = x_6; (2) if we can guarantee that at least one of the PHI arguments with real definitions will execute at least once, then we don't need to warn. This is the case in the second test case I described above. Note, however, that this approach is not foolproof either and I suspect it would not be trivial to implement. I think it would allow us to not be at the mercy of the order in which optimizations are executed, but it can be easily confused in the presence of aliasing and other nasties that prevent you from computing good value numbers or execution paths. I suspect that in the extreme, this problem can be reduced to the stopping problem.