NoQ added a comment.

Nice catch but I don't think this is the right solution.

Symbol liveness corresponds to a concrete runtime property of the program: can 
the program obtain this value by executing further runtime instructions? The 
program cannot obtain a pointer to variable `n`, or even a pointer to a region 
`*n`, just by looking at the data in a struct that was copied from region `*n` 
previously. Therefore neither `n` nor `*n` should not be marked live just 
because they are mentioned in a base region of a lazy compound value. In 
particular, memory leaks should be reported against `n` if the last place it's 
mentioned is a base region of a lazy compound value, as the program cannot 
possibly free that memory. You can most likely write a MallocChecker test for 
that, which will become a false negative after your patch.

For the same reason, for any region `R`, if `reg<R>` or `derived<R, $x>` is 
live, does not mean `R` is live. //The program cannot guess a memory address by 
only looking at a value loaded from that address.//

If you look at how a simpler example works:

  void clang_analyzer_warnIfReached();
  
  void foo(int **n) {
    int *n2 = *n;
    if (!**n) {
      if (*n2) {
        clang_analyzer_warnIfReached();
      }
    }
  }

You will notice that it does not try to keep VarRegion `n` or even the symbol 
`reg_$0<int ** n>` alive. It is *sufficient* to keep `reg_$1<int * 
Element{SymRegion{reg_$0<int ** n>},0 S64b,int *}>` alive in order to keep the 
test passing, and it does match the actual definition of live symbol (`reg_$0` 
can no longer be retrieved by the program but `reg_$1` can).

The original code tries to do the same for lazy compound values. I suspect that 
you'll find the actual bug when you descend into `getInterestingValues()`.

What looks fishy about `getInterestingValues()` is that it assumes that the 
amount of interesting values is finite. This sounds incredibly wrong to me. If 
a lazy compound value contains any pointer symbol `$p`, then all values in the 
following infinite series are interesting:

  $p,  *$p,  **$p,  ***$p,  ...

So I think the correct solution would be to maintain a list of explicitly-live 
compound values. Just like we already maintain a list of explicitly live 
symbols - `TheLiving`, and explicitly live regions - `RegionRoots`; neither of 
these lists enumerates all live symbols or regions as there are infinitely many 
of them, but each of these lists enumerates the ones that we've learned to be 
definitely live, and they're sufficient to determine liveness of any other 
symbol through recursive descent into the identity of that symbol. So with a 
list of live compound value regions, we can ask a question like "is this symbol 
stored in any of the known-live compound values?" to determine that symbol's 
liveness. This will keep the stored symbol alive without making the lazy 
compound value's base region alive.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132236/new/

https://reviews.llvm.org/D132236

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to