martong added a comment. Alright, I see your point. I agree that solving the problem of "$a +$b +$c constrained and then $a + $c got constrained" requires canonicalization. However, canonicalization seems to be not trivial to implement. And there are some other easier cases that I think we could (should) solve before starting to implement canonicalization.
Parent map is just an optimization. The functionality should be working even if we apply brute-force to go through all existing constraints. Because of the above, I suggest the following steps. These steps could be implemented as individual patches that depend on each other and once all of them accepted then we could merge them in one lock-step. 1. Update `setConstraint` to simplify existing constraints (and adding the simplified constraint) when a new constraint is added. In this step we'd just simply iterate over all existing constraints and would try to simplfy them with `simplifySVal`. This would solve the simplest cases where we have two symbols in the tree. E.g. these cases would pass: int test_rhs_further_constrained(int x, int y) { if (x + y != 0) return 0; if (y != 0) return 0; clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}} clang_analyzer_eval(y == 0); // expected-warning{{TRUE}} return 0; } int test_lhs_and_rhs_further_constrained(int x, int y) { if (x % y != 1) return 0; if (x != 1) return 0; if (y != 2) return 0; clang_analyzer_eval(x % y == 1); // expected-warning{{TRUE}} clang_analyzer_eval(y == 2); // expected-warning{{TRUE}} return 0; } 2. Add the capability to simplify more complex constraints, where there are 3 symbols in the tree. In this change I suggest the extension and overhaul of `simplifySVal`. This change would make the following cases to pass (notice the 3rd check in each test-case). int test_left_tree_constrained(int x, int y, int z) { if (x + y + z != 0) return 0; if (x + y != 0) return 0; clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}} clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}} clang_analyzer_eval(z == 0); // expected-warning{{TRUE}} x = y = z = 1; return 0; } int test_right_tree_constrained(int x, int y, int z) { if (x + (y + z) != 0) return 0; if (y + z != 0) return 0; clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}} clang_analyzer_eval(y + z == 0); // expected-warning{{TRUE}} clang_analyzer_eval(x == 0); // expected-warning{{TRUE}} return 0; } 3. Add canonical trees to the solver. The way we should do this is to build "flat" sub-trees of associative operands. E.g. `b + a + c` becomes `+(a, b, c)`, i.e. a tree with a root and 3 children [1]. The ordering of the children could be done by their addresses (`Stmt *`) and we could simply store them in an array. Probably we'll need a new mapping: SymbolRef -> CanonicalTree. When a new constraint is added then we build up its canonical tree (if not existing yet) Then, we check each existing constraints' canonical tree to check whether the newly constrained expression is a sub-expression of that. (The check could be a simple recursively descending search in the children arrays of the nodes.) If it is a sub-expression then we simplify that with a customized canonical tree visitor that will call evalBinOp appropriately. In this step we should handle only `+` and `*` This would pass the below tests: int test_left_tree_constrained_commutative(int x, int y, int z) { if (x + y + z != 0) return 0; if (y + x != 0) return 0; clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}} clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}} clang_analyzer_eval(z == 0); // expected-warning{{TRUE}} return 0; } int test_associative(int x, int y, int z) { if (x + y + z != 0) return 0; if (x + z != 0) return 0; clang_analyzer_printState(); clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}} clang_analyzer_eval(x + z == 0); // expected-warning{{TRUE}} clang_analyzer_eval(y == 0); // expected-warning{{TRUE}} return 0; 4. Extend 3) with `-` and `/`. `lhs - rhs` becomes a `lhs + (-rhs)` and `lhs /rhs` is substituted wit `lhs * (1/rhs)`. 5. There are still some expressions that the solver will not be able to reason about (e.g. `<<`), but it may happen that we store a constraint for them. In these cases it is a good idea to search for the expression in the constraint map, but without constant folding. This functionality might be solved by step 2). If not, then https://reviews.llvm.org/D102696 could be a good alternative. 6. Do performance measurements and enhance perhaps with parent maps. I am planning to upload a patch for steps 1) and 2) very soon. [1] Hans Vangheluwe et al, An algorithm to implement a canonical representation of algebraic expressions and equations in AToM3 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D102696/new/ https://reviews.llvm.org/D102696 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits