steakhal added a comment. In D86874#2256249 <https://reviews.llvm.org/D86874#2256249>, @martong wrote:
>> Calculation of the RegionRawOffsetV2 >> ------------------------------------ >> >> Let's see how does these subscript expressions used during the calculation >> of the `RegionRawOffsetV2`: >> For multi-dimension arrays we //fold// the index expressions of the nested >> `ElementRegion`s. >> We are doing that by simply calculating the byte-offset of the nested >> region, and adding the current level's //index*sizeof(Type)//. >> So, the `ByteOffset` that we build up will contain mostly multiplications by >> a constant OR additions of symbolic expressions (like the `x+1` in the >> example). > > We have these lines in the case of handling the ElementRegion: > > if (!index.getAs<NonLoc>()) > return RegionRawOffsetV2(); > > So, if the index is symbolic we give up, don't we? So, how could this work > with `x+1`? What am I missing here? In my example, I'm speaking about the case when it's `nonloc::SymbolVal` wrapping the expression `x+1`. So not every `NonLoc` is `ConcreteInt`. >> The resulting `RegionRawOffsetV2` for `p`, `q` will be: >> p: {BaseRegion: `buf`, ByteOffset: `20 S64b`} >> q: {BaseRegion: `buf`, ByteOffset: `(((reg_$0<int x>) + 1) * 12) + 24`} >> ^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^ >> | | >> `@a` This is an //object-language// expression. --/ | >> / >> `@b` The rest should be //meta-language// expression. -------/ >> >> SPOILER: This distinction is really important here. >> >> So, we made an expression, we should not evaluate in either of the worlds. >> We should not evaluate it using the semantics of the //object-language// >> since only the `@a` is of that world. >> We should model overflow in `@a` if that expression is of //unsigned// type >> (OR signed but `-fwrapv`...) etc. according to the //abstract machine//. >> BUT modeling the possibility of an overflow of the rest of the expression >> (`@b`) would be a flaw. > > Why? I'd expect that the value of the whole expression `@a@b` could overflow. Hm, you might be right about that. This //simplification// (over the plus operator) is only valid if no overflow can happen. We could add the necessary constraints during this constant folding operation such as: Simplifying the expression `x+1 < 0` into `x<-1` //assuming that// `x <= numeric_limit_max(typeof(x)) - 1` We could repeat this, while we have a //state// where all such assumptions are true. I'm looking forward to implementing it. >> Simplify step, again >> -------------------- >> >> Simplification of the `(((reg_$0<int x>) + 1) * 12) + 24` < `0` >> inequality... >> ^^^^^^^^^^ >> `@b` >> Result: `reg_$0<int x> < -3 S64b` >> ^^^^^^^^^^^^^ ^^^^^^^ >> `RootSym` --/ | >> / >> `C'` ----------------------/ >> >> `#1`: This step supposed to fold **ONLY** the //meta-language// expression >> parts (`@b`) of the previously aquired `ByteOffset`. >> `#2`: This step requires the expression under simplified to be of >> //meta-language// to be able to reorder the constants. (aka. to fold into >> the right hand side's constant). >> `#3`: This also requires the right-hand side to be of the //meta-language//. > > Do I understand this correctly, that none of the binary operations can have a > symbolic RHS, because that would mean we have a VLA? I think yes, you are right. However, as I tried to build an example demonstrating this, it turned out that it does not handle VLAs quite well. void foo(int x, int y) { int buf[x][y]; // VLA clang_analyzer_dump(&buf[1][2]); // &Element{Element{buf,1 S64b,int [y]},2 S64b,int} clang_analyzer_DumpRegionRawOffsetV2AndSimplify(&buf[1][2]); // RawOffsetV2: {BaseRegion: buf, ByteOffset: 8 S64b} // simplification result: {RootSymbol: 8 S64b, FoldedConstant: 0 S64b) } The RawOffsetV2 value is wrong. It supposed to have the ByteOffset: `(1 * sizeof(int[y])) + 8`. >> We treat the complete expression under //simplification// as an expression >> of the //meta-language//. >> I'm not changing this, but I probably should though. > > Again, I don't understand why you are sure that the value of //whole// > expression cannot overflow. > >> Ideally, after //simplification// we should have this inequality: `x+1 < -2` >> That way we would not fold any subexpression of the //object-language//, so >> the `#1` requirement would be preserved. >> The right-hand side is of an expression of the //meta-language//. >> It makes sense, to evaluate the `operator<` as the semantics of the >> //meta-language//. >> But the left-hand side is an expression of the //object-language//. >> We need some sort of //logical// conversion here. > > What if the second index is also symbolic? E.g `buf[x+1][y+1]`? > This would result in `(((reg_$0<int x>) + 1) * 12) + (reg_$1<int y>) + 1)` < > `0` . And after simplification, the RHS cannot be interpreted as //meta//. As the analyzer does not symbolicate such indexing, the pointer value will be `Unknown`, so the //simplification// process does not apply. At best we can hope to achieve something like this: BaseRegion: buf, ByteOffset: (reg_$0<int x>) + (reg_$1<int y>) + 16 After simplification: (reg_$0<int x>) + (reg_$1<int y>) < -16 But to get this, we need some sort of a //canonical// form of expressions having multiple symbolic values. For now, it seems reasonable to me to squash to the left all symbolic values. >> Check if the resulting //folded constant// (`C'`) is negative or not. > > What happens if the symbolic index is not the most inner index? E.g. > `buf[6][x+1]`. Then `C` is no longer constant, and no longer //meta// ... In this particular case, I would expect the BytesOffset: `(((reg_$0<int x>) + 1) * 4) + 72`, where 72 would be the `C` constant. This example is interesting from a different standpoint. The //canonical// form of expressions, again. I think for our purposes `((reg_$0<int x>) * 4) + 76` form would be a better choice since we need a single multiplication and a single addition. To get this, we need to transform `(Symbol + C) * C'` => `(Symbol * C') + (C * C')`. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D86874/new/ https://reviews.llvm.org/D86874 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits