ASDenysPetrov marked an inline comment as done. ASDenysPetrov added inline comments.
================ Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:588 + // AnyX2 means that two expressions marked as `Any` are met in code, + // and there is a special column for that, for example: + // if (x >= y) ---------------- xazax.hun wrote: > ASDenysPetrov wrote: > > xazax.hun wrote: > > > xazax.hun wrote: > > > > ASDenysPetrov wrote: > > > > > xazax.hun wrote: > > > > > > I have really hard time processing how to interpret `AnyX2`. > > > > > > > > > > > > For example in the code below: > > > > > > ``` > > > > > > if (x >= y) > > > > > > if (x != y) > > > > > > if (x <= y) > > > > > > return false > > > > > > ``` > > > > > > > > > > > > ``` > > > > > > if (x >= y) > > > > > > if (x == y) > > > > > > if (x <= y) > > > > > > return true > > > > > > ``` > > > > > > > > > > > > We would get different results for `<=`. So I do not really get how > > > > > > I should read the `AnyX2` column. > > > > > I'm sorry for the obscure explanation. I was really thinking about > > > > > how to explain it more clear. > > > > > Let's consider this code: > > > > > ``` > > > > > if (x >= y) > > > > > if (x != y) > > > > > if (x <= y) > > > > > ``` > > > > > If 'x >= y' is //true//, then following `x <= y` is `Any` ( can be > > > > > //true// or //false// ). //See table in the summary.// > > > > > If 'x != y' is //true//, then following `x <= y` is `Any` ( can be > > > > > //true// or //false// ). > > > > > If 'x >= y && x != y' is //true//, then following `x <= y` is `False` > > > > > only (it means that we previously met two Any states, what I've > > > > > called as **AnyX2**). > > > > > > > > > > > > > > > > > > > > > > > > Could you tell me how the reasoning would be different for > > > > ``` > > > > if (x >= y) > > > > if (x == y) > > > > if (x <= y) > > > > ``` > > > > ? > > > Oh never mind, after `x == y` the `x <= y` would be `true`. But still, I > > > find this logic hard to explain, understand, and prove correct. > > > I'd like to see at least some informal reasoning why will it always work. > > > Also, if someone would like to extend this table in the future (e.g. with > > > operator <=>), how to make it in such a way that does not break this > > > logic. > > When I was creating a table it was just 6x6 without AnyX2 column. It should > > tell of what RangeSet we shall return (true, false or empty). It covers > > cases with only **one** preceding condition. > > Then I found that **two** certain sequential conditions can also tell us > > what happen to the next condition. And those two conditions happened to be > > described as `Any` in the table. Thus I deduced one more column for this > > purpose. > > In other words: > > ``` > > if (x >= y) // 3. Aha, we found `x >= y`, which is also `Any` for `x <= > > y` in the table > > if (x != y) // 2. Aha, we found `x == y`, which is `Any` for `x <= y` > > in the table > > if (x <= y) // 1. Reasoning a RangeSet about this `x <= y` > > // 4. OK, we found two `Any` states. Let's look at the value in `AnyX2` > > column for `x <= y`. > > // 5. The value is `False`, so return `FalseRangeSet`, assuming CSA > > builds false branch only. > > ``` > > You can substitute any other row from the table and corresponding > > Any-comparisons. > > Honestly, I don’t know how else to explain this. :) > > >I find this logic hard to explain, understand, and prove correct. > > If you come up with it, It becomes really easy and clear. You'll see. But I > > agree that AnyX2 can be changed to something more obvious. > Ok, in the meantime I revisited the algorithm and I think now I understand > why does it work. > I think the main observation I missed was the following: **Every row has > exactly two `Any` entries, so we only need to think about one combination per > row.** I think originally I interpreted your matrix wrong (the transposed > one) and it did not really add up. > So I think you should document the fact that each row has exactly two unknown > cases. But for each of the rows, if we have both facts, we can actually > evaluate the current operator. > > The goal is that if someone wants to extend this code later on (e.g. if the > standard committee introduces a new operator that we handle), the next person > should be able to modify this algorithm without breaking the existing code. > So as you can see, adding a new column/row could break this invariant of > having exactly 2 unknowns in each row rendering this code incorrect. This is > why it is important, to explain what makes the algorithm work. > > I might miss something, but I do not really see test cases that are covering > these. > > I think some renaming might also help others understanding what is going on: > 1) Capitalize variable names, for example, `tristate`. > 2) Chose a bit more descriptive names. `tristate` from the previous example > describes the type of the variable. But variable names should describe their > contents rather than their type. It can be `Result` in case it describes the > result of the operation. I think in 3-valued logic we tend to use `Unknown` > rather than `Any`. You have code like `auto tristate = > CmpOpTable[IndexOP][Index];`. This is not really readable as it is really > hard to tell what index corresponds to what. One of the indexes correspond to > the evaluated operator while the other corresponds to the query. The variable > names should reflect that. Thank you. I'll take into account all your notes. I'll update the patch soon. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78933/new/ https://reviews.llvm.org/D78933 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits