[PATCH] D78933: [analyzer] RangeConstraintManager optimizations in comparison expressions

Denys Petrov via Phabricator via cfe-commits Thu, 07 May 2020 03:57:04 -0700

ASDenysPetrov marked an inline comment as done.
ASDenysPetrov added inline comments.



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:588
+  // AnyX2 means that two expressions marked as `Any` are met in code,
+  // and there is a special column for that, for example:
+  // if (x >= y)
----------------
xazax.hun wrote:
> ASDenysPetrov wrote:
> > xazax.hun wrote:
> > > xazax.hun wrote:
> > > > ASDenysPetrov wrote:
> > > > > xazax.hun wrote:
> > > > > > I have really hard time processing how to interpret `AnyX2`.
> > > > > > 
> > > > > > For example in the code below:
> > > > > > ```
> > > > > >    if (x >= y)
> > > > > >      if (x != y)
> > > > > >        if (x <= y)
> > > > > >          return false
> > > > > > ```
> > > > > > 
> > > > > > ```
> > > > > >    if (x >= y)
> > > > > >      if (x == y)
> > > > > >        if (x <= y)
> > > > > >          return true
> > > > > > ```
> > > > > > 
> > > > > > We would get different results for `<=`. So I do not really get how 
> > > > > > I should read the `AnyX2` column.
> > > > > I'm sorry for the obscure explanation. I was really thinking about 
> > > > > how to explain it more clear.
> > > > > Let's consider this code:
> > > > > ```
> > > > > if (x >= y)
> > > > >   if (x != y)
> > > > >     if (x <= y)
> > > > > ```
> > > > > If 'x >= y' is //true//, then following `x <= y` is  `Any` ( can be 
> > > > > //true// or //false// ). //See table in the summary.//
> > > > > If 'x != y' is //true//, then following `x <= y` is  `Any` ( can be 
> > > > > //true// or //false// ).
> > > > > If 'x >= y && x != y' is //true//, then following `x <= y` is `False` 
> > > > > only (it means that we previously met two Any states, what I've 
> > > > > called as **AnyX2**).
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > Could you tell me how the reasoning would be different for 
> > > > ```
> > > > if (x >= y)
> > > >   if (x == y)
> > > >     if (x <= y)
> > > > ```
> > > > ?
> > > Oh never mind, after `x == y` the `x <= y` would be `true`. But still, I 
> > > find this logic hard to explain, understand, and prove correct.
> > > I'd like to see at least some informal reasoning why will it always work. 
> > > Also, if someone would like to extend this table in the future (e.g. with 
> > > operator <=>), how to make it in such a way that does not break this 
> > > logic.
> > When I was creating a table it was just 6x6 without AnyX2 column. It should 
> > tell of what RangeSet we shall return (true, false or empty). It covers 
> > cases with only **one** preceding condition.
> > Then I found that **two** certain sequential conditions can also tell us 
> > what happen to the next condition. And those two conditions happened to be 
> > described as `Any` in the table. Thus I deduced one more column for this 
> > purpose.
> > In other words:
> > ```
> > if (x >= y)     // 3. Aha, we found `x >= y`, which is also `Any` for `x <= 
> > y` in the table
> >   if (x != y)   // 2. Aha, we found `x == y`, which is `Any` for `x <= y` 
> > in the table
> >     if (x <= y) // 1. Reasoning a RangeSet about this `x <= y`
> >     // 4. OK, we found two `Any` states. Let's look at the value in `AnyX2` 
> > column for `x <= y`.
> >     // 5. The value is `False`, so return `FalseRangeSet`, assuming CSA 
> > builds false branch only.
> > ```
> > You can substitute any other row from the table and corresponding 
> > Any-comparisons.
> > Honestly, I don’t know how else to explain this. :)
> > >I find this logic hard to explain, understand, and prove correct.
> > If you come up with it, It becomes really easy and clear. You'll see. But I 
> > agree that AnyX2 can be changed to something more obvious.
> Ok, in the meantime I revisited the algorithm and I think now I understand 
> why does it work.
> I think the main observation I missed was the following:  **Every row has 
> exactly two `Any` entries, so we only need to think about one combination per 
> row.** I think originally I interpreted your matrix wrong (the transposed 
> one) and it did not really add up.
> So I think you should document the fact that each row has exactly two unknown 
> cases. But for each of the rows, if we have both facts, we can actually 
> evaluate the current operator.
> 
> The goal is that if someone wants to extend this code later on (e.g. if the 
> standard committee introduces a new operator that we handle), the next person 
> should be able to modify this algorithm without breaking the existing code. 
> So as you can see, adding a new column/row could break this invariant of 
> having exactly 2 unknowns in each row rendering this code incorrect. This is 
> why it is important, to explain what makes the algorithm work.
> 
> I might miss something, but I do not really see test cases that are covering 
> these.
> 
> I think some renaming might also help others understanding what is going on:
> 1) Capitalize variable names, for example, `tristate`.
> 2) Chose a bit more descriptive names. `tristate` from the previous example 
> describes the type of the variable. But variable names should describe their 
> contents rather than their type. It can be `Result` in case it describes the 
> result of the operation. I think in 3-valued logic we tend to use `Unknown` 
> rather than `Any`.  You have code like `auto tristate = 
> CmpOpTable[IndexOP][Index];`. This is not really readable as it is really 
> hard to tell what index corresponds to what. One of the indexes correspond to 
> the evaluated operator while the other corresponds to the query. The variable 
> names should reflect that. 
Thank you. I'll take into account all your notes. I'll update the patch soon.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78933/new/

https://reviews.llvm.org/D78933



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D78933: [analyzer] RangeConstraintManager optimizations in comparison expressions

Reply via email to