[PATCH] D78933: [analyzer] RangeConstraintManager optimizations in comparison expressions

Gábor Horváth via Phabricator via cfe-commits Wed, 06 May 2020 12:27:24 -0700

xazax.hun added inline comments.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:588
+  // AnyX2 means that two expressions marked as `Any` are met in code,
+  // and there is a special column for that, for example:
+  // if (x >= y)
----------------
ASDenysPetrov wrote:
> xazax.hun wrote:
> > xazax.hun wrote:
> > > ASDenysPetrov wrote:
> > > > xazax.hun wrote:
> > > > > I have really hard time processing how to interpret `AnyX2`.
> > > > > 
> > > > > For example in the code below:
> > > > > ```
> > > > >    if (x >= y)
> > > > >      if (x != y)
> > > > >        if (x <= y)
> > > > >          return false
> > > > > ```
> > > > > 
> > > > > ```
> > > > >    if (x >= y)
> > > > >      if (x == y)
> > > > >        if (x <= y)
> > > > >          return true
> > > > > ```
> > > > > 
> > > > > We would get different results for `<=`. So I do not really get how I 
> > > > > should read the `AnyX2` column.
> > > > I'm sorry for the obscure explanation. I was really thinking about how 
> > > > to explain it more clear.
> > > > Let's consider this code:
> > > > ```
> > > > if (x >= y)
> > > >   if (x != y)
> > > >     if (x <= y)
> > > > ```
> > > > If 'x >= y' is //true//, then following `x <= y` is  `Any` ( can be 
> > > > //true// or //false// ). //See table in the summary.//
> > > > If 'x != y' is //true//, then following `x <= y` is  `Any` ( can be 
> > > > //true// or //false// ).
> > > > If 'x >= y && x != y' is //true//, then following `x <= y` is `False` 
> > > > only (it means that we previously met two Any states, what I've called 
> > > > as **AnyX2**).
> > > > 
> > > > 
> > > > 
> > > > 
> > > Could you tell me how the reasoning would be different for 
> > > ```
> > > if (x >= y)
> > >   if (x == y)
> > >     if (x <= y)
> > > ```
> > > ?
> > Oh never mind, after `x == y` the `x <= y` would be `true`. But still, I 
> > find this logic hard to explain, understand, and prove correct.
> > I'd like to see at least some informal reasoning why will it always work. 
> > Also, if someone would like to extend this table in the future (e.g. with 
> > operator <=>), how to make it in such a way that does not break this logic.
> When I was creating a table it was just 6x6 without AnyX2 column. It should 
> tell of what RangeSet we shall return (true, false or empty). It covers cases 
> with only **one** preceding condition.
> Then I found that **two** certain sequential conditions can also tell us what 
> happen to the next condition. And those two conditions happened to be 
> described as `Any` in the table. Thus I deduced one more column for this 
> purpose.
> In other words:
> ```
> if (x >= y)     // 3. Aha, we found `x >= y`, which is also `Any` for `x <= 
> y` in the table
>   if (x != y)   // 2. Aha, we found `x == y`, which is `Any` for `x <= y` in 
> the table
>     if (x <= y) // 1. Reasoning a RangeSet about this `x <= y`
>     // 4. OK, we found two `Any` states. Let's look at the value in `AnyX2` 
> column for `x <= y`.
>     // 5. The value is `False`, so return `FalseRangeSet`, assuming CSA 
> builds false branch only.
> ```
> You can substitute any other row from the table and corresponding 
> Any-comparisons.
> Honestly, I don’t know how else to explain this. :)
> >I find this logic hard to explain, understand, and prove correct.
> If you come up with it, It becomes really easy and clear. You'll see. But I 
> agree that AnyX2 can be changed to something more obvious.
Ok, in the meantime I revisited the algorithm and I think now I understand why 
does it work.
I think the main observation I missed was the following:  **Every row has 
exactly two `Any` entries, so we only need to think about one combination per 
row.** I think originally I interpreted your matrix wrong (the transposed one) 
and it did not really add up.
So I think you should document the fact that each row has exactly two unknown 
cases. But for each of the rows, if we have both facts, we can actually 
evaluate the current operator.

The goal is that if someone wants to extend this code later on (e.g. if the 
standard committee introduces a new operator that we handle), the next person 
should be able to modify this algorithm without breaking the existing code. So 
as you can see, adding a new column/row could break this invariant of having 
exactly 2 unknowns in each row rendering this code incorrect. This is why it is 
important, to explain what makes the algorithm work.

I might miss something, but I do not really see test cases that are covering 
these.

I think some renaming might also help others understanding what is going on:
1) Capitalize variable names, for example, `tristate`.
2) Chose a bit more descriptive names. `tristate` from the previous example 
describes the type of the variable. But variable names should describe their 
contents rather than their type. It can be `Result` in case it describes the 
result of the operation. I think in 3-valued logic we tend to use `Unknown` 
rather than `Any`.  You have code like `auto tristate = 
CmpOpTable[IndexOP][Index];`. This is not really readable as it is really hard 
to tell what index corresponds to what. One of the indexes correspond to the 
evaluated operator while the other corresponds to the query. The variable names 
should reflect that. 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78933/new/

https://reviews.llvm.org/D78933



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D78933: [analyzer] RangeConstraintManager optimizations in comparison expressions

Reply via email to