[PATCH] D153059: [-Wunsafe-buffer-usage] Group parameter fix-its

Artem Dergachev via Phabricator via cfe-commits Thu, 22 Jun 2023 12:22:54 -0700

NoQ added inline comments.


================
Comment at: clang/lib/Analysis/UnsafeBufferUsage.cpp:2293
+    //      search of connected components.
+    if (!ParmsNeedFix.empty()) {
+      auto First = ParmsNeedFix.begin(), Last = First;
----------------
ziqingluo-90 wrote:
> NoQ wrote:
> > What about the situation where params aren't seen as unsafe yet, but 
> > they're discovered to be unsafe later? Eg.
> > ```
> > void foo(int *p, int *q) {
> >   int *p2 = p;
> >   p2[5] = 7;
> >   int *q2 = q;
> >   q2[5] = 7;
> > }
> > ```
> > Will we notice that `p` and `q` need to be fixed simultaneously?
> > 
> > ---
> > 
> > I suspect that the problem of parameter grouping can't be solved by 
> > pre-populating strategy implications between all parameters. It'll either 
> > cause you to implicate all parameters (even the ones that will never need 
> > fixing), or cause you to miss connections between parameters that do need 
> > fixing (as the connection is discovered later).
> > 
> > Connection between parameters needs to be added *after* the implications 
> > algorithm has finished. And once it's there (might be already there if I 
> > missed something), then this part of code probably becomes unnecessary.
> Yes.  They will be noticed.  Here is why:
> 
> The code here first forms a ring over `p` and  `q` in the assignment 
> (directed) graph:  
> ```
> p <--> q
> ```
> Then the two initialization statements (implemented in [[ 
> https://reviews.llvm.org/D150489 | D150489 ]]) add two more edges to the 
> graph:
> ```
> p2 --> p <--> q <-- q2
> ```
> The algorithm later searches the graph starting from unsafe variables `p2` 
> and `q2`, respectively,  Starting from `p2`, the algorithm discovers that 
> `p2` and `p`, as well as `p` and `q` depend on each other.  Starting from 
> `q2`,  the algorithm discovers that `q2` and `q`, as well as `p` and `q` 
> depend on each other.   The dependency graph is sort of undirected. So 
> eventually, the four variables `p2`, `p`, `q`, `q2` are in the same group.
> 
> The code here first forms a ring over `p` and  `q` in the assignment 
> (directed) graph

I don't think it does. The ring is formed over the `ParamsNeedFix` list, which 
is empty because none of these parameters are listed in `UnsafeOps.byVar`.


================
Comment at: 
clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp:13-15
+   expected-note{{change type of 'p' to 'std::span' to preserve bounds 
information, and change 'q' and 'a' to 'std::span' to propagate bounds 
information between them}} \
+   expected-note{{change type of 'q' to 'std::span' to preserve bounds 
information, and change 'p' and 'a' to 'std::span' to propagate bounds 
information between them}} \
+   expected-note{{change type of 'a' to 'std::span' to preserve bounds 
information, and change 'p' and 'q' to 'std::span' to propagate bounds 
information between them}}
----------------
ziqingluo-90 wrote:
> ziqingluo-90 wrote:
> > NoQ wrote:
> > > QoL thing: //"to propagate bounds information between them"// isn't the 
> > > correct reason in this case. Bounds information doesn't propagate between 
> > > `p` and `q` and `a`. Each of them carries its own, completely unrelated 
> > > bounds information.
> > > 
> > > We need to come up with a different reason. Or it might be better to 
> > > combine the warnings into one warning here, so that we didn't need a 
> > > reason:
> > > ```
> > > warning: 'p', 'q' and 'a' are unsafe pointers used for buffer access
> > > note: change type of 'p', 'q' and 'a' to 'std::span' to preserve bounds 
> > > information
> > > ```
> > > 
> > > This gets a bit worse when parameters are implicated indirectly, but it 
> > > still mostly works fine, for example:
> > > ```
> > > void foo(int *p, int *q) {
> > >   int *p2 = p;
> > >   p2[5] = 7;
> > >   int *q2 = q;
> > >   q2[5] = 7;
> > > }
> > > ```
> > > would produce:
> > > ```
> > > warning: 'p2' and 'q2' are unsafe pointers used for buffer access
> > > note: change type of 'p2' and 'q2' to 'std::span' to preserve bounds 
> > > information, and
> > >       change 'p' and 'q' to 'std::span' to propagate bounds information 
> > > between them
> > > ```
> > > 
> > > ---
> > > 
> > > In any case, this shows that the relationship between parameters (grouped 
> > > together simply for being parameters) is very different from the 
> > > relationship within groups of variables implicated by assignments (even 
> > > if two or more of them are parameters). All the way down to the 
> > > warning/note printer, we probably need to continue giving the parameter 
> > > relationship a special treatment.
> > This is a good argument!  An edge from `p` to `q` in an assignment graph 
> > stands for not only that fixing `p` implicates fixing `q` but also that the 
> > bounds information is propagated from `q` to `p`.   In this sense, the way 
> > I connect parameters is inappropriate.
> > 
> > But my concern is that although treating parameters specially could improve 
> > the quality of the diagnostic messages, the code may be more complex and 
> > less re-used.
> So for such an example
> ```
> int f(int *p, int *q) {
> int * a = p;
> int * b = q;
> 
> a[5] = 0;
> b[5] = 0;
> }
> ```
> We will fix all four variables `a, b, p, q` together.  But bounds information 
> propagation only happens between `a` and `p`,  and `b` and `q`, respectively. 
> Then what should the diagnostic message be?  How about something like
> ```
> change type of 'a' to 'std::span' to preserve bounds information, and change 
> 'p' to propagate bounds information between them. 
> To ensure parameters are changed together, also
> change type of 'b' to 'std::span' to preserve bounds information, and change 
> 'q' to propagate bounds information between them;
> ...
> ```
> for variable `a`.
I think the ideal narrative is to emit a single warning against the entire 
function, which covers all parameters at once, together with all related 
variables.

```lang=c++
int f(int *p, int *q) {  // warning: function 'f' performs unsafe buffer access 
over parameters 'p' and 'q'
                         // note: change type of 'p' and 'q' to 'std::span' to 
receive bounds information from the call site,
                         //       and change local variables 'a' and 'b' to 
'std::span' to propagate it to the access site

  int * a = p;           // note: bounds information needs to propagate from 
'p' to 'a' here
  int * b = q;           // note: bounds information needs to propagate from 
'q' to 'b' here

  a[5] = 0;              // note: used in buffer access here
  b[5] = 0;              // note: used in buffer access here
}
```

This makes a lot of sense given that we're fixing the entire function. This is 
quite a major change though, we need to think this through.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153059/new/

https://reviews.llvm.org/D153059

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D153059: [-Wunsafe-buffer-usage] Group parameter fix-its

Reply via email to