NoQ added inline comments.
================ Comment at: clang/lib/Analysis/UnsafeBufferUsage.cpp:2293 + // search of connected components. + if (!ParmsNeedFix.empty()) { + auto First = ParmsNeedFix.begin(), Last = First; ---------------- ziqingluo-90 wrote: > NoQ wrote: > > What about the situation where params aren't seen as unsafe yet, but > > they're discovered to be unsafe later? Eg. > > ``` > > void foo(int *p, int *q) { > > int *p2 = p; > > p2[5] = 7; > > int *q2 = q; > > q2[5] = 7; > > } > > ``` > > Will we notice that `p` and `q` need to be fixed simultaneously? > > > > --- > > > > I suspect that the problem of parameter grouping can't be solved by > > pre-populating strategy implications between all parameters. It'll either > > cause you to implicate all parameters (even the ones that will never need > > fixing), or cause you to miss connections between parameters that do need > > fixing (as the connection is discovered later). > > > > Connection between parameters needs to be added *after* the implications > > algorithm has finished. And once it's there (might be already there if I > > missed something), then this part of code probably becomes unnecessary. > Yes. They will be noticed. Here is why: > > The code here first forms a ring over `p` and `q` in the assignment > (directed) graph: > ``` > p <--> q > ``` > Then the two initialization statements (implemented in [[ > https://reviews.llvm.org/D150489 | D150489 ]]) add two more edges to the > graph: > ``` > p2 --> p <--> q <-- q2 > ``` > The algorithm later searches the graph starting from unsafe variables `p2` > and `q2`, respectively, Starting from `p2`, the algorithm discovers that > `p2` and `p`, as well as `p` and `q` depend on each other. Starting from > `q2`, the algorithm discovers that `q2` and `q`, as well as `p` and `q` > depend on each other. The dependency graph is sort of undirected. So > eventually, the four variables `p2`, `p`, `q`, `q2` are in the same group. > > The code here first forms a ring over `p` and `q` in the assignment > (directed) graph I don't think it does. The ring is formed over the `ParamsNeedFix` list, which is empty because none of these parameters are listed in `UnsafeOps.byVar`. ================ Comment at: clang/test/SemaCXX/warn-unsafe-buffer-usage-fixits-multi-parm-span.cpp:13-15 + expected-note{{change type of 'p' to 'std::span' to preserve bounds information, and change 'q' and 'a' to 'std::span' to propagate bounds information between them}} \ + expected-note{{change type of 'q' to 'std::span' to preserve bounds information, and change 'p' and 'a' to 'std::span' to propagate bounds information between them}} \ + expected-note{{change type of 'a' to 'std::span' to preserve bounds information, and change 'p' and 'q' to 'std::span' to propagate bounds information between them}} ---------------- ziqingluo-90 wrote: > ziqingluo-90 wrote: > > NoQ wrote: > > > QoL thing: //"to propagate bounds information between them"// isn't the > > > correct reason in this case. Bounds information doesn't propagate between > > > `p` and `q` and `a`. Each of them carries its own, completely unrelated > > > bounds information. > > > > > > We need to come up with a different reason. Or it might be better to > > > combine the warnings into one warning here, so that we didn't need a > > > reason: > > > ``` > > > warning: 'p', 'q' and 'a' are unsafe pointers used for buffer access > > > note: change type of 'p', 'q' and 'a' to 'std::span' to preserve bounds > > > information > > > ``` > > > > > > This gets a bit worse when parameters are implicated indirectly, but it > > > still mostly works fine, for example: > > > ``` > > > void foo(int *p, int *q) { > > > int *p2 = p; > > > p2[5] = 7; > > > int *q2 = q; > > > q2[5] = 7; > > > } > > > ``` > > > would produce: > > > ``` > > > warning: 'p2' and 'q2' are unsafe pointers used for buffer access > > > note: change type of 'p2' and 'q2' to 'std::span' to preserve bounds > > > information, and > > > change 'p' and 'q' to 'std::span' to propagate bounds information > > > between them > > > ``` > > > > > > --- > > > > > > In any case, this shows that the relationship between parameters (grouped > > > together simply for being parameters) is very different from the > > > relationship within groups of variables implicated by assignments (even > > > if two or more of them are parameters). All the way down to the > > > warning/note printer, we probably need to continue giving the parameter > > > relationship a special treatment. > > This is a good argument! An edge from `p` to `q` in an assignment graph > > stands for not only that fixing `p` implicates fixing `q` but also that the > > bounds information is propagated from `q` to `p`. In this sense, the way > > I connect parameters is inappropriate. > > > > But my concern is that although treating parameters specially could improve > > the quality of the diagnostic messages, the code may be more complex and > > less re-used. > So for such an example > ``` > int f(int *p, int *q) { > int * a = p; > int * b = q; > > a[5] = 0; > b[5] = 0; > } > ``` > We will fix all four variables `a, b, p, q` together. But bounds information > propagation only happens between `a` and `p`, and `b` and `q`, respectively. > Then what should the diagnostic message be? How about something like > ``` > change type of 'a' to 'std::span' to preserve bounds information, and change > 'p' to propagate bounds information between them. > To ensure parameters are changed together, also > change type of 'b' to 'std::span' to preserve bounds information, and change > 'q' to propagate bounds information between them; > ... > ``` > for variable `a`. I think the ideal narrative is to emit a single warning against the entire function, which covers all parameters at once, together with all related variables. ```lang=c++ int f(int *p, int *q) { // warning: function 'f' performs unsafe buffer access over parameters 'p' and 'q' // note: change type of 'p' and 'q' to 'std::span' to receive bounds information from the call site, // and change local variables 'a' and 'b' to 'std::span' to propagate it to the access site int * a = p; // note: bounds information needs to propagate from 'p' to 'a' here int * b = q; // note: bounds information needs to propagate from 'q' to 'b' here a[5] = 0; // note: used in buffer access here b[5] = 0; // note: used in buffer access here } ``` This makes a lot of sense given that we're fixing the entire function. This is quite a major change though, we need to think this through. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D153059/new/ https://reviews.llvm.org/D153059 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits