donat.nagy marked 4 inline comments as done. donat.nagy added a comment. Thanks for the suggestions, I answered those where I had something to say and I'll implement the rest of them.
In D156312#4576120 <https://reviews.llvm.org/D156312#4576120>, @steakhal wrote: > Have you considered checking the shift operands for taint? > I wonder if we could warn if we cannot prove the correct use of the shift > operator when either of the operands is tainted. It would be straightforward to add taint handling; but I'd prefer to leave it for a follow-up commit. Some experimentation would be needed to see whether it produces useful results (do real-world projects use shifts on tainted numbers? do we encounter false positives?). ================ Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:53 +class BitwiseShiftValidator { + // Primary mutable state: + CheckerContext &Ctx; ---------------- steakhal wrote: > Where does this comment corresponds to? The two data members (`Ctx` and `State`) that are directly below the comment. I tried to split the list of data members into three groups (primary mutable state, secondary mutable state, `const` members), but now I reconsider this I feel that these header comments are actually superfluous. I'll return to a variant of the earlier layout where I put `NonNegFlags`/`UpperBound` to the end and comment that they are only used for note tag creation; but don't emphasize the (immediately visible) `const`ness / non-`const`-ness of other members with comments. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:57-59 + // Secondary mutable state, used only for note tag creation: + enum { NonNegLeft = 1, NonNegRight = 2 }; + unsigned NonNegFlags = 0; ---------------- steakhal wrote: > How about using the same enum type for these two? It would signify that these > are used together. > Also, by only looking at the possible values, it's not apparent that these > are bitflag values. A suffix might help the reader. Using the enum type is a good idea, however I'd probably put the `Flag` suffix into the name of the type: ``` enum SignednessFlags : unsigned { NonNegLeft = 1, NonNegRight = 2 }; SignednessFlags NonNegOperands = 0; ``` ================ Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:60-63 + // If there is an upper bound, it is <= the size of a primitive type + // (expressed in bits), so this is a very safe sentinel value: + enum { NoUpperBound = 999 }; + unsigned UpperBound = NoUpperBound; ---------------- steakhal wrote: > Have you considered using `std::optional`? > Given that the dimension of this variable is "bits", I think it would be > great to have that in its name. I considered using `std::optional`, but using this "old-school" solution ensures that `NoUpperBound` is comparable to and larger than the "real" upper bound values, which lets me avoid some ugly boolean logic. I'll update the comment and probably also the variable name. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:106-109 + if (auto BRP = checkOvershift()) { + Ctx.emitReport(std::move(BRP)); + return; + } ---------------- steakhal wrote: > I'd prefer `BR` or `Report`. Since we know already that we are in the > path-sensitive domain, `P` has less value there. > I'd apply this concept everywhere in the patch. The `P` stands for the "Ptr" in the name of the type alias `BugReportPtr`; but your comment clearly highlights that this variable name is confusing ;). I'll probably avoid the `auto` and use `BugReportPtr Bug = ...` to remind the reader that this is a pointer (that can be null) and if it's non-null, then we found a bug. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:190-191 + std::string Msg = + llvm::formatv("The result of {0} shift is undefined because the right " + "operand{1} is not smaller than {2}, the capacity of '{3}'", + isLeftShift() ? "left" : "right", RightOpStr, LHSBitWidth, ---------------- steakhal wrote: > This is an important distinction, I use "not smaller than {2}" as a shorter synonym of "larger than or equal to {2}". I can choose some other wording if you feel that this is not clear enough. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:263 + + const SVal Right = Ctx.getSVal(Op->getRHS()); + ---------------- steakhal wrote: > I'd move this closer to the first "use" place. Good point, previously it was used by the calculation that is now performed by `assumeRequirement`. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:315 + default: + llvm_unreachable("this checker does not use other comparison operators"); + } ---------------- steakhal wrote: > I'd say that the current wording is correct because the the checker "use"s the comparison operators (in its implementation) instead of "accept"ing them from outside; but I can replace the message with e.g. "this //function// does not accept other comparison operators" if you'd prefer that. ================ Comment at: clang/lib/StaticAnalyzer/Checkers/BitwiseShiftChecker.cpp:301-302 + pluralSuffix(MaximalAllowedShift)); + R->addNote(LeftNote, PathDiagnosticLocation{LHS, Ctx.getSourceManager(), + Ctx.getLocationContext()}); + Ctx.emitReport(std::move(R)); ---------------- NoQ wrote: > donat.nagy wrote: > > donat.nagy wrote: > > > donat.nagy wrote: > > > > gamesh411 wrote: > > > > > donat.nagy wrote: > > > > > > NoQ wrote: > > > > > > > Can we just append this to the warning? The `addNote()` is useful > > > > > > > for notes that need to be placed in code outside of the execution > > > > > > > path, but if it's always next to the warning, it probably doesn't > > > > > > > make sense to display it separately. > > > > > > I didn't append this to the warning because I felt that the warning > > > > > > text is already too long. (By the way, an earlier internal variant > > > > > > of this checker produced two separate notes next to the warning > > > > > > message.) > > > > > > > > > > > > I tweaked the messages of this checker before initiating this > > > > > > review, but I feel that there's still some room for improvements. > > > > > > (E.g. in this particular case perhaps we could omit some of the > > > > > > details that are not immediately useful and could be manually > > > > > > calculated from other parts of the message...) > > > > > Just a thought on simplifying the diagnostics a bit: > > > > > > > > > > Warning: "Right operand is negative in left shift" > > > > > Note: "The result of left shift is undefined because the right > > > > > operand is negative" > > > > > Shortened: "Undefined left shift due to negative right operand" > > > > > > > > > > Warning: "Left shift by '34' overflows the capacity of 'int'" > > > > > Note: "The result of left shift is undefined because the right > > > > > operand '34' is not smaller than 32, the capacity of 'int'" > > > > > Shortened: "Undefined left shift: '34' exceeds 'int' capacity (32 > > > > > bits)" > > > > > > > > > > The more complex notes are a bit sketchy, as relevant information can > > > > > be lost, and the following solution is probably a bit too much, but > > > > > could prove to be an inspiration: > > > > > > > > > > Warning: "Left shift of '1024' overflows the capacity of 'int'" > > > > > CXX Note: "Left shift of '1024' is undefined because 'int' can hold > > > > > only 32 bits (including the sign bit), so some bits overflow" > > > > > CXX Note: "The value '1024' is represented by 11 bits, allowing at > > > > > most 21 bits for bitshift" > > > > > C Note: "Left shift of '1024' is undefined because 'int' can hold > > > > > only 31 bits (excluding the sign bit), so some bits overflow" > > > > > C Note: "The value '1024' is represented by 11 bits, allowing at most > > > > > 20 bits for bitshift" > > > > > > > > > > Shortened: > > > > > CXX Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' > > > > > capacity (32 bits, including sign)" > > > > > C Warning: "Undefined left shift: '1024' (11 bits) exceeds 'int' > > > > > capacity (31 bits, excluding sign)" > > > > Clarification about the `Msg`/`ShortMsg` distinction: > > > > I'm intentionally separating the short warning messages and the longer > > > > note messages because `createBugReport()` enforces the convention that > > > > it will always emit a warning and a note at the bug location. > > > > > > > > According to the comments in the source code, the intention is that the > > > > note contains all the relevant information, while the warning is a > > > > brief summary that can be displayed in situations where the notes > > > > wouldn't fit the UI. > > > > > > > > IIUC many checkers ignore this distinction and emit the same (often > > > > long and cumbersome) message both as a note and as a warning > > > > (`createBugReport()` has a variant which takes only one message > > > > parameter and passes it to both locations), but I tried to follow it > > > > because I think it's ugly when the same message is repeated twice and > > > > there is some sense in providing both a brief summary and a full > > > > description that doesn't use potentially-ambiguous abbreviations to > > > > save space. > > > > > > > > Of course I could also accept a community decision that this "brief > > > > warning / full note" distinction is deprecated and will be eliminated > > > > (because e.g. front-end utilities are not prepared to handle it), but > > > > in that case I'd strongly suggest a redesign where we eliminate the > > > > redundantly printed 'note' message. (There is no reason to say the same > > > > thing twice! There is no reason to say the same thing twice!) > > > > > > > > On the other hand, in addition to this `Msg`/`ShortMsg` distinction, > > > > this part of the code also adds the extra `LeftNote` (as a remnant from > > > > an earlier internal version of this checker), and that's that's what > > > > I'd like to merge into `Msg` (following NoQ's suggestion and keeping it > > > > distinct from the `ShortMsg`). > > > Among notes, my only planned change is that instead of > > > > > > > Warning: "Left shift of '1024' overflows the capacity of 'int'" > > > > CXX Note: "Left shift of '1024' is undefined because 'int' can hold > > > > only 32 bits (including the sign bit), so some bits overflow" > > > > CXX Note: "The value '1024' is represented by 11 bits, allowing at most > > > > 21 bits for bitshift" > > > > C Note: "Left shift of '1024' is undefined because 'int' can hold only > > > > 31 bits (excluding the sign bit), so some bits overflow" > > > > C Note: "The value '1024' is represented by 11 bits, allowing at most > > > > 20 bits for bitshift" > > > > > > I'll implement something like > > > > > > > Warning: "Left shift of '1024' overflows the capacity of 'int'" > > > > CXX Note: "Left shift of '1024' (represented by 11 bits) is undefined > > > > because 'int' can hold only 32 bits (including the sign bit), so some > > > > bits overflow" > > > > C Note: "Left shift of '1024' (represented by 11 bits) is undefined > > > > because 'int' can hold only 31 bits (excluding the sign bit), so some > > > > bits overflow" > > Correction: now I'm leaning towards just discarding the secondary note, > > because the message examples listed in the previous comment are just the > > variants where the right operand is unknown. In the more detailed message > > template "The shift '{0} << {1}' is undefined {2}, so {3} bit{4} > > overflow{5}" there is no place to insert the "represented by {} bits" > > message. > > > There's nothing wrong with long, multi-sentence diagnostic messages! > > Unlike the compiler proper, we aren't typically used from the command line, > so we aren't trying to fit into 80 columns. So we start our warnings with a > capital letter and expect them to be, at the very least, complete sentences. Even if the message length is not limited, I think it's better to delete the additional note line (instead of merging it with the "regular" note line) because it's just //too much information// and the gist of the issue is lost behind the flood of redundant numbers. However, this is just a subjective decision and I'm open to extending the message if you feel that something important is missing from it. ================ Comment at: clang/test/Analysis/bitwise-shift-common.c:108 + *p += 1 << a; + // expected-note@-1 {{Assuming right operand of bit shift is non-negative but less than 32}} + *p += 1 << (a + 32); ---------------- steakhal wrote: > Hmm, does this "but" actually mean "and" here? > Anyway, both using "but" and "and" there looks awkward. > It might be only me though. Yes, this sentence fragment is like "not too small but not too large", where "but" is equivalent to "and" and both are a bit awkward. ================ Comment at: clang/test/Analysis/bitwise-shift-sanity-checks.c:20-22 +// RUN: -Wno-shift-count-negative -Wno-shift-negative-value \ +// RUN: -Wno-shift-count-overflow -Wno-shift-overflow \ +// RUN: -Wno-shift-sign-overflow ---------------- steakhal wrote: > How about simplifying these lines to `-Wno-shift`? Have you considered it? Unfortunately there is no such shortcut: when I tried to execute trunk clang with it, I got "//warning: unknown warning option '-Wno-shift'//". Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D156312/new/ https://reviews.llvm.org/D156312 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits