Szelethus added a comment. I'm afraid it'll be even more time before I post a new diff. There are some things that `ProgramState` is lacking, so I'll get that fixed first in a different pull request first.
================ Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:258-260 + Report->addNote(NoteBuf, + PathDiagnosticLocation::create(FieldChain.getEndOfChain(), + Context.getSourceManager())); ---------------- NoQ wrote: > NoQ wrote: > > Szelethus wrote: > > > NoQ wrote: > > > > Aha, ok, got it, so you're putting the warning at the end of the > > > > constructor and squeezing all uninitialized fields into a single > > > > warning by presenting them as notes. > > > > > > > > Because for now notes aren't supported everywhere (and aren't always > > > > supported really well), i guess it'd be necessary to at least provide > > > > an option to make note-less reports before the checker is enabled by > > > > default everywhere. In this case notes contain critical pieces of > > > > information and as such cannot be really discarded. > > > > > > > > I'm not sure that notes are providing a better user experience here > > > > than simply saying that "field x.y->z is never initialized". With > > > > notes, the user will have to scroll around (at least in scan-build) to > > > > find which field is uninitialized. > > > > > > > > Notes will probably also not decrease the number of reports too much > > > > because in most cases there will be just one uninitialized field. > > > > Though i agree that when there is more than one report, the user will > > > > be likely to deal with all fields at once rather than separately. > > > > > > > > So it's a bit wonky. > > > > > > > > We might want to enable one mode or another in the checker depending on > > > > the analyzer output flag. Probably in the driver > > > > (`RenderAnalyzerOptions()`). > > > > > > > > It'd be a good idea to land the checker into alpha first and fix this > > > > in a follow-up patch because this review is already overweight. > > > > [...]i guess it'd be necessary to at least provide an option to make > > > > note-less reports before the checker is enabled by default everywhere. > > > > In this case notes contain critical pieces of information and as such > > > > cannot be really discarded. > > > This can already be achieved with `-analyzer-config > > > notes-as-events=true`. There no good reason however not to make an > > > additional flag that would emit warnings instead of notes. > > > > I'm not sure that notes are providing a better user experience here > > > > than simply saying that "field x.y->z is never initialized". With > > > > notes, the user will have to scroll around (at least in scan-build) to > > > > find which field is uninitialized. > > > This checker had a previous implementation that emitted a warning for > > > each uninitialized field, instead of squeezing them in a single warning > > > per constructor call. One of the major reasons behind rewriting it was > > > that it emitted so many of them that it was difficult to differentiate > > > which warning belonged to which constructor call. Also, in the case of > > > the LLVM/Clang project, the warnings from this checker alone would be in > > > the thousands. > > > > Notes will probably also not decrease the number of reports too much > > > > because in most cases there will be just one uninitialized field. > > > While this is true to some extent, it's not uncommon that a single > > > constructor call leaves 7 uninitialized -- in the LLVM/Clang project I > > > found one that had over 30! > > > Since its practically impossible to determine whether this was a > > > performance enhancement or just poor programming, the few cases where it > > > is intentional, an object would emit many more warnings would make make > > > majority of the findings (where an object truly had only 1 or 2 fields > > > uninit) a lot harder to spot in my opinion. > > > > > > In general, I think one warning per constructor call makes the best user > > > experience, but a temporary solution should be implemented until there's > > > better support for notes. > > > > > > I agree, this should be a topic of a follow-up patch. > > > > > > This can already be achieved with `-analyzer-config notes-as-events=true`. > > > > Yep. But the resulting user experience seems pretty bad to me. It was a > > good quick proof-of-concept trick to test things, but the fact that notes > > aren't path pieces is way too apparent in case of this checker. So i guess > > this flag was a bad idea. > > > > > Also, in the case of the LLVM/Clang project, the warnings from this > > > checker alone would be in the thousands. > > > > This makes me a bit worried, i wonder what's the reason why does the > > checker shout so loudly on a codebase that doesn't seem to have actual > > uninitialized value bugs all over the place. > > > > Are any of these duplicates found in the same constructor for the same > > field in different translation units? Suppose we discard the duplicates (if > > any) and warn exactly once (across the whole LLVM) for every uninitialized > > field (which is probably more than once per constructor). Then I wonder: > > 1. How many uninitialize fields would we be able to find this way? > > 2. How many of such fields were intentionally left uninitialized? > > 3. How many were found by deep inspection of the heap through field chains > > involving pointer dereference "links"? > (when i'm asking this sort of stuff, i don't mean you should look through > thousands of positives, a uniform random sample of ~50 should be just fine) > (but we really do need this info before enabling stuff by default) >>This can already be achieved with -analyzer-config notes-as-events=true. >Yep. But the resulting user experience seems pretty bad to me. It was a good >quick proof-of-concept trick to test things, but the fact that notes aren't >path pieces is way too apparent in case of this checker. So i guess this flag >was a bad idea. I totally agree, I meant that a note-less solution should only be a (arguably better) workaround just as `notes-as-events=true`. >Are any of these duplicates found in the same constructor for the same field >in different translation units? Suppose we discard the duplicates (if any) and >warn exactly once (across the whole LLVM) for every uninitialized field (which >is probably more than once per constructor). Sure, having the same field field reported from the same constructor call in different TUs happens quite a lot, but they can easily be uniqued. In the checkers current state, meaning that one warning per ctor call, the LLVM/Clang project in pedantic mode resulted in **409 non-unique warnings**, and using CodeChecker I found that **181 of these is unique**. >How many uninitialize fields would we be able to find this way? In its current state, in pedantic mode, after uniqueing **~581**. Now that's quite a bit off from what I remembered //"the warnings from this checker alone would be in the thousands"//, but having one warning per uninit field would still result in a ~320% increase in reports. >How many of such fields were intentionally left uninitialized? (I checked around 95 reports a couple weeks back before uploading this diff) **Almost all of them.** Note that this is a very performance-critical project. From what I saw, many times several fields are left uninitialized, but these fields are inaccessible due to a `kind` field being asserted at their getter functions. Also, I found cases where the constructor wasn't defaulted with `= default;`. In fact, I struggled to find a case where a field was 100% left out by accident. Maybe in `llvm/lib/IR/ConstantsContext.h`, calling `ConstantExprKeyType(ArrayRef<Constant *> Operands, const ConstantExpr *CE)`. This is not the case however in other projects I checked. In Xerces for example every find was a true positive. >How many were found by deep inspection of the heap through field chains >involving pointer dereference "links"? This is somewhat harder to measure. The best I could come up with is uniqueing the fieldchains from the plist files with lexicographical sorting. Your question can be answered by measuring the length of the fieldchains. I found * 3 fieldchains with the length of 5 * 18 fieldchains with the length of 4 * 62 fieldchains with the lenfth of 3 * 108 fieldchains with the length of 2 * 137 fieldchains with the length of 1 (that is a total of 328 unique fieldchains lexicographically) Because heap objects are not modeled, only 3 of the 328 fieldchains contains pointer dereferences. However, reference objects are very common from what I saw, but sadly I can't give you an exact number on those. I hope these answers were satisfying, but I'll follow up with more information if needed. ================ Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:359-372 + // Checking bases. + const auto *CXXRD = dyn_cast<CXXRecordDecl>(RD); + if (!CXXRD) + return ContainsUninitField; + + for (const CXXBaseSpecifier BaseSpec : CXXRD->bases()) { + const auto *Base = BaseSpec.getType()->getAsCXXRecordDecl(); ---------------- NoQ wrote: > Are there many warnings that will be caused by this but won't be caused by > conducting the check for the constructor-within-constructor that's currently > disabled? Not sure - is it even syntactically possible to not initialize the > base class? I'm not a 100% sure what you mean. Can you clarify? >Not sure - is it even syntactically possible to not initialize the base class? If I understand the question correctly, no, as far as I know. ================ Comment at: lib/StaticAnalyzer/Checkers/UninitializedObjectChecker.cpp:430 + if (T->isUnionType()) { + // TODO: does control ever reach here? + if (isUnionUninit(R)) { ---------------- NoQ wrote: > Add `llvm_unreachable("")` to find out :) It does! :) https://reviews.llvm.org/D45532 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits