hokein added a comment.

> If you're suggesting proceeding with this regex based solution, I
> don't think that's a good idea. Why commit a hack which people will object to
> ever removing? Just see if we can do the right thing instead.

+1, my main concern is the complexity of the patch and the maintenance burden of the Python script.

In https://reviews.llvm.org/D54141#1288811, @JonasToth wrote:

> > - The output of clang-tidy diagnostic is YAML, and YAML is not a
> >   space-efficient format (just for human readability). If you want to save
> >   space further, we might consider using some compressed formats, e.g.
> >   llvm::bitcode. Given the reduced YAML result (5.4MB) is promising, this
> >   might not matter.
>
> The output were normal diagnostics written to stdout, deduplication happens
> from there (see the test-cases). The files I created were just through piping
> to filter some of the noise.
> Without de-duplication it's very hard to get something useful out of a run
> with many checks activated for bigger projects (e.g. Blender and OpenCV are
> useless to try, because they have some commonly used macros with a
> check-violation. The buildbot filled 30GB of RAM before it crashed and
> couldn't even finish the analysis of the project. Similar for LLVM).
>
> I would like to try the simple deduplication first and see if space is still
> an issue. After all I want to just read the diagnostic and see what's
> happening instantly and a more compressed format might not help there.

I mistakenly thought the output was the `-export-fixes` file, but what you mean is the stdout of clang-tidy. Could you please explain your motivation for capturing clang-tidy's stdout? `--export-fixes` emits every `diagnostic` to YAML, even if the `diagnostic` doesn't have fixes. I guess the reason is that you want code snippets that you can show to users? If so, I think this is a separate UX problem, since we have everything in the emitted YAML, and we can construct whatever messages we want from it.

A simpler approach may be:

1. Run clang-tidy in parallel on the whole project, and emit a deduplicated result (`fixes.yaml`).
2. Run a postprocessing step in your buildbot that constructs diagnostic messages from `fixes.yaml`, and store them somewhere.
3. Do whatever you want with the output from 1) and 2).

Step 1 could be done upstream, probably via `AllTUsExecutor`, and deduplication can be done on the fly based on `<CheckName>::<FilePath>::<FileOffset>`; we still need `clang-apply-replacements` to deduplicate replacements. I'm happy to help with this. Step 2 could be done on your own, with just a simple script.

> At the moment clang-apply-replacements is called at the end of a clang-tidy
> run in run-clang-tidy.py. That means we produce ~GBs of YAML first, to then
> emit ~10MBs worth of it.

That's why I suggested using some other, more space-efficient format to store the fixes. My intuition is that the final deduplicated result shouldn't be too large (even for YAML), because 1) there is no duplication, 2) these are **actual diagnostics** in code, and a healthy codebase shouldn't contain lots of problems, and 3) you have mentioned that you use it for small projects :)

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D54141
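
To make the `<CheckName>::<FilePath>::<FileOffset>` key from step 1 and the message construction from step 2 a bit more concrete, here is a rough Python sketch. It is only an illustration under assumptions, not the upstream implementation and not part of run-clang-tidy.py: it assumes each diagnostic has already been parsed into a dict with the keys `DiagnosticName`, `FilePath`, `FileOffset` and `Message` (the exact YAML layout differs between clang-tidy versions), and the function names and the one-line message format are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Assumed shape of one parsed diagnostic; real export-fixes output may nest
# these fields differently depending on the clang-tidy version.
Diagnostic = Dict[str, Any]


def dedup_key(diag: Diagnostic) -> Tuple[str, str, int]:
    """Identity of a diagnostic: (check name, file path, file offset)."""
    return (str(diag["DiagnosticName"]), str(diag["FilePath"]), int(diag["FileOffset"]))


def deduplicate(per_tu_results: Iterable[Iterable[Diagnostic]]) -> List[Diagnostic]:
    """Merge per-translation-unit results, keeping the first copy of each diagnostic."""
    seen = set()
    unique: List[Diagnostic] = []
    for diagnostics in per_tu_results:
        for diag in diagnostics:
            key = dedup_key(diag)
            if key not in seen:
                seen.add(key)
                unique.append(diag)
    return unique


def format_message(diag: Diagnostic) -> str:
    """Render one diagnostic as a single line (hypothetical format for step 2)."""
    return "{FilePath}:{FileOffset}: {Message} [{DiagnosticName}]".format(**diag)


# A diagnostic in a shared header is reported once per translation unit.
header_diag = {
    "DiagnosticName": "modernize-use-nullptr",
    "FilePath": "include/macros.h",
    "FileOffset": 120,
    "Message": "use nullptr",
}
tu_a = [dict(header_diag), {
    "DiagnosticName": "readability-else-after-return",
    "FilePath": "src/a.cpp",
    "FileOffset": 42,
    "Message": "do not use 'else' after 'return'",
}]
tu_b = [dict(header_diag), {
    "DiagnosticName": "modernize-use-nullptr",
    "FilePath": "src/b.cpp",
    "FileOffset": 7,
    "Message": "use nullptr",
}]

merged = deduplicate([tu_a, tu_b])
lines = [format_message(d) for d in merged]
print("\n".join(lines))
```

Because the set of seen keys is the only state that grows, the duplicates coming from commonly included headers never have to be held in memory or written out, which is the point of doing the deduplication on the fly rather than after all per-TU files exist.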