hokein added a comment.

>> If you're suggesting proceeding with this regex based solution, I
> 
> don't think that's a good idea. Why commit a hack which people will object to 
> ever removing? Just see if we can do the right thing instead.

+1, my main concern is the complexity of the patch and the maintenance burden 
of the Python script.

In https://reviews.llvm.org/D54141#1288811, @JonasToth wrote:

> > - The output of clang-tidy diagnostic is YAML, and YAML is not an 
> > space-efficient format (just for human readability). If you want to save 
> > space further, we might consider using some compressed formats, e.g. 
> > llvm::bitcode. Given the reduced YAML result (5.4MB) is promising, this 
> > might not matter.
>
> The output were normal diagnostics written to stdout, deduplication happens 
> from there (see the test-cases). The files i created were just through piping 
> to filter some of the noise.
>  Without de-duplication its very hard to get something useful out of a run 
> with many checks activated for bigger projects (e.g. Blender and OpenCV are 
> useless to try, because they have some commonly used macros with a 
> check-violation. The buildbot filled 30GB of RAM before it crashed and 
> couldn't even finish the analysis of the project. Similar for LLVM).
>
> I would like to try the simple deduplication first and see if space is still 
> an issue. After all I want to just read the diagnostic and see whats 
> happening instantly and a more compressed format might not help there.


I mistakenly thought the output was the `--export-fixes` YAML, but what you 
mean is the stdout of clang-tidy.

Could you please explain your motivation for capturing clang-tidy's stdout? 
`--export-fixes` emits every `diagnostic` to YAML, even if the `diagnostic` 
doesn't have fixes. I guess the reason is that you want code snippets that you 
could show to users? If so, I think this is a separate UX problem, since we 
have everything in the emitted YAML, and we could construct whatever messages 
we want from it. A simpler approach might be:

1. run clang-tidy in parallel on the whole project, and emit a deduplicated 
result (`fixes.yaml`).
2. run a postprocessing step in your buildbot that constructs diagnostic 
messages from `fixes.yaml`, and stores them somewhere.
3. do whatever you want with the output from 1) and 2).
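
To make step 2 concrete, here is a rough sketch of what such a postprocessing script could do: turn one diagnostic record plus the source bytes into a compiler-style message with a code snippet. The record keys (`CheckName`, `FilePath`, `FileOffset`, `Message`) and the helper names are assumptions for illustration only. The real layout of the exported YAML may differ, and loading the YAML is left out.

```python
from typing import Any, Dict, Tuple

# Hypothetical record layout; the keys below are assumptions, not the
# confirmed schema of the exported YAML.
Diagnostic = Dict[str, Any]


def offset_to_line_col(source: bytes, offset: int) -> Tuple[int, int]:
    """Convert a byte offset into a 1-based (line, column) pair."""
    line = source.count(b"\n", 0, offset) + 1
    last_newline = source.rfind(b"\n", 0, offset)  # -1 on the first line
    return line, offset - last_newline


def format_diagnostic(diag: Diagnostic, source: bytes) -> str:
    """Build a message plus the offending source line for one diagnostic."""
    offset = int(diag["FileOffset"])
    line, col = offset_to_line_col(source, offset)
    # Cut out the full source line that contains the offset.
    start = source.rfind(b"\n", 0, offset) + 1
    end = source.find(b"\n", offset)
    if end == -1:
        end = len(source)
    snippet = source[start:end].decode("utf-8", errors="replace")
    header = "{}:{}:{}: warning: {} [{}]".format(
        diag["FilePath"], line, col, diag["Message"], diag["CheckName"]
    )
    return header + "\n" + snippet
```

The column here is counted in bytes, which is enough for a first version; a caret line or multi-byte handling could be added later if the snippets turn out to be useful.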

Step 1 could be done upstream, probably via `AllTUsExecutor`, and 
deduplication can be done on the fly based on 
`<CheckName>::<FilePath>::<FileOffset>`; we still need 
`clang-apply-replacements` to deduplicate replacements; I'm happy to help with 
this. Step 2 could be done on your own, with just a simple script.
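
As an illustration of the key-based deduplication, here is a minimal Python sketch (the upstream version would live in C++ next to the executor; this only shows the idea). It assumes each diagnostic is already available as a dict with the hypothetical keys `CheckName`, `FilePath` and `FileOffset`, and it keeps the first occurrence of each key.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical record layout; the keys are assumptions for illustration.
Diagnostic = Dict[str, Any]


def dedup_key(diag: Diagnostic) -> Tuple[str, str, int]:
    """Identity of a diagnostic: <CheckName>::<FilePath>::<FileOffset>."""
    return (str(diag["CheckName"]), str(diag["FilePath"]), int(diag["FileOffset"]))


def deduplicate(diags: Iterable[Diagnostic]) -> List[Diagnostic]:
    """Keep the first diagnostic seen for each key, preserving input order."""
    seen = set()
    unique = []
    for diag in diags:
        key = dedup_key(diag)
        if key not in seen:
            seen.add(key)
            unique.append(diag)
    return unique
```

Because only the set of keys is kept in memory, a diagnostic coming from a header that is included in many translation units is stored once, no matter how often it is reported.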

> At the moment clang-apply-replacements is called at the end of an clang-tidy 
> run in run-clang-tidy.py That means we produce ~GBs of Yaml first, to then 
> emit ~10MBs worth of it.

That's why I suggest using some other space-efficient format to store the 
fixes. My intuition is that the final deduplicated result shouldn't be too 
large (even for YAML), because 1) there is no duplication, 2) these are 
**actual diagnostics** in code, and a healthy codebase shouldn't contain lots 
of problems, and 3) you have mentioned that you use it for small projects :)


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D54141



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
