JonasToth added a comment.

> Could you please explain your motivation of catching clang-tidy stdout? 
> `--export-fixes` emits everything of `diagnostic` to YAML even the 
> `diagnostic` doesn't have fixes. I guess the reason is that you want code 
> snippets that you could show to users? If so, I think this is a separate UX 
> problem, since we have everything in the emitted YAML, and we could construct 
> whatever messages we want from it.

A bit for pragmatic reasons and a bit precaution.

- You are right with the code-snippet. I want to check for false-positives in 
new clang-tidy checks, if I can just scroll through and see the code-snippet in 
question it is just practical.
- diagnostics in template-code might emit multiple warnings at the same 
code-position. There is a realistic chance that `warning: xy happened here` 
will be the same for all template-instantiations, and only the `note: type 'XY' 
does not match` gives the differentiating hint. If dedup happens _ONLY_ on the 
first warning I fear we might loose valid diagnostics! I did re-evaluate and it 
seems that the emitted yaml does not include the notes. That is an issues, for 
example the CSA output relies on the emitted `notes` that explain the path to 
the bug.
- I originally implemented it for my buildbot which parses the check-name, 
location and so on and then gives an ordered output for each check in a module 
and so on. I extracted the essence for deduplication. `-export-fixes` still 
emits the clang-tidy diagnostics, so for my concrete use-case YAML based 
de-duplication brings no value in its current form, as my BB still struggles 
with the amount of stdout.

> 1. run clang-tidy in parallel on whole project, and emits a deduplicated 
> result (`fixes.yaml`).
> 2. run a postprocessing in your buildbot that constructs diagnostic messages 
> from `fixes.yaml`, and store it somewhere.
> 3. do whatever you want with output from 1) and 2).
> 
>   Step 1 could be done in upstream, probably via `AllTUsExecutor`, and 
> deduplication can be done on the fly based on 
> `<CheckName>::<FilePath>::<FileOffset>`; we still need 
> `clang-apply-replacement` to deduplicate replacements; I'm happy to help with 
> this. Step 2 could be done by your own, just a simple script.
> 
>> At the moment clang-apply-replacements is called at the end of an clang-tidy 
>> run in run-clang-tidy.py That means we produce ~GBs of Yaml first, to then 
>> emit ~10MBs worth of it.
> 
> That's why I suggest using some sort of other space-efficient formats to 
> store the fixes. My intuition is that the final deduplicated result shouldn't 
> be too large (even for YAML), because 1) no duplication 2) these are **actual 
> diagnostics** in code, a healthy codebase shouldn't contain lots of problem 
> 3) you have mentioned that you use it for small projects :)

To 3) I do use it for all kinds of projects, LLVM and Blender are currently the 
biggest ones. I want to go for LibreOffice, Chromium and so on as well. But 
right now the amount of noise is the biggest obstacle. My goal is not to check 
if the project is healthy/provide a service for the project, but check if _we_ 
have bugs in our checks and if code-transformation is correct, false positives, 
too much output, misleading messages, ...

To 2) LLVM is very chatty as well, I don't consider LLVM to be a bad code-base. 
Take `readability-braces-around-statements` for example. I want to test if the 
check transform all possible places correctly, LLVM does not follow this style 
and LLVM has a lot of big headers that implement functionality that are 
transitively included a lot. LLVM is the one that overflowed my 32GB of RAM :)

To 1) I do agree and the data presented support that. I suspect that 
Yaml-to-stdout Ratio is maybe 2/3:1? So in the analyzed case we end up with 
10-15MB of data instead of ~600MB(all Yaml). Space optimization is something we 
can tackle after-wards as it does not seem to be pathological after 
deduplication.

In general: I somewhat consider this patch as rejected, I will still use it 
locally for my BB, but I think this revision should be closed. We could move 
the discussion here to the mailing-list if you want. It is ok to continue here 
as well, as we already started to make plans :)
My opinion is, that we should put as much of the deduplication into clang-tidy 
itself and not rely on tools like `run-clang-tidy.py` if we can.

So for me step 1. would be providing `AllTUsExecutor` in clang-tidy and make it 
parallel itself. For dedup we need hook the diagnostics. CSA has the 
`BugReport` class that could be hashed. clang-tidy currently doesn't have this, 
maybe a similar approach (or the same?) would help us out.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D54141



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to