nickdesaulniers added subscribers: justinstitt, nikic, tstellar, 
serge-sans-paille.
nickdesaulniers added a comment.

In D20401#3823476 <https://reviews.llvm.org/D20401#3823476>, @sammccall wrote:

> In D20401#2770059 <https://reviews.llvm.org/D20401#2770059>, @nickdesaulniers 
> wrote:
>
>> I know this was sped up slightly in 
>> 3339c568c43e4644f02289e5edfc78c860f19c9f, but this change makes 
>> `updateConsecutiveMacroArgTokens` the hottest function in clang in a bottom 
>> up profile of an entire build of the Linux kernel.  It thrashes the one 
>> entry LastFileIDLookup cache, and we end up looking up the same FileID again 
>> and again and again for each token when we expand nested function like 
>> macros.
>>
>> Is there anything we can do to speed this up?
>
> @hokein and I spent some time looking at this (initially trying to understand 
> behavior, now performance).

Wonderful! I'm glad to see you also identified this change in particular.  
Thanks as well for looking into this code.
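
To make the scenario concrete, the pattern is roughly the one below (a 
contrived example, not lifted from the kernel): with nested function-like 
macros, `updateConsecutiveMacroArgTokens` ends up asking `SourceManager` for a 
`FileID` over and over for the argument tokens of each expansion, and those 
lookups keep thrashing the one-entry `LastFileIDLookup` cache.

  // Contrived illustration only -- not real kernel code.  Expanding CLAMP
  // expands MIN and MAX, and every argument token of every nested expansion
  // triggers another FileID lookup in updateConsecutiveMacroArgTokens.
  #define MIN(a, b) ((a) < (b) ? (a) : (b))
  #define MAX(a, b) ((a) > (b) ? (a) : (b))
  #define CLAMP(x, lo, hi) MIN(MAX((x), (lo)), (hi))

  int clamp3(int x) { return CLAMP(x, 0, 3); }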

> Short version is:
>
> - we can *simplify* the code a lot, we think it's now just partitioning based 
> on FileID and this can be done more clearly. This may have some speedups at 
> the margin.
> - in terms of performance: I suspect when clang is built by GCC it's doing 
> roughly 2x as much work as when it's built by clang. @nickdesaulniers can you 
> tell me which you're measuring/deploying? To give some idea if we're likely 
> to actually help much...

In all of my personal measurements, I've been bootstrapping clang with AOSP's 
clang, and `updateConsecutiveMacroArgTokens` is still the worst method in 
profiles.

But I wouldn't be surprised if other Linux distros like RHEL bootstrap their 
clang distribution via GCC.  @tstellar or @serge-sans-paille or @nikic might 
know.  We did recently get a curious comment from a kernel developer claiming 
that clang was "twice as slow as GCC", which didn't make much sense; I'm not 
sure whether that was an exaggeration or a precise measurement, but I wouldn't 
be surprised if the evaluation-order difference you identified plays into it, 
making the worst method in the profile even slower.  I'll try to find a link 
to the thread...

My summer intern @justinstitt briefly looked into this case again.  We found 
that minor improvements to `SourceManager::isWrittenInSameFile` to avoid a few 
more calls to `getFileID` were irrelevant in terms of performance.  But we 
also didn't consider GCC-built clang; we were using clang to bootstrap clang.

---

Meanwhile, besides evaluating the high-level logic in `TokenLexer` and how it 
might be improved, I think there's potentially an opportunity for an 
"AoS vs. SoA" speedup in `SourceManager`.  
`SourceManager::LoadedSLocEntryTable` is a 
`llvm::SmallVector<SrcMgr::SLocEntry>`, but `SourceManager::getFileIDLoaded` 
only really cares about each `SLocEntry`'s `Offset`.  I suspect we could get 
major memory-locality wins by packing those offsets into a standalone vector 
so that we can search them faster.
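
Below is a minimal standalone sketch of the idea.  The names (`Entry`, 
`OffsetSoA`, `findIndex`) and the ascending-offset ordering are simplified 
stand-ins of my own, not the real `SrcMgr::SLocEntry` layout or the actual 
search in `SourceManager::getFileIDLoaded`:

  // Sketch only: simplified stand-ins, not the real clang types.
  #include <algorithm>
  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  struct Entry {
    uint64_t Offset;   // the only field the binary search actually reads
    char Payload[40];  // stand-in for the rest of SLocEntry; with the AoS
  };                   // layout this is what drags extra cache lines in

  struct OffsetSoA {
    std::vector<Entry> Entries;     // full entries, as the table is today
    std::vector<uint64_t> Offsets;  // packed copy of Entries[i].Offset,
                                    // kept in sync with Entries

    // Index of the last entry whose Offset is <= SLocOffset, searching the
    // dense offset vector so each probe touches only 8 bytes.
    std::size_t findIndex(uint64_t SLocOffset) const {
      assert(!Offsets.empty() && SLocOffset >= Offsets.front());
      auto It = std::upper_bound(Offsets.begin(), Offsets.end(), SLocOffset);
      return static_cast<std::size_t>(It - Offsets.begin()) - 1;
    }
  };

Whether the extra vector pays for its memory and the bookkeeping to keep it in 
sync is exactly the kind of thing a profile over a full kernel build should 
tell us.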

> @hokein or I will try to find time to take a stab at this.

Awesome, please keep me in the loop.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D20401/new/

https://reviews.llvm.org/D20401

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
