nickdesaulniers added subscribers: justinstitt, nikic, tstellar, serge-sans-paille.
nickdesaulniers added a comment.
In D20401#3823476 <https://reviews.llvm.org/D20401#3823476>, @sammccall wrote:

> In D20401#2770059 <https://reviews.llvm.org/D20401#2770059>, @nickdesaulniers wrote:
>
>> I know this was sped up slightly in 3339c568c43e4644f02289e5edfc78c860f19c9f, but this change makes `updateConsecutiveMacroArgTokens` the hottest function in clang in a bottom-up profile of an entire build of the Linux kernel. It thrashes the one-entry LastFileIDLookup cache, and we end up looking up the same FileID again and again and again for each token when we expand nested function-like macros.
>>
>> Is there anything we can do to speed this up?
>
> @hokein and I spent some time looking at this (initially trying to understand behavior, now performance).

Wonderful! I'm glad to see you also identified this change in particular. Thanks as well for looking into this code.

> Short version is:
>
> - we can *simplify* the code a lot, we think it's now just partitioning based on FileID and this can be done more clearly. This may have some speedups at the margin.
> - in terms of performance: I suspect when clang is built by GCC it's doing roughly 2x as much work as when it's built by clang. @nickdesaulniers can you tell me which you're measuring/deploying? To give some idea if we're likely to actually help much...

In all of my personal measurements, I've been bootstrapping clang with AOSP's clang, and `updateConsecutiveMacroArgTokens` is still the worst method in profiles. But I wouldn't be surprised if other Linux distros like RHEL bootstrap their clang distribution via GCC; @tstellar, @serge-sans-paille, or @nikic might know.

We did get a curious comment from a kernel developer recently claiming that clang was "twice as slow as GCC", which didn't make much sense; I'm not sure whether that was an exaggeration or a precise measurement, but I wouldn't be surprised if the evaluation-order issue you identified plays into this, making the worst method even slower. I'll try to find a link to the thread...

My summer intern @justinstitt looked into this case again briefly. We found that minor improvements to `SourceManager::isWrittenInSameFile`, avoiding a few more calls to `getFileID`, were irrelevant in terms of performance. But we also didn't consider GCC-built clang; we were using clang to bootstrap clang.

---

Meanwhile, besides evaluating the high-level logic in `TokenLexer` and how it might be improved, I think there's potentially an opportunity for an "AoS vs. SoA" speedup in `SourceManager`. `SourceManager::LoadedSLocEntryTable` is a `llvm::SmallVector<SrcMgr::SLocEntry>`, but `SourceManager::getFileIDLoaded` only really cares about each `SLocEntry`'s `Offset`. I suspect we could get major memory-locality wins by packing those offsets into a standalone vector so that we could search them faster (a rough sketch is appended at the end of this message).

> @hokein or I will try to find time to take a stab at this.

Awesome, please keep me in the loop.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D20401/new/

https://reviews.llvm.org/D20401

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
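Sketch referenced above, to make the "standalone offsets vector" idea concrete. The names (`LoadedSLocTable`, `Offsets`, `Payload`) are hypothetical and not the actual SourceManager layout, and it glosses over details such as loaded entries being allocated top-down in the real implementation; it's only meant to illustrate the locality argument:

```cpp
// Hedged sketch, not clang code: keep a dense, parallel vector of offsets next
// to the full entry table so the hot binary search touches only 4-byte keys
// instead of striding across whole SLocEntry-sized objects.
#include <algorithm>
#include <cstdint>
#include <vector>

struct SLocEntry {     // stand-in for SrcMgr::SLocEntry
  uint32_t Offset;
  char Payload[24];    // expansion/file info etc.; dead weight for the search
};

class LoadedSLocTable {
  std::vector<SLocEntry> Entries;  // "array of structs" (today's layout)
  std::vector<uint32_t> Offsets;   // SoA shadow: Offsets[i] == Entries[i].Offset

public:
  void push(const SLocEntry &E) {
    Offsets.push_back(E.Offset);
    Entries.push_back(E);
  }

  // Find the entry whose [Offset, next Offset) range contains SLocOffset.
  // Assumes offsets are sorted ascending (unlike real loaded entries).
  const SLocEntry *lookup(uint32_t SLocOffset) const {
    auto It = std::upper_bound(Offsets.begin(), Offsets.end(), SLocOffset);
    if (It == Offsets.begin())
      return nullptr;              // before the first entry
    return &Entries[(It - Offsets.begin()) - 1];
  }
};
```

The point is just that the binary search's working set becomes a contiguous array of 4-byte keys rather than strided loads across much larger `SLocEntry` objects, which should be friendlier to the cache when `getFileID` is called as often as it is here.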