[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-26 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. > I don't recall any issues on my last benchmark, but i'll run the patch across > all of our modular files and see if anything comes up. I ran the patch against ~1500 sources here and there were no unexpected warnings or failures. Repository: rG LLVM Github Monorepo C

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-26 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. In D109632#3088479 , @vsapsai wrote: > Code-wise I'm not aware of any remaining issues. Still need to update the > commit message and to re-run the clang test suite. But you can totally use > the patch for testing. I plan to updat

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-26 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. In D109632#3086044 , @rmaz wrote: > @vsapsai i'll abandon this diff then, thanks for your extensive feedback on > the approach. Is D110123 shippable > already, or are there some more corner cas

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-25 Thread Richard Howell via Phabricator via cfe-commits
rmaz abandoned this revision. rmaz added a comment. @vsapsai i'll abandon this diff then, thanks for your extensive feedback on the approach. Is D110123 shippable already, or are there some more corner cases to cover? Repository: rG LLVM Github Monorepo CH

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-25 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. In D109632#3085187 , @dexonsmith wrote: > But another benefit of not double-storing transitively imported methods is > that it makes the PCMs more independent, tacking slightly closer to > "ImmediateDep1.pcm" being reproducible ev

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-25 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added a comment. In D109632#3081580 , @vsapsai wrote: > We are targeting the use case where in impl.m we have > > @import ImmediateDep1; > @import ImmediateDep2; > ... > @import ImmediateDepN; > > and each of ImmediateDep has `@import S

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-24 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. Raw data with some additional visualization is available at https://observablehq.com/@vsapsai/method_pool-performance-comparison Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D109632/new/ https://reviews.llvm.org/D109632 _

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-22 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. We are targeting the use case where in impl.m we have @import ImmediateDep1; @import ImmediateDep2; ... @import ImmediateDepN; and each of ImmediateDep has `@import SharedDep;`. For simplicity we'll consider that we are working with a single selector as each of

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-21 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added a comment. In D109632#3079745 , @vsapsai wrote: > In D109632#3079455 , @rmaz wrote: > >> So given these numbers are we good to go ahead with set dedupe approach? > > I'd rather get an opinion on t

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-21 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. In D109632#3079455 , @rmaz wrote: > So given these numbers are we good to go ahead with set dedupe approach? I'd rather get an opinion on this from other reviewers. It's not purely a numbers game, there can be other reasons to p

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-21 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. I am seeing the clean build times behaving the same as the populated ones, just slower: | *File*| *Baseline (s)* | *Set Dedupe* | *No External* | | IGMFVC.mm | 230| 194 | 195 | | So given these numbers are we good to go ahead with set ded

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-21 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. Curiously, the results for the second project are unexpected and bizarre. | **File**| **Baseline (s)** | **Set Dedupe (s)** | **No external (s)** | **Set Dedupe (percentage of baseline)** | **No external (percentage of baseline)** | | Project2. File1 | 0.3785821

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-20 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. In D109632#3076648 , @rmaz wrote: > In D109632#3076586 , @vsapsai wrote: > >> I'm curious to get the results for an empty module cache because clean >> builds are also important for us. >

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-20 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. And for the empty module cache the numbers are | **File**| **Baseline (s)** | **Set Dedupe (s)** | **No external (s)** | **Set Dedupe (percentage of baseline)** | **No external (percentage of baseline)** | | Project1. File1 | 14.080673| 13.6296118

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-20 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. In D109632#3076586 , @vsapsai wrote: > Methodology: clear a modules cache, compile a file once to pre-populate the > cache, compile file 8 times and measure elapsed time, take the time average. This is the same approach I used, alt

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-20 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. Sorry for the delay, I haven't finished testing more projects yet but here are my preliminary numbers | **File**| **Baseline (s)** | **Set Dedupe (s)** | **No external (s)** | **Set Dedupe (percentage of baseline)** | **No external (percentage of baseline)** |

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-14 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. In D109632#3062608 , @vsapsai wrote: > I can be wrong but I like writing less data as it can result in savings for > more projects. For example, if compile time is dominated by method > deserialization and not by `Sema::addMethodTo

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-13 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. It's good to know `ASTReader::ReadMethodPool` is pretty close for both approaches. And as far as I understand, it includes the code ReadMethodPoolVisitor Visitor(*this, Sel, PriorGeneration); ModuleMgr.visit(Visitor); so the module visiting doesn't seem to be too ex

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-08 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. In D109632#3046976 , @vsapsai wrote: > These are interesting results. I'm curious to measure the time spent in > `ASTReader::ReadMethodPool`. Updated numbers from today, looks like we added a lot more modular deps since last week.

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-06 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. These are interesting results. I'm curious to measure the time spent in `ASTReader::ReadMethodPool`. I'm planning to preserve the order of methods roughly by `s/InstanceMethods.append/InstanceMethods.prepend/` in `ReadMethodPoolVisitor`. `isAcceptableMethodMismatch` is

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-04 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. In D109632#3037501 , @vsapsai wrote: > My assumption was that all dependent modules are in memory at this point. And > we visit transitive modules only once, so I don't expect it to be a big > performance hit (though I can be wrong

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-10-01 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. In D109632#3033489 , @rmaz wrote: > In D109632#3032381 , @vsapsai wrote: > >> What's interesting, I was able to trigger more diagnostic. Specifically, I >> got warnings about `length` amb

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-30 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. In D109632#3032381 , @vsapsai wrote: > What's interesting, I was able to trigger more diagnostic. Specifically, I > got warnings about `length` ambiguity because in NSStatusItem it is CGFloat >

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-29 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. I've updated D110123 to be the way I was planning it to be and I was testing locally with it. So far, with this change the speedup over a baseline is ~20%, though measurements aren't super rigorous. I.e., no multiple runs to warm up th

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-28 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. Avoiding serializing external methods in `ASTMethodPoolTrait` is working successfully, although the times are not as good as using set deduplication: | **file** | **baseline** | **set dedupe** | **no external** | | Total | 149.68 | 92.97 | 109.83

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-28 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. In D109632#3022209 , @vsapsai wrote: > That patch looks correct, I was experimenting with exactly the same local > change. Have you tried D110123 on your > original build? In my testing with synth

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-24 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. In D109632#3021644 , @rmaz wrote: > @vsapsai just checking on where we are with this one. Is the current plan to > investigate an approach only serializing the methods declared in each module, > or are we good to go ahead with t

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-24 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. In D109632#3013620 , @dexonsmith wrote: > In D109632#3013523 , @vsapsai wrote: > >> 2. Serialize only methods owned by the current module (and change >> `ReadMethodPoolVisitor` appropria

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-24 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. @vsapsai just checking on where we are with this one. Is the current plan to investigate an approach only serializing the methods declared in each module, or are we good to go ahead with the set de-duplication approach? I tried profiling with D110123

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-21 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added a comment. In D109632#3013523 , @vsapsai wrote: > 2. Serialize only methods owned by the current module (and change > `ReadMethodPoolVisitor` appropriately). Would that require visiting all in-memory modules every time there's a global

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-21 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. In D109632#3012647 , @rmaz wrote: >> What folks are thinking about writing less in METHOD_POOL? > > I prefer the idea of it, but I think the `ReadMethodPoolVisitor` also has to > be changed for this to work. When it finds a selec

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-21 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. > What folks are thinking about writing less in METHOD_POOL? I prefer the idea of it, but I think the `ReadMethodPoolVisitor` also has to be changed for this to work. When it finds a selector in a module it will return true, which causes the search to stop descending into

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-20 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. Looks like in `METHOD_POOL` we don't need the transitive closure for performance reasons. And we are kinda trying to store only data the module owns > // Only write this selector if it's not in an existing AST or something > // changed. But even if a single method i

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-20 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added inline comments. Comment at: clang/include/clang/Sema/Sema.h:1434-1436 +bool addMethod(ObjCMethodDecl *Method) { + return AddedMethods.insert(Method).second; +} rmaz wrote: > rmaz wrote: > > dexonsmith wrote: > > > Hmm, I was imagini

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-20 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added a comment. In D109632#3007615 , @vsapsai wrote: > I don't remember for sure but I don't think there is a consistent policy > about a module storing transitive data or only data it owns. I suspect we > might be using both approaches and

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-17 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. In D109632#3006520 , @rmaz wrote: > The case we have is more like: > > .m -> A -> long list of partially shared deps -> Foundation >-> B -> long list of partially shared deps -> Foundation >-> C -> long list of

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-17 Thread Richard Howell via Phabricator via cfe-commits
rmaz added inline comments. Comment at: clang/include/clang/Sema/Sema.h:1434-1436 +bool addMethod(ObjCMethodDecl *Method) { + return AddedMethods.insert(Method).second; +} rmaz wrote: > dexonsmith wrote: > > Hmm, I was imagining that the set would be

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-17 Thread Richard Howell via Phabricator via cfe-commits
rmaz added inline comments. Comment at: clang/include/clang/Sema/Sema.h:1434-1436 +bool addMethod(ObjCMethodDecl *Method) { + return AddedMethods.insert(Method).second; +} dexonsmith wrote: > Hmm, I was imagining that the set would be more encapsulat

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-17 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. In D109632#3005512 , @vsapsai wrote: > Thanks for the explanation! I'm still curious to reproduce the problem > locally and have created a test case generator > https://gist.github.com/vsapsai/f9d3603dde95eebd23248da4d7b4f5ec It cr

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-16 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. In D109632#3000709 , @rmaz wrote: > The speedup is coming from reducing the number of times > `Sema::addMethodToGlobalList` is called when looking up methods loaded from > modules. From what I can see each serialized module will

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-16 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added inline comments. Comment at: clang/include/clang/Sema/Sema.h:1434-1436 +bool addMethod(ObjCMethodDecl *Method) { + return AddedMethods.insert(Method).second; +} Hmm, I was imagining that the set would be more encapsulated than this,

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-16 Thread Richard Howell via Phabricator via cfe-commits
rmaz updated this revision to Diff 373072. rmaz added a comment. Update with single DenseSet per GlobalMethodPool Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D109632/new/ https://reviews.llvm.org/D109632 Files: clang/include/clang/Sema/Sema.h

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-16 Thread Richard Howell via Phabricator via cfe-commits
rmaz added inline comments. Comment at: clang/include/clang/Sema/Sema.h:1423-1426 +ObjCMethodList InstanceMethods; +ObjCMethodList FactoryMethods; +llvm::DenseSet AddedInstanceMethods; +llvm::DenseSet AddedFactoryMethods; dexonsmith wrote: > Do th

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-15 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added inline comments. Comment at: clang/include/clang/Sema/Sema.h:1423-1426 +ObjCMethodList InstanceMethods; +ObjCMethodList FactoryMethods; +llvm::DenseSet AddedInstanceMethods; +llvm::DenseSet AddedFactoryMethods; Do these two sets r

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-14 Thread Richard Howell via Phabricator via cfe-commits
rmaz updated this revision to Diff 372598. rmaz added a comment. update to fix lint warnings Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D109632/new/ https://reviews.llvm.org/D109632 Files: clang/include/clang/Sema/Sema.h clang/lib/Sema/SemaC

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-14 Thread Richard Howell via Phabricator via cfe-commits
rmaz updated this revision to Diff 372597. rmaz added a comment. Updated to use a GlobalMethods struct which contains the global method lists as well as the sets used to de-duplicate added methods. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D10963

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-14 Thread Richard Howell via Phabricator via cfe-commits
rmaz added a comment. In D109632#3000469 , @vsapsai wrote: > I'm a little bit confused here. Where is the speed-up coming from? Is it > because ObjCMethodList for `init` not being serialized? I'm asking because > right now I don't see how `seen` helps w

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-14 Thread Volodymyr Sapsai via Phabricator via cfe-commits
vsapsai added a comment. I'm a little bit confused here. Where is the speed-up coming from? Is it because ObjCMethodList for `init` not being serialized? I'm asking because right now I don't see how `seen` helps with that. My current understanding is that `Sema::addMethodToGlobalList` is too ex

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-14 Thread Richard Howell via Phabricator via cfe-commits
rmaz added inline comments. Comment at: clang/lib/Serialization/ASTReader.cpp:8188-8190 + for (auto *L = &List; L; L = L->getNext()) { +seen.insert(L->getMethod()); + } dexonsmith wrote: > I find quadratic algorithms a bit scary, even when current benchmark

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-14 Thread Duncan P. N. Exon Smith via Phabricator via cfe-commits
dexonsmith added a reviewer: vsapsai. dexonsmith added a subscriber: vsapsai. dexonsmith added a comment. Thanks for looking at this! I have a couple of comments inline. @vsapsai, can you also take a look? Comment at: clang/lib/Serialization/ASTReader.cpp:8188-8190 + for (aut

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-14 Thread Manman Ren via Phabricator via cfe-commits
manmanren added a comment. @dexonsmith @bruno: are you okay with this change? It looks good to me :] Thanks, Manman Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D109632/new/ https://reviews.llvm.org/D109632 ___

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-10 Thread Richard Howell via Phabricator via cfe-commits
rmaz added inline comments. Comment at: clang/lib/Serialization/ASTReader.cpp:8194 +if (seen.insert(M).second) { + S.addMethodToGlobalList(&List, M); +} manmanren wrote: > Does it make sense to check for duplication inside addMethodToGlobalList, as

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-10 Thread Manman Ren via Phabricator via cfe-commits
manmanren added a comment. This looks good to me in general. Since it should not change functionality, it may not be possible to write a test case. Manman Comment at: clang/lib/Serialization/ASTReader.cpp:8194 +if (seen.insert(M).second) { + S.addMethodToGlobalList(&

[PATCH] D109632: [clang] de-duplicate methods from AST files

2021-09-10 Thread Richard Howell via Phabricator via cfe-commits
rmaz created this revision. rmaz requested review of this revision. Herald added a project: clang. Herald added a subscriber: cfe-commits. This diff will de-duplicate methods read from AST files before inserting them in to a global method pool list. When reading ObjCMethodDecl from AST files we ca