sammccall added inline comments.
================ Comment at: clangd/Symbol.h:37 +// The class presents a C++ symbol, e.g. class, function. +struct Symbol { + // The symbol identifier, using USR. ---------------- malaperle wrote: > sammccall wrote: > > malaperle wrote: > > > malaperle wrote: > > > > sammccall wrote: > > > > > hokein wrote: > > > > > > malaperle wrote: > > > > > > > sammccall wrote: > > > > > > > > hokein wrote: > > > > > > > > > malaperle wrote: > > > > > > > > > > I think it would be nice to have methods as an interface to > > > > > > > > > > get this data instead of storing them directly. So that an > > > > > > > > > > index-on-disk could go fetch the data. Especially the > > > > > > > > > > occurrences which can take a lot of memory (I'm working on > > > > > > > > > > a branch that does that). But perhaps defining that > > > > > > > > > > interface is not within the scope of this patch and could > > > > > > > > > > be better discussed in D40548 ? > > > > > > > > > I agree. We can't load all the symbol occurrences into the > > > > > > > > > memory since they are too large. We need to design interface > > > > > > > > > for the symbol occurrences. > > > > > > > > > > > > > > > > > > We could discuss the interface here, but CodeCompletion is > > > > > > > > > the main thing which this patch focuses on. > > > > > > > > > We can't load all the symbol occurrences into the memory > > > > > > > > > since they are too large > > > > > > > > > > > > > > > > I've heard this often, but never backed up by data :-) > > > > > > > > > > > > > > > > Naively an array of references for a symbol could be doc ID + > > > > > > > > offset + length, let's say 16 bytes. > > > > > > > > > > > > > > > > If a source file consisted entirely of references to > > > > > > > > 1-character symbols separated by punctuation (1 reference per 2 > > > > > > > > bytes) then the total size of these references would be 8x the > > > > > > > > size of the source file - in practice much less. That's not > > > > > > > > very big. > > > > > > > > > > > > > > > > (Maybe there are edge cases with macros/templates, but we can > > > > > > > > keep them under control) > > > > > > > I'd have to break down how much memory it used by what, I'll come > > > > > > > back to you on that. Indexing llvm with ClangdIndexDataStorage, > > > > > > > which is pretty packed is about 200MB. That's already a lot > > > > > > > considering we want to index code bases many times bigger. But > > > > > > > I'll try to come up with more precise numbers. I'm open to > > > > > > > different strategies. > > > > > > > > > > > > > I can see two points here: > > > > > > > > > > > > 1. For all symbol occurrences of a TU, it is not quite large, and > > > > > > we can keep them in memory. > > > > > > 2. For all symbol occurrences of a whole project, it's not a good > > > > > > idea to load all of them into memory (For LLVM project, the size of > > > > > > YAML dataset is ~1.2G). > > > > > (This is still a sidebar - not asking for any changes) > > > > > > > > > > The YAML dataset is not a good proxy for how big the data is (at > > > > > least without an effort to estimate constant factor). > > > > > And "it's not a good idea" isn't an assertion that can hold without > > > > > reasons, assumptions, and data. > > > > > If the size turns out to be, say, 120MB for LLVM, and we want to > > > > > scale to 10x that, and we're spending 500MB/file for ASTs, then it > > > > > might well be a good trade. > > > > > The YAML dataset is not a good proxy for how big the data is (at > > > > > least without an effort to estimate constant factor). > > > > > > > > Indeed. I'll try to come up with more realistic numbers. There are > > > > other things not accounted for in the 16 bytes mentioned above, like > > > > storing roles and relations. > > > > > > > > > 500MB/file for ASTs > > > > > > > > What do you mean? 500MB worth of occurrences per file? Or Preambles > > > > perhaps? > > > > What do you mean? 500MB worth of occurrences per file? Or Preambles > > > > perhaps? > > > > > > Oh I see, the AST must be in memory for fast reparse. I just tried > > > opening 3 files at the same time I it was already around 500MB. Hmm, > > > that's a bit alarming. > > Right, just that we have to consider RAM usage for the index in the context > > of clangd's overall requirements - if other parts of clangd use 1G of ram > > for typical work on a large project, then we shouldn't rule out spending a > > couple of 100MB on the index if it adds a lot of value. > Agreed we have to consider the overall requirements. I think over 1GB of RAM > is not good for our use case, whether is comes from the AST or index. I think > it's perfectly normal if we have different requirements but we can see > discuss how to design things so there are options to use either more RAM or > disk space. It seems the AST would be the most important factor for now so > perhaps it's something we should start investigating/discussing. Agree, this is another thing we can discuss tonight. ================ Comment at: clangd/index/Index.h:52 +private: + friend class llvm::DenseMapInfo<clang::clangd::SymbolID>; + ---------------- ioeric wrote: > warning: class 'DenseMapInfo' was previously declared as a struct > [-Wmismatched-tags] Fixed in rCTE320554 Repository: rCTE Clang Tools Extra https://reviews.llvm.org/D40897 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits