ioeric added inline comments.
================ Comment at: clang-tools-extra/clangd/index/dex/Dex.cpp:251 Bytes += InvertedIndex.getMemorySize(); - for (const auto &P : InvertedIndex) - Bytes += P.second.bytes(); + for (const auto &TokenToPostingList : InvertedIndex) + Bytes += TokenToPostingList.first.Data.size() + ---------------- kbobyrev wrote: > ioeric wrote: > > kbobyrev wrote: > > > ioeric wrote: > > > > Would `InvertedIndex.getMemorySize()` be a better estimate? > > > IIUC this is only precise for the objects which do not make any > > > allocations (e.g. `std::vector` and `std::string` memory estimate would > > > not be "correct"). > > > > > > From the documentation > > > > > > > This is just the raw memory used by DenseMap. If entries are pointers > > > > to objects, the size of the referenced objects are not included. > > > > > > I also have `Bytes += InvertedIndex.getMemorySize();` above, so the > > > purpose of this code would be to take into account the size of these > > > "reference objects". > > > > > > However, since `PostingList::bytes()` also returns underlying > > > `std::vector` size, this code will probably add these `std::vector` > > > objects size twice (the first one was in > > > `InvertedIndex.getMemorySize()`). I should fix that. > > > `If entries are pointers to objects, the size of the referenced objects > > > are not included.` > > This applies to *pointers* to objects, but the `PostingList`s in > > InvertedList are not pointerd? So it seems to me that > > `InvertedIndex.getMemorySize()` covers everything in the `InvertedIndex`. > > One way to verify is probably check the size calculated with loop against > > `InvertedIndex.getMemorySize()`. > As discussed offline, pointers and references are an example of objects which > would be incorrectly measured by `DesneMap::getMemorySize()`, but > `std::vector` and `std::string` are pointers in a way because they also just > store a pointer to the underlying storage which is hidden from > `DenseMap::getMemorySize()`; thus, it would be more precise to use the > proposed scheme for memory estimation. I see. I was expecting DenseMap to maintain an arena which could provide memory usage of all owned data, but it seems to only consider `sizeof(KV-Pair)`... If the posting lists (vectors) are not covered, shouldn't we also also add the size of tokens (string) here? https://reviews.llvm.org/D52503 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits