ioeric added inline comments.

================
Comment at: clang-tools-extra/clangd/index/dex/Dex.cpp:251
   Bytes += InvertedIndex.getMemorySize();
-  for (const auto &P : InvertedIndex)
-    Bytes += P.second.bytes();
+  for (const auto &TokenToPostingList : InvertedIndex)
+    Bytes += TokenToPostingList.first.Data.size() +
----------------
kbobyrev wrote:
> ioeric wrote:
> > kbobyrev wrote:
> > > ioeric wrote:
> > > > Would `InvertedIndex.getMemorySize()` be a better estimate?
> > > IIUC this is only precise for the objects which do not make any 
> > > allocations (e.g. `std::vector` and `std::string` memory estimate would 
> > > not be "correct").
> > > 
> > > From the documentation
> > > 
> > > > This is just the raw memory used by DenseMap. If entries are pointers 
> > > > to objects, the size of the referenced objects are not included.
> > > 
> > > I also have `Bytes += InvertedIndex.getMemorySize();` above, so the 
> > > purpose of this code would be to take into account the size of these 
> > > "reference objects".
> > > 
> > > However, since `PostingList::bytes()` also returns underlying 
> > > `std::vector` size, this code will probably add these `std::vector` 
> > > objects size twice (the first one was in 
> > > `InvertedIndex.getMemorySize()`). I should fix that.
> > > `If entries are pointers to objects, the size of the referenced objects 
> > > are not included.`
> > This applies to *pointers* to objects, but the `PostingList`s in 
> > InvertedList are not pointerd? So it seems to me that 
> > `InvertedIndex.getMemorySize()` covers everything in the `InvertedIndex`. 
> > One way to verify is probably check the size calculated with loop against 
> > `InvertedIndex.getMemorySize()`.
> As discussed offline, pointers and references are an example of objects which 
> would be incorrectly measured by `DesneMap::getMemorySize()`, but 
> `std::vector` and `std::string` are pointers in a way because they also just 
> store a pointer to the underlying storage which is hidden from 
> `DenseMap::getMemorySize()`; thus, it would be more precise to use the 
> proposed scheme for memory estimation.
I see. I was expecting DenseMap to maintain an arena which could provide memory 
usage of all owned data, but it seems to only consider `sizeof(KV-Pair)`...

If the posting lists (vectors) are not covered, shouldn't we also also add the 
size of tokens (string) here?


https://reviews.llvm.org/D52503



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to