higher-performance wrote:

Ah sorry I forgot to reply to your review.

> We might take a look at `SetVector` and see if there are any learnings we can 
> take from that. It has a set and a vector together, but might have some 
> interesting things.

I did take a look at this actually, but it looked equivalent to what I was 
doing before, particularly with respect to storing the expensive keys 
(`DynTypedNode`) twice. It just uses `SmallVector<T, ...>` and `DenseSet<T>` 
underneath, whereas I just happened to use `SmallDenseSet<T, ...>` instead of 
`DenseSet<T>`.

> Finally... I wonder if we were to just store this as a sorted-vector of 
> `DynTypedNode` + `insert order`, then do a copy & sort on access of `view`.

I don't think that works. You don't want to do a ton of work on every `view()` 
-- it's assumed to be cheap, and called often. Also, maintaining sort order 
gets more expensive the more nodes you add; you don't want to pay that cost on 
every insertion.

Furthermore, elements without memoization data aren't comparable, but they 
still need to appear in the `view` *in the same order as they were inserted*. 
This means either they need to be in the same vector as the comparable elements 
(which makes it impossible to sort the comparable portion), or we need to 
create a facade over two completely disjoint containers.

The implementation in this PR addresses all of those constraints without any 
drawbacks, I think. There's less code compared to writing a facade, and 
performance-wise it feels just about optimal.

> Can we do some benchmarking to figure out which things are done the 'most 
> often' with what values of `N`? it might better guide this.

Re: benchmarking -- note that I did a general benchmark both with the test case 
that was provided in the associated issue, as well as with an expensive TU that 
we had internally. It had great performance in both cases. Regarding N -- I 
assume N is referring to the total container sizes (per the other comment)? The 
thing is, I don't get the impression there's much room for improvement here. 
The memory usage was just high by a constant (10x) factor. Some ~5x of that 
almost certainly came from the data type itself: storing `DynTypedNode` in the 
cache, which is 40 bytes, instead of `DynTypedNode*`, which is 8 bytes. Adding 
laziness on top of that has shrunk the difference so much that the performance 
plots are already very close to coinciding.

https://github.com/llvm/llvm-project/pull/129934
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits