[PATCH] D126723: [pseudo] GC GSS nodes, reuse them with a freelist

Haojian Wu via Phabricator via cfe-commits Wed, 01 Jun 2022 03:21:18 -0700

hokein added inline comments.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:87
     NewHeads.clear();
+    MaybeGC();
     glrReduce(PendingReduce, Params,
----------------
sammccall wrote:
> hokein wrote:
> > I would just call it before the `NewHeads.clear()`, and run the `gc` on the 
> > `NewHeads` directly.
> > 
> > It simplifies the implementation (no need to traverse three pending 
> > actions, and no duplicated elements in Roots). The downside is that some 
> > dead heads (heads with no available actions on `Terminal.symbol()`) will 
> > not be GCed in this run, but I think it is negligible, as those will be 
> > GCed in next run.
> Oops, I forgot that Pending* are all empty at the beginning of this loop. I 
> should probably add an assertion for that.
> 
> We can call it right at the top of the loop, yes? It doesn't need to be 
> between AddSsteps and NewHeads.clear().
> 
> > run the gc on the NewHeads directly
> Unfortunately not: the GC naturally consumes the argument (it becomes the 
> stack for the DFS) so we need a copy.
> We could move the copy into gc() but:
>  - this way we can reuse the same vector every time and don't have to allocate
>  - soon we'll have error-recovery candidates as additional roots we want to 
> pass in
> Oops, I forgot that Pending* are all empty at the beginning of this loop. I 
> should probably add an assertion for that.

Yes. I would probably do it after glrReduce and glrShift calls.

> We can call it right at the top of the loop, yes? It doesn't need to be 
> between AddSsteps and NewHeads.clear().

Yes, exactly.  We could also do that at the end of the loop.

> > run the gc on the NewHeads directly
> Unfortunately not: the GC naturally consumes the argument (it becomes the 
> stack for the DFS) so we need a copy.
> We could move the copy into gc() but:
>  - this way we can reuse the same vector every time and don't have to allocate
>  - soon we'll have error-recovery candidates as additional roots we want to 
> pass in

oh, I didn't notice that. I think making a copy is fine.



================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:391
+                              llvm::ArrayRef<const Node *> Parents) {
+  ++NodeCount;
+  Node *Result = new (allocate(Parents.size()))
----------------
sammccall wrote:
> hokein wrote:
> > The NodeCount becomes vague now, we update in both addNode and allocate 
> > methods, I think we should have two separate counts.
> - Oops, that duplicated ++NodeCount is just a mistake.
> - names are vague now, both "NodeCount" and "Allocated" are dubious. Maybe 
> I'll rename to `NodesCreated` and `Alive`, WDYT?
> - are there actually more metrics we want to record? we have nodes-created, 
> currently-live-nodes (Allocated.size()), nodes-destroyed (`NodeCount - 
> Allocated.size()`). We could track max-live-node but I think Arena.bytes() is 
> fine for our purposes.
> - do we want to expose any of those numbers through APIs? I couldn't work out 
> what I'd actually do with them.
> names are vague now, both "NodeCount" and "Allocated" are dubious. Maybe I'll 
> rename to NodesCreated and Alive, WDYT?

Sounds good to me.

> are there actually more metrics we want to record? we have nodes-created, 
> currently-live-nodes (Allocated.size()), nodes-destroyed (NodeCount - 
> Allocated.size()). We could track max-live-node but I think Arena.bytes() is 
> fine for our purposes.

I think these are sufficient, should provide enough details about 
GSS-memory-related information. The other things I will like to record is the 
max-active-heads, but that's a different bit.

> do we want to expose any of those numbers through APIs? I couldn't work out 
> what I'd actually do with them.

I think we should avoid to expose an API per field, we probably just need a 
single API which returns a Stats struct containing all these things. We can do 
it later.




================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:403
+  ++NodeCount;
+  if (FreeList.size() <= Parents)
+    FreeList.resize(Parents + 1);
----------------
sammccall wrote:
> hokein wrote:
> > What's the practical size of the `FreeList`? I think we can just find one, 
> > and use it as initial default size, to save some cost of resize. 
> It's whatever the max #parents is - maybe 10 or something?
> I don't think it's worth adding a hard-coded guess to save ~1 resize of a 
> single vector per parse!
I would image there will be multiple resize calls, if the path like 
`allocate(1), allocate(3), ..., allocate(max)`, which I think it is practical? 
unless the first call is `allocate(max)` which is unlikely I think.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126723/new/

https://reviews.llvm.org/D126723

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D126723: [pseudo] GC GSS nodes, reuse them with a freelist

Reply via email to