ameyc commented on PR #12186:
URL: https://github.com/apache/datafusion/pull/12186#issuecomment-2344426743
Taking a second look at implementing this the using a global hashmap with
pointers to Arc<ExecuptionPlan> and solution is somewhat more hacky and error
prone and as developers trying to build on top of DataFusion makes our
experience feel pretty brittle.
While we are interested in using the node_id for stateful checkpointing, it
does make sense for this to be on an ExecutionPlan since it is a node on
PhysicalPlan graph.
> In some use cases, one wants an ID for every node the plan tree (e.g. for
display/UI purposes). In others, what is actually necessary is an ID per stream
(any kind of stateful work).
Streams can easily, derive their id "{node_id}_{stream_id}"
> Having a default None value for IDs is potentially problematic (apparently
we ran into bugs caused by this after initially trying such an approach).
Presumably for any user interested in using the "node_id", they'd make their
operators implement "with_node_id".
> One may want different types for the ID in different use cases (usize may
not be appropriate for all).
For additional types of node_ids, I suppose one could maintain them in a
HashMap<usize, T> where usize is the node_id and not a ptr to Arc<dyn
ExecutionPlan>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]