ozankabak commented on PR #12186:
URL: https://github.com/apache/datafusion/pull/12186#issuecomment-2311761786

   We had similar challenges when using DataFusion in a context similar to 
yours (checkpointing etc.) I have consulted with my team on how we solved them, 
and discussed this general approach. I can share the following 
thoughts/concerns:
   
   - In some use cases, one wants an ID for every node the plan tree. In 
others, what is actually necessary is an ID per stream.
   - Having a default `None` value for IDs is potentially problematic 
(apparently we ran into bugs caused by this after initially trying such an 
approach).
   - One may want different types for the ID in different use cases (`usize` 
may not be appropriate for all)
   
   Therefore, what we think is the right approach is to traverse the tree (for 
plans or streams, depending on your use case), generate the IDs as you see fit, 
and store it in a map container that associates nodes with your IDs. I don't 
think doing this upstream inside `ExecutionProperties` is the right approach. 
This is also somewhat evident when you look at what is inside `PlanProperties`: 
Execution mode, ordering information, partitioning information, and equivalence 
information -- they are all derived properties of that emerge from the plan 
tree (not set externally)
   
   Apologies for not being able to provide feedback and start the detailed 
discussion before you prepared a PR, but in my defense you were too fast :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to