ozankabak commented on PR #12186: URL: https://github.com/apache/datafusion/pull/12186#issuecomment-2311761786
We had similar challenges when using DataFusion in a context similar to yours (checkpointing etc.) I have consulted with my team on how we solved them, and discussed this general approach. I can share the following thoughts/concerns: - In some use cases, one wants an ID for every node the plan tree. In others, what is actually necessary is an ID per stream. - Having a default `None` value for IDs is potentially problematic (apparently we ran into bugs caused by this after initially trying such an approach). - One may want different types for the ID in different use cases (`usize` may not be appropriate for all) Therefore, what we think is the right approach is to traverse the tree (for plans or streams, depending on your use case), generate the IDs as you see fit, and store it in a map container that associates nodes with your IDs. I don't think doing this upstream inside `ExecutionProperties` is the right approach. This is also somewhat evident when you look at what is inside `PlanProperties`: Execution mode, ordering information, partitioning information, and equivalence information -- they are all derived properties of that emerge from the plan tree (not set externally) Apologies for not being able to provide feedback and start the detailed discussion before you prepared a PR, but in my defense you were too fast :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org