waynexia commented on issue #13525: URL: https://github.com/apache/datafusion/issues/13525#issuecomment-2495492546
I can relate (from our experience with GreptimeDB) to the situation and challenges @scsmithr shared: the dependencies, the somewhat painful upgrade procedure, etc. I'd like to share some of my experiences:

To keep the workload manageable, we upgrade DataFusion periodically instead of continuously (like Ubuntu vs. Arch). Hence (1) related dependencies are locked until the next upgrade, and (2) each upgrade has to handle a bunch of accumulated API changes and run a regression test. Since we rarely need a new Arrow API urgently, the first point is acceptable for us. The second, in contrast, is the most painful part of the entire experience 🤣

I tried to summarize why, and it turns out that breaking changes to existing APIs are not the root cause: they are always explicit and can be solved easily, especially since we (DataFusion) mark an API `#[deprecated]` for a while before removing it. However, changes that are non-breaking at the API level are painful: they are implicit, so you won't notice them until something goes wrong. E.g., a new property interface on the plan, plus some optimizer rules that use this property to rewrite the plan.

I haven't found a good solution or suggestion for this problem. I am not even sure it's possible to maintain all those complex connections among so many plans and optimizers. Given that DataFusion is highly extensible and allows users to define their own plans, types, rules, etc., this becomes even harder to handle. We can't write tests for things that don't exist..

For execution, I tried an approach (slides here https://github.com/apache/datafusion/discussions/10341#discussioncomment-10110273) that rewrites the plan to enable execution across multiple nodes, which seems to have a similar interface to the `RemotePhysicalPlanner` from your link. Though it is not yet complete and is still under (inactive 😢) development, it looks viable to me.

> Blog about how to compile DataFusion for WASM??

Willing to draft one.
Including some small lessons learned from making a Rust-WASM object and the API considerations. I wrote a few notes but never had the motivation (lazy, in other words 🙈) to organize them.
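
To make the "implicit, non-breaking change" case above concrete, here is a minimal sketch in plain Rust. It does not use DataFusion: the `Plan` trait, the `preserves_order` property, both scan types, and `plan_with_required_sort` are all hypothetical names invented for illustration. The assumption is that a library release adds a trait method with a default body, and a new optimizer rule starts reading it.

```rust
// Hypothetical stand-in for a plan trait. This is NOT DataFusion's API.
trait Plan {
    fn name(&self) -> &str;

    // Assumed to be added in a later library release. Because it has a
    // default body, every existing implementor keeps compiling unchanged,
    // so the compiler gives downstream users no signal at all.
    fn preserves_order(&self) -> bool {
        false
    }
}

// A user-defined plan written before the property existed. It does keep
// its input ordered, but nothing forces the author to declare that.
struct LegacyOrderedScan;

impl Plan for LegacyOrderedScan {
    fn name(&self) -> &str {
        "LegacyOrderedScan"
    }
}

// The same plan after the downstream author noticed the new property.
struct UpdatedOrderedScan;

impl Plan for UpdatedOrderedScan {
    fn name(&self) -> &str {
        "UpdatedOrderedScan"
    }

    fn preserves_order(&self) -> bool {
        true
    }
}

// Hypothetical optimizer rule shipped together with the new property:
// add a sort step unless the input reports that it is already ordered.
fn plan_with_required_sort(input: &dyn Plan) -> Vec<String> {
    let mut steps = vec![input.name().to_string()];
    if !input.preserves_order() {
        steps.push("Sort".to_string());
    }
    steps
}
```

With these definitions, `plan_with_required_sort(&LegacyOrderedScan)` yields `["LegacyOrderedScan", "Sort"]`, while the updated plan yields only `["UpdatedOrderedScan"]`. Nothing fails to compile after the upgrade; the legacy plan just silently gets a redundant sort. A default in the other direction could silently drop a needed one, which is the kind of regression that only shows up when something goes wrong.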
