shehabgamin commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2617036154
I love the idea of collaborating on Spark-compatible `UDF`s. As of writing, `243/402` Spark function doc-tests pass on Sail. We haven't focused on performance yet; instead, we have concentrated on knocking all of them out, because there are so many of them. Our implementations can be found here:

- https://github.com/lakehq/sail/tree/main/crates/sail-plan/src/function
- https://github.com/lakehq/sail/tree/main/crates/sail-plan/src/extension/function

I will say that we have encountered numerous problems relying on downstream DataFusion-based crates, to the extent that we have removed all of them as dependencies (and included appropriate `[Credit]` comments in various places to acknowledge the original sources). The issue isn't with the crates themselves; it arises when it's time to upgrade DataFusion versions, which requires us to wait for each crate to update and release a new version.

We haven't done as good a job as @andygrove and the Comet folks at documenting what we do and don't support (we're currently a small team of just two people). However, we do have test reports for every pull request to track coverage.

Sail has also put a lot of effort into supporting PySpark’s 10+ APIs for Python `UDF`s/`UDAF`s/`UDWF`s/`UDTF`s. We support all the PySpark `UDF` types except one (the experimental `applyInPandasWithState()` method of `pyspark.sql.GroupedData`):

- https://github.com/lakehq/sail/tree/main/crates/sail-python-udf
- https://docs.lakesail.com/sail/latest/guide/tasks/udf/

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org