shehabgamin commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2617036154

   I love the idea of collaborating on Spark-compatible `UDF`s.
   
   As of this writing, `243/402` (roughly 60%) of the Spark function doc-tests 
pass on Sail. We haven't focused on performance yet; so far we have concentrated 
on getting as many functions working as possible because there are so many of 
them. Our implementations can be found here:
   - https://github.com/lakehq/sail/tree/main/crates/sail-plan/src/function
   - https://github.com/lakehq/sail/tree/main/crates/sail-plan/src/extension/function
   
   I will say that we have encountered numerous problems relying on downstream 
DataFusion-based crates, to the extent that we have removed all of them as 
dependencies (and included appropriate `[Credit]` comments in various places to 
acknowledge the original sources). The issue isn't with the crates themselves 
but arises when it's time to upgrade DataFusion versions, requiring us to wait 
for each crate to update and release a new version.
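
To make the upgrade bottleneck concrete, here is a minimal sketch in Python. The crate names and version numbers are hypothetical placeholders, not Sail's actual dependency list; the point is only that a single downstream crate lagging behind is enough to block the whole upgrade.

```python
# Hypothetical illustration: crate names and version numbers are made up.

def blocking_crates(target_version, supported):
    """Return the downstream crates that have not yet released support
    for the target DataFusion version, sorted by name."""
    return sorted(
        name
        for name, versions in supported.items()
        if target_version not in versions
    )

# Each hypothetical downstream crate maps to the DataFusion major
# versions it has published a compatible release for.
supported = {
    "crate-a": {43, 44},
    "crate-b": {43},
    "crate-c": {43, 44},
}

# Upgrading to version 44 is blocked until every listed crate supports it.
blockers = blocking_crates(44, supported)
print(blockers)
```

With no downstream DataFusion-based dependencies, the `supported` map is empty and an upgrade never has to wait on another crate's release.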
   
   We haven't done as good a job as @andygrove and the Comet folks at 
documenting what we do and don't support (we're currently a small team of just 
two people). However, we do have test reports for every pull request to track 
coverage.
   
   Sail has also put a lot of effort into supporting PySpark’s 10+ APIs for 
Python `UDF`s/`UDAF`s/`UDWF`s/`UDTF`s. We support all the PySpark `UDF` types 
except one (the experimental `applyInPandasWithState()` method of 
`pyspark.sql.GroupedData`):
   - https://github.com/lakehq/sail/tree/main/crates/sail-python-udf
   - https://docs.lakesail.com/sail/latest/guide/tasks/udf/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

