Omega359 commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627152543
I can see the benefit of Spark UDFs in DataFusion, but I think they would be best handled as a datafusion-contrib project, mostly for three reasons:

1. Most DataFusion users would never need them.
2. It's more maintenance, and testing them is non-trivial. I spin up Docker images of Spark for my testing, but that is single node, not clustered. It's generally slow, especially when compared to Rust tests and sqllogictests.
3. While Spark (via Comet, Sail, etc.) is a common use case for DataFusion, it's not the only one where a custom set of UDFs might be useful. I'm not sure we want to say yes to Spark but no to other UDF suites.

I personally think it would be awesome to have all the UDF suites within a common repo where you could use a feature flag to include just the suite you wanted, and then either bulk add its functions to a context or pick and choose.
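
To make the "feature include, then bulk add or pick and choose" idea concrete, here is a rough, stdlib-only sketch of the shape such a crate might take. Everything in it is an assumption for illustration: `Udf` and `Registry` are hypothetical stand-ins (not DataFusion's `ScalarUDF` or `SessionContext`), the `spark` feature name is made up, and the two functions are placeholders rather than real Spark semantics.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a scalar UDF: a name plus a function pointer.
#[derive(Clone)]
pub struct Udf {
    pub name: &'static str,
    pub func: fn(&str) -> String,
}

/// Hypothetical stand-in for the function registry of a session context.
#[derive(Default)]
pub struct Registry {
    udfs: BTreeMap<&'static str, Udf>,
}

impl Registry {
    /// Register one function, replacing any existing one with the same name.
    pub fn register(&mut self, udf: Udf) {
        self.udfs.insert(udf.name, udf);
    }

    /// Call a registered function by name; `None` if it is not registered.
    pub fn call(&self, name: &str, arg: &str) -> Option<String> {
        self.udfs.get(name).map(|u| (u.func)(arg))
    }

    /// Names of all registered functions, in sorted order.
    pub fn names(&self) -> Vec<&'static str> {
        self.udfs.keys().copied().collect()
    }
}

// In a real crate this module would sit behind `#[cfg(feature = "spark")]`,
// with an assumed `spark = []` entry under `[features]` in Cargo.toml.
// The gate is left as a comment so the sketch compiles on its own.
pub mod spark {
    use super::{Registry, Udf};

    // Placeholder functions, not actual Spark behaviour.
    fn upper(s: &str) -> String {
        s.to_uppercase()
    }

    fn reverse(s: &str) -> String {
        s.chars().rev().collect()
    }

    /// Every function in this suite, for callers who want to pick and choose.
    pub fn all_functions() -> Vec<Udf> {
        vec![
            Udf { name: "upper", func: upper },
            Udf { name: "reverse", func: reverse },
        ]
    }

    /// Bulk-add the whole suite to a registry.
    pub fn register_all(registry: &mut Registry) {
        for udf in all_functions() {
            registry.register(udf);
        }
    }
}
```

A caller would either run `spark::register_all(&mut registry)` to get the whole suite, or iterate over `spark::all_functions()` and register only the entries it wants. Other suites could follow the same two-function shape behind their own feature flags.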