Re: [I] [DISCUSSION] Add separate crate to cover spark builtin functions [datafusion]

via GitHub Fri, 31 Jan 2025 04:33:06 -0800


Omega359 commented on issue #5600:
URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2627152543


   While I can see the benefit of Spark UDFs in DataFusion, I really think 
they would be best handled as a datafusion-contrib project. That is mostly 
for three reasons:
   
   1. Most DataFusion users would never need them.
   2. It's more maintenance, and testing them is non-trivial (I spin up Docker 
images of Spark for my testing, but it's single node, not clustered). It's 
generally slow, especially when compared to Rust tests and sqllogictests.
   3. Lastly, while Spark is a common use case for DataFusion (via Comet, Sail, 
etc.), it's not the only one where a custom set of UDFs might be useful. 
I'm not sure we want to say yes to Spark but no to other UDF suites.
   
   I personally think it would be awesome to have all the UDF suites within a 
common repo, where you could use a feature flag to include just the suite you 
wanted and then either bulk add its functions to a context or pick and choose.
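
   To make the "bulk add or pick and choose" idea concrete, here is a minimal, 
std-only sketch. Everything in it is hypothetical: `Registry` is a stand-in for 
a session context's function registry, `ScalarFn` is a stand-in for a real UDF 
type, and the `spark_suite` module with its `all` / `register_all` helpers is 
just one possible shape. It does not use the actual DataFusion API.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a scalar UDF: a plain function over strings.
type ScalarFn = fn(&str) -> String;

/// Hypothetical stand-in for a session context's function registry.
#[derive(Default)]
struct Registry {
    functions: BTreeMap<&'static str, ScalarFn>,
}

impl Registry {
    /// Register (or overwrite) a function under the given name.
    fn register(&mut self, name: &'static str, f: ScalarFn) {
        self.functions.insert(name, f);
    }
}

/// One module per UDF suite. In a real crate this module would sit behind
/// a Cargo feature (assumed name: "spark"), e.g. `#[cfg(feature = "spark")]`.
mod spark_suite {
    use super::{Registry, ScalarFn};

    fn upper(s: &str) -> String {
        s.to_uppercase()
    }

    fn reverse(s: &str) -> String {
        s.chars().rev().collect()
    }

    /// Every function in the suite, so callers can pick and choose.
    pub fn all() -> Vec<(&'static str, ScalarFn)> {
        vec![("upper", upper as ScalarFn), ("reverse", reverse as ScalarFn)]
    }

    /// Bulk add the whole suite to a registry.
    pub fn register_all(reg: &mut Registry) {
        for (name, f) in all() {
            reg.register(name, f);
        }
    }
}

/// Pick and choose: register only the named functions from a suite.
fn register_selected(
    reg: &mut Registry,
    suite: Vec<(&'static str, ScalarFn)>,
    wanted: &[&str],
) {
    for (name, f) in suite {
        if wanted.contains(&name) {
            reg.register(name, f);
        }
    }
}
```

   A caller would either run `spark_suite::register_all(&mut reg)` to get the 
whole suite, or `register_selected(&mut reg, spark_suite::all(), &["upper"])` to 
take only what it needs. Other suites would follow the same module shape behind 
their own feature flags.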

