alamb commented on issue #5600: URL: https://github.com/apache/datafusion/issues/5600#issuecomment-2628065146
> Perhaps as an alternative we could setup a datafusion-udfs (pick an appropriate name) under the apache umbrella and managed by datafusion pmc's where this could live? Just a thought.

We can definitely do this with minimal organizational overhead (for example, by creating https://github.com/apache/datafusion-functions-spark). However, that would add another release to manage, and the release overhead (3-day voting, etc.) is pretty high.

> I'm of the opinion that while I could see the benefit of spark udfs in datafusion I really think they would be best handled as a datafusion-contrib. That is mostly for 3 reasons:

I personally think Spark functions satisfy my criteria for inclusion in the main DataFusion repo:

1. They are used by many other systems (not just Comet, for example)
2. There is enough interest that we would have the bandwidth to maintain them
3. They would be an excellent advertisement for DataFusion's extensibility

Having Spark-compatible functions in the main repo would also help relieve the tension between Postgres and Spark semantics in functions.

## Proposal (looking for 🧱-bats)

1. Make a `datafusion-functions-spark` crate in `datafusion/src/functions-spark`
2. Do *not* add a dependency on the `datafusion` crate (`datafusion/core`)

I think we could also do some neat things, such as allowing datafusion-cli to run in either mode:

```shell
$ datafusion-cli
Running in Postgres mode
>
```

```shell
$ datafusion-cli --spark
Running in Spark mode
>
```
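To make the Postgres vs Spark tension and the proposed mode switch more concrete, here is a minimal Rust sketch. It is hypothetical: `Dialect`, `divide`, and `dialect_from_args` are illustrative names, not DataFusion APIs. The only behavior modeled is integer division by zero, where PostgreSQL raises a `division by zero` error and Spark's `div` operator with ANSI mode disabled (`spark.sql.ansi.enabled=false`) returns NULL.

```rust
// Hypothetical sketch: NOT DataFusion's API. One logical function with
// dialect-dependent behavior, selected by a flag like the proposed
// `datafusion-cli --spark`.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Dialect {
    Postgres,
    Spark,
}

/// Integer division (NULLs modeled as `None`) whose handling of a zero
/// divisor depends on the dialect:
/// - Postgres: error "division by zero"
/// - Spark `div` with ANSI mode off: NULL
fn divide(dialect: Dialect, a: Option<i64>, b: Option<i64>) -> Result<Option<i64>, String> {
    match (a, b) {
        // A NULL input yields NULL in both dialects
        (None, _) | (_, None) => Ok(None),
        // The zero-divisor case is where the dialects disagree
        (Some(_), Some(0)) => match dialect {
            Dialect::Postgres => Err("division by zero".to_string()),
            Dialect::Spark => Ok(None),
        },
        // Overflow (i64::MIN / -1) is simply reported as an error in this
        // sketch; real engines may differ.
        (Some(x), Some(y)) => x
            .checked_div(y)
            .map(Some)
            .ok_or_else(|| "integer out of range".to_string()),
    }
}

/// Pick a dialect the way the proposed CLI flag might (hypothetical).
fn dialect_from_args(args: &[&str]) -> Dialect {
    if args.contains(&"--spark") {
        Dialect::Spark
    } else {
        Dialect::Postgres
    }
}

fn main() {
    let dialect = dialect_from_args(&["datafusion-cli", "--spark"]);
    println!("Running in {:?} mode", dialect);
    println!("7 div 0 = {:?}", divide(dialect, Some(7), Some(0)));
}
```

A real implementation would more likely register a different set of function implementations per mode than branch inside every function. The inline branch here just keeps the sketch short.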