alamb commented on PR #16409: URL: https://github.com/apache/datafusion/pull/16409#issuecomment-2973641672
> I'm personally intrigued, tbh, but I'd say the DF core should be agnostic of any specific data-driven architecture (like Spark), even if we do a lot of Spark integration like Sail or Comet.
>
> imho, the data-driven architecture should live in, and be addressed by, some bridge project that takes care of Apache Spark specifics compared to DF, including INT96, decimals, nested types, some null handling, etc.
>
> @alamb @andygrove as initiators, WDYT?

It is my opinion that both standalone `.slt` tests (such as the ones in this PR) and more substantial "run queries in both systems and compare" style integration tests (such as the ones in the Comet repo) are needed.

The value of standalone `.slt` tests is that they keep the barrier to contribution low(er). The hope is that we'll get the function library moved / ported over with community help, and having the tests already waiting will keep the context required of new contributors low.

As valuable as tests that actually start Spark, etc. (as done in Comet) are, putting them in the main DataFusion repo would make it even harder to contribute to DataFusion, because contributors would have to set up the dependencies, understand Spark errors, etc.

@comphead if you are suggesting creating a new repository / project for running the integration tests, I think that is quite an interesting idea, and maybe we can make a separate ticket for it.

BTW, for anyone else following along, there is quite a bit more discussion about the Spark functions testing strategy here: https://github.com/apache/datafusion/pull/14392#pullrequestreview-2588483892

# Suggestion

What I suggest we do is update the README.md file (https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/spark/README.md) to explain what is going on.

Maybe we can add a section something like the following:

```
## Implementation Status

Implementing the `datafusion-spark` compatible functions project is still a work in progress.

Many of the tests in this directory are commented out and are waiting for help with implementation.

For more information please see:
* [The `datafusion-spark` Epic](https://github.com/apache/datafusion/issues/15914)
* The porting script (see here for script)
```