alamb commented on PR #16409:
URL: https://github.com/apache/datafusion/pull/16409#issuecomment-2973641672

   > I'm personally intrigued tbh but I'd say the DF core should be agnostic of
   > specific data-driven architecture(like Spark) even if we do a lot of Spark
   > integration like Sail or Comet.
   >
   > imho data-driven arch should be living and addressing in some bridge
   > project which take care on Apache Spark specifics comparing to DF, including
   > INT96, decimals, nested types, some null handlings, etc..
   > 
   > @alamb @andygrove as initiators WDYT?
   
   It is my opinion that both standalone .slt tests (such as are in this PR) 
and more substantial "run queries in both systems and compare" style 
integration tests (such as are in the comet repo) are needed. 
   
   The value of standalone .slt tests is that they keep the barrier to
contribution low(er): the hope is that we'll get the function library moved /
ported over with community help, and having the tests waiting will keep the
context required by new contributors low.
   
   As valuable as tests that actually start Spark (as done in Comet) are,
putting them in the main DataFusion repo would make it even harder to
contribute to DataFusion, because contributors would have to set up the
dependencies, understand Spark errors, etc.
   
   @comphead if you are suggesting creating a new repository / project for
running the integration tests, I think that is quite an interesting idea, and
maybe we can file a separate ticket for it.
   
   BTW there is quite a bit more discussion about the Spark function testing
strategy here, for anyone else following along:
   - https://github.com/apache/datafusion/pull/14392#pullrequestreview-2588483892
   
   # Suggestion 
   
   What I suggest we do is update the README.md file
https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/spark/README.md
to explain what is going on.
   
   Maybe we can add a section along the lines of the following:
   
   ```
   ## Implementation Status
   
   Implementing the `datafusion-spark` compatible functions project is still a
   work in progress. Many of the tests in this directory are commented out and
   are waiting for help with implementation.
   
   For more information please see:
   * [The `datafusion-spark` Epic](https://github.com/apache/datafusion/issues/15914)
   * The porting script (see here for script)
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

