linhr commented on issue #15914:
URL: https://github.com/apache/datafusion/issues/15914#issuecomment-2935366003

   >  @linhr has some ideas around making sqllogictests easier to work with
   
   Here is my idea for automating the test setup without introducing Spark as a hard 
dependency.
   1. We create a Python script to generate SLT files.
       1. Define a set of input values for each data type, and optionally 
define custom input values for each function.
       2. Use PySpark to execute the function, get the output, and write the 
SLT file. Make sure the values are formatted in the way that the SLT engine can 
understand. Input data type casting can also be done here.
   2. DataFusion developers who work on Spark functions run the 
Python script to update the SLT files whenever needed. PySpark must be installed in a 
local virtualenv, and we provide instructions for this setup.
   3. DataFusion developers who do not work on Spark functions run 
the DataFusion tests in the existing way. They do not need to be aware of how 
the Spark function SLT files are generated.
   4. We add a CI workflow that is triggered when Spark function SLT files are 
changed, to verify that they were produced by the script and contain no unintended 
manual modifications.
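   The generator in step 1 could look roughly like the sketch below. This is a 
minimal illustration, not a proposed implementation: all names (`slt_query_block`, 
`format_value`, `INPUTS_BY_TYPE`) are hypothetical, the SLT value formatting is 
simplified, and the PySpark call that would produce the result rows (step 1.2) is 
shown only as a commented placeholder so the formatting logic stands on its own.

   ```python
   # Hypothetical sketch of the SLT generator script described above.
   # In the real script, result rows would come from PySpark, e.g.:
   #   from pyspark.sql import SparkSession
   #   spark = SparkSession.builder.getOrCreate()
   #   rows = [list(r) for r in spark.sql("SELECT expm1(0)").collect()]

   # Step 1.1: default input values per data type; individual functions
   # may override these with custom inputs.
   INPUTS_BY_TYPE = {
       "int": [0, 1, -1, None],
       "double": [0.0, 1.5, None],
   }


   def format_value(value):
       """Format one result value the way the SLT engine expects.

       Simplified placeholder: the real script must match the exact
       formatting rules of DataFusion's sqllogictest runner.
       """
       if value is None:
           return "NULL"
       if isinstance(value, bool):
           return "true" if value else "false"
       return str(value)


   def slt_query_block(sql, type_string, rows):
       """Render one sqllogictest query block.

       `type_string` is the SLT column-type line (e.g. "R" for one real
       column); `rows` is a list of result rows.
       """
       lines = [f"query {type_string}", sql, "----"]
       lines += ["\t".join(format_value(v) for v in row) for row in rows]
       return "\n".join(lines) + "\n"
   ```

   The script would loop over the registered Spark functions, build the SQL from 
`INPUTS_BY_TYPE` (applying any input casts there, per step 1.2), execute it via 
PySpark, and write the rendered blocks into the SLT files.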


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

