linhr commented on issue #15914: URL: https://github.com/apache/datafusion/issues/15914#issuecomment-2935366003
> @linhr has some ideas around making sqllogictests easier to work with

Here is my idea to automate the test setup without bringing in Spark as a hard dependency.

1. We create a Python script to generate SLT files.
   1. Define a set of input values for each data type, and optionally define custom input values for each function.
   2. Use PySpark to execute the function, capture the output, and write the SLT file, making sure the values are formatted in a way the SLT engine can understand. Input data type casting can also be done here.
2. DataFusion developers who work on Spark functions run the Python script to update the SLT files whenever needed. PySpark is needed in a local virtualenv, and we provide instructions for this setup.
3. DataFusion developers who do not work on Spark functions run the DataFusion tests in the existing way; they do not need to be aware of how the Spark function SLT files are generated.
4. We add a CI workflow, triggered whenever Spark function SLT files change, to verify that they were generated without unintended manual modification.
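To make the idea concrete, here is a minimal sketch of what the generator script could look like. All names (`DEFAULT_INPUTS`, `render_slt_test`, etc.) are hypothetical, and the PySpark call is stubbed out (shown only in a comment) so the SLT-formatting logic stands on its own; the real script would evaluate each function through a `SparkSession` instead.

```python
"""Hypothetical sketch of the proposed SLT generator; names are assumptions.

The real script would call PySpark to evaluate each Spark function. Here the
Spark call is stubbed so the SLT-formatting logic is self-contained.
"""

# Hypothetical default inputs per data type; custom per-function inputs
# could be layered on top of this.
DEFAULT_INPUTS = {
    "int": ["0", "1", "-1"],
    "string": ["'a'", "''"],
}


def format_slt_value(value):
    """Format a result value the way the sqllogictest engine expects."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def slt_type_char(value):
    """Map a Python result value to an SLT column type character."""
    if isinstance(value, bool):  # check bool before int (bool is an int subclass)
        return "B"
    if isinstance(value, int):
        return "I"
    if isinstance(value, float):
        return "R"
    return "T"


def render_slt_test(function, args, result):
    """Render one `query` block in sqllogictest syntax."""
    sql = f"SELECT {function}({', '.join(args)});"
    return "\n".join([
        f"query {slt_type_char(result)}",
        sql,
        "----",
        format_slt_value(result),
        "",
    ])


# In the real script, `result` would come from PySpark, e.g.:
#   result = spark.sql(sql).collect()[0][0]
print(render_slt_test("ascii", ["'a'"], 97))
```

The CI check in step 4 could then be as simple as rerunning this script and failing if `git diff --exit-code` reports changes under the Spark SLT directory.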