timsaucer commented on issue #696: URL: https://github.com/apache/datafusion-python/issues/696#issuecomment-2110851781
There are a couple of things we'll need to do that immediately come to mind:

- A few of the examples differ from what the spec shows as the expected outcome. We need to check whether these are a problem in my code or something to do with the size of the generated data set I used. I will take this on.
- Decide if we want to run against the answers file or against the one-line output shown in the spec. If we want to go against the answers file, update the test to use the query file parameters instead of the values in the spec.
- Pull the code for every example into a function that can return a dataframe.
- Write code to compare the output dataframe to the return value from the test. We can probably limit decimal precision to 2 decimal places since this is designed to report at the level of $0.01.
- Update CI to pull the dbgen docker image and generate a 1 GB dataset, the smallest we can expect to get consistent results based on what I've read.

I hadn't originally expected the examples to be used in this way, but it's a very good idea.
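As a rough illustration of the comparison step, here is a minimal sketch that assumes both the query output and the expected answers have already been converted to lists of row tuples. The names `normalize` and `diff_rows` are hypothetical and not part of datafusion-python, and the sketch uses only the standard library so it stays independent of how the dataframe is collected.

```python
from decimal import Decimal, ROUND_HALF_UP


def normalize(value, places=2):
    """Round numeric values to `places` decimals; return other values unchanged."""
    if isinstance(value, bool):
        return value  # bool is a subclass of int; do not treat it as a number
    if isinstance(value, (int, float, Decimal)):
        quantum = Decimal(10) ** -places  # Decimal("0.01") when places == 2
        # Going through str() avoids carrying binary float noise into Decimal.
        return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    return value


def diff_rows(actual, expected, places=2):
    """Return a list of human-readable mismatches; an empty list means a match.

    Assumption: rows are compared in order, so both inputs must already be
    sorted the same way (for example by the query's own ORDER BY).
    """
    mismatches = []
    if len(actual) != len(expected):
        mismatches.append(
            f"row count: got {len(actual)}, expected {len(expected)}"
        )
    for index, (actual_row, expected_row) in enumerate(zip(actual, expected)):
        actual_norm = tuple(normalize(v, places) for v in actual_row)
        expected_norm = tuple(normalize(v, places) for v in expected_row)
        if actual_norm != expected_norm:
            mismatches.append(
                f"row {index}: got {actual_norm}, expected {expected_norm}"
            )
    return mismatches
```

A test could then assert `diff_rows(actual_rows, expected_rows) == []` and print the returned list on failure. Whether rounding or truncation is the right rule for the answers file is an open question that this sketch does not settle.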
