Re: [I] Ensure examples stay updated in CI. [datafusion-python]

via GitHub Tue, 14 May 2024 11:25:23 -0700


timsaucer commented on issue #696:
URL: https://github.com/apache/datafusion-python/issues/696#issuecomment-2110851781


   There are a couple of things we'll need to do that immediately come to mind:
   
   - A few of the examples produce results that differ from the expected outcome shown in the spec. We need to check whether this is a problem in my code or a consequence of the size of the generated data set I used. I will take this on.
   - Decide whether to validate against the answer files or against the one-line output shown in the spec. If we go with the answer files, update the tests to use the query file parameters instead of the values in the spec.
   - Pull the code for every example into a function that returns a DataFrame.
   - Write code to compare the output DataFrame to the expected result for each test. We can probably limit decimal precision to 2 places, since the results are designed to be reported at the level of $0.01.
   - Update CI to pull the dbgen Docker image and generate a 1 GB dataset, which, based on what I've read, is the smallest size we can expect to give consistent results.
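   A minimal sketch of the comparison step in the list above, assuming two things that are not settled in this thread: that the expected answers are pipe-delimited text with a header row, and that each example's DataFrame has already been converted to a list of row dicts (for example with `to_pandas()` followed by pandas `to_dict(orient="records")`). The helpers `normalize`, `load_answer` and `rows_match` are hypothetical names for illustration, not existing datafusion-python test code.

```python
import csv
import io
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical precision for comparisons: results are reported to $0.01.
CENT = Decimal("0.01")


def normalize(value):
    """Round numeric-looking values to 2 places; compare everything else as text."""
    text = str(value).strip()
    try:
        return Decimal(text).quantize(CENT, rounding=ROUND_HALF_UP)
    except ArithmeticError:  # decimal.InvalidOperation subclasses ArithmeticError
        return text


def load_answer(text):
    """Parse pipe-delimited answer text (assumed format: header row, then data rows)."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def rows_match(actual, expected):
    """Return True if both row lists have the same columns and values, in order."""
    if len(actual) != len(expected):
        return False
    for got, want in zip(actual, expected):
        if set(got) != set(want):
            return False
        for column in want:
            if normalize(got[column]) != normalize(want[column]):
                return False
    return True


if __name__ == "__main__":
    expected = load_answer("l_returnflag|sum_qty|avg_disc\nA|37734107.00|0.05\n")
    actual = [{"l_returnflag": "A", "sum_qty": 37734107.0, "avg_disc": 0.0499}]
    print(rows_match(actual, expected))  # True
```

   Rounding both sides with `ROUND_HALF_UP` before comparing is a deliberate simplification; a tolerance check such as `abs(a - b) <= Decimal("0.01")` would be less sensitive to values that land exactly on a rounding boundary.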
   
   I hadn't originally expected the examples to be used in this way, but it's a 
very good idea.

