timsaucer commented on PR #1036:
URL: 
https://github.com/apache/datafusion-python/pull/1036#issuecomment-2721032760

   I am now testing my updated, consolidated code. When running TPC-H queries 
at the 10 GB scale factor, I get comparable times for both q1 and q2, which 
take about 10 s and 53 s respectively on my M4 Pro at that scale. I will next 
test on the 1 GB scale factor and then the tiny batches that were discussed in 
#1015.
   
   
![render-with-limited-returns](https://github.com/user-attachments/assets/3544ce81-8bbf-4cf4-9a64-afc193c5fb8c)
   
   One metric is comparing `df.show()` with `df.__repr__()`. The former 
essentially calls the previous code path; the latter is the updated call. I 
also ran the same test against main to get reference values.
   
   For q2, for example (times in seconds):
   
   df.show():  53.881720781326294
   df.__repr__(): 52.33351922035217
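
A minimal sketch of how a one-shot comparison like this could be timed. It uses only the standard library; `time_once` and `compare_renderers` are hypothetical helper names, not part of datafusion-python, and the commented usage assumes `df` is a DataFrame already built for the query.

```python
import time
from typing import Callable, Dict


def time_once(label: str, fn: Callable[[], object]) -> float:
    """Run fn a single time and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    print(f"{label} took: {elapsed}")
    return elapsed


def compare_renderers(renderers: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Time each zero-argument callable once, keyed by its label."""
    return {label: time_once(label, fn) for label, fn in renderers.items()}


# Hypothetical usage (assumes `df` is an existing datafusion DataFrame):
# compare_renderers({"df.show()": df.show, "df.__repr__()": df.__repr__})
```

A single run at this scale is noisy, so differences of a second or two on a 50 s query should be read as "comparable" rather than as a speedup.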
   
   
   When dropping down to a 1 GB dataset:
   
   df.show() took:  0.8244500160217285
   df.__repr__() took 0.8161180019378662
   
   The same 1 GB dataset against main:
   
   df.show() took:  0.8473942279815674
   df.__repr__() took 0.8100850582122803
   
   Finally, for the tiny dataset (increased to 3 record batches so that we do 
get multiple processing steps):
   
   Average runtime over 100 runs: 0.001016 seconds (this branch)
   Average runtime over 100 runs: 0.001011 seconds (main)
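
For the tiny dataset a single measurement is dominated by noise, which is why the numbers above are averages. A sketch of that kind of averaging, again standard library only; `average_runtime` is a hypothetical helper name and the commented usage assumes an existing `df`.

```python
import time
from typing import Callable


def average_runtime(fn: Callable[[], object], runs: int = 100) -> float:
    """Call fn `runs` times and return the mean wall-clock seconds per call."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / runs


# Hypothetical usage (assumes `df` is an existing datafusion DataFrame):
# avg = average_runtime(df.__repr__, runs=100)
# print(f"Average runtime over 100 runs: {avg:.6f} seconds")
```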
   
   And lastly, to verify that it also resolves #1014:
   
   <img width="984" alt="image" src="https://github.com/user-attachments/assets/1a0ccd6a-ad0c-475f-a6ca-da1758930423" />
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

