AdamGS opened a new issue, #17494:
URL: https://github.com/apache/datafusion/issues/17494

   When running TPC-DS q72, I've noticed that regardless of the underlying file 
format, latency increases dramatically even at relatively modest scale 
factors like 10. I've measured the query at around 2.4 seconds with SF=1, but 
over 60s with SF=10, i.e. more than 25x slower for 10x the data.
   
   In my benchmarking setup, the plan is 
[here](https://gist.github.com/AdamGS/cea5816b321ca70323975c05d6048f36) 
(as you can see, it's extremely join-heavy).
   
   Profile of the query captured with samply (this is with `branch-50` over 
Parquet, SF=1):
   <img width="2244" height="1350" alt="Image" 
src="https://github.com/user-attachments/assets/8449ce83-cc22-48cd-942b-68f705766f22" />
   
   After playing around with it, it seems like most of the time is spent in the 
loop inside the `chain_traverse` macro. I've tried a few common performance 
techniques (making it an explicitly inlined generic function, changing how the 
indices and values memory is managed/written to), but nothing made a noticeable 
difference.
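
   For context, here is a minimal sketch of the general chained-hash-table probe pattern that a loop like this implements. It is not the actual `chain_traverse` code: `ChainedMap`, `first`, `next`, `build`, and `probe` are hypothetical names, a std `HashMap` stands in for the real table, and key equality checks after the hash match are omitted.

```rust
use std::collections::HashMap;

/// Simplified chained join hash table (hypothetical, for illustration only).
/// `first` maps a hash to the 1-based index of the most recently inserted
/// build-side row with that hash. `next[i]` holds the 1-based index of the
/// previous row in the same chain, with 0 marking the end of the chain.
struct ChainedMap {
    first: HashMap<u64, u64>,
    next: Vec<u64>,
}

impl ChainedMap {
    /// Builds the table from the hashes of the build-side rows.
    fn build(hashes: &[u64]) -> Self {
        let mut first = HashMap::new();
        let mut next = vec![0u64; hashes.len()];
        for (row, &h) in hashes.iter().enumerate() {
            // `insert` returns the old chain head, if any; link the new row to it.
            if let Some(prev) = first.insert(h, row as u64 + 1) {
                next[row] = prev;
            }
        }
        Self { first, next }
    }

    /// Returns parallel (probe index, build index) vectors, one pair per hash match.
    fn probe(&self, probe_hashes: &[u64]) -> (Vec<u32>, Vec<u64>) {
        let mut probe_idx = Vec::new();
        let mut build_idx = Vec::new();
        for (p, h) in probe_hashes.iter().enumerate() {
            // Start at the chain head, or 0 (empty chain) if the hash is absent.
            let mut cur = self.first.get(h).copied().unwrap_or(0);
            // Walk the chain: one dependent load from `next` per matching row.
            while cur != 0 {
                probe_idx.push(p as u32);
                build_idx.push(cur - 1);
                cur = self.next[(cur - 1) as usize];
            }
        }
        (probe_idx, build_idx)
    }
}
```

   In this sketch, the work done by `probe` is proportional to the total number of matching (probe, build) pairs, not to the number of probe rows, and each step of the inner loop depends on the previous load from `next`. Whether that is what dominates q72 here is an assumption, not something the profile above proves.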
   



