waruto210 commented on issue #13099:
URL: https://github.com/apache/datafusion/issues/13099#issuecomment-2514512002

   > > @alamb I would really appreciate any advice you could give when you have 
a moment.
   > 
   > I think we would have to get some detailed profiling to really know for 
sure, but I suspect that ClickBench has non-trivial caches (buffer caching, 
page caches, etc.).
   > 
   > DataFusion, as a serverless engine, does not have any such caching. The 
only difference between a cold and a hot run is that on the hot run, data from 
disk will be in the Linux page cache, so it may not do any actual IO.
   > 
   > It might also help to break down which queries showed the biggest 
discrepancy -- were they queries that already ran in 100ms (in which case 
caching, avoiding re-reading metadata might be a bigger part of processing)?
   
   For Parquet files, ClickHouse uses local mode. As I understand it, in local 
mode ClickHouse, like DataFusion, is a stateless query engine with only the 
Linux page cache available, so I'm very surprised by these results. I will run 
more experiments to try to find the reason.
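
One way to check how much the Linux page cache alone contributes is to time repeated sequential reads of the same file. The following is only a minimal sketch: `timed_read` and `compare_runs` are hypothetical helper names, not part of DataFusion or ClickHouse, and it measures raw file reads rather than query execution.

```python
import time


def timed_read(path, chunk_size=1 << 20):
    """Read the whole file sequentially; return (elapsed_seconds, bytes_read)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return time.perf_counter() - start, total


def compare_runs(path, runs=3):
    """Time several consecutive reads of the same file.

    Later runs are likely served from the page cache; the first run is
    only cold if the file was not already cached (an assumption here).
    """
    return [timed_read(path) for _ in range(runs)]
```

The first read is only truly cold if the page cache was cleared beforehand, for example by writing `3` to `/proc/sys/vm/drop_caches` as root on Linux. If the gap between the first and later reads is small compared with the query-time difference, the page cache is probably not the main factor.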


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

