waruto210 commented on issue #13099: URL: https://github.com/apache/datafusion/issues/13099#issuecomment-2514512002
> > @alamb I would really appreciate any advice you could give when you have a moment.
>
> I think we would have to get some detailed profiling to really know for sure, but I suspect that ClickBench has non-trivial caches (buffer caching, page caches, etc.).
>
> DataFusion, as a serverless engine, does not have any such caching. The only difference between a cold and a hot run is that on the hot run, data from disk will be in the Linux page cache, so it may not do any actual IO.
>
> It might also help to break down which queries showed the biggest discrepancy -- were they queries that already ran in 100ms (in which case caching and avoiding re-reading metadata might be a bigger part of processing)?

For Parquet files, ClickHouse uses local mode. As I understand it, in local mode ClickHouse, like DataFusion, is a stateless query engine with only the Linux page cache available. So I'm very surprised by these results. I will run more experiments to try to find the reason.
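One experiment that separates the page-cache effect from engine-level caching is to time the same file read once after evicting it from the page cache ("cold") and once immediately afterwards ("hot"). The sketch below assumes Linux and uses `os.posix_fadvise` with `POSIX_FADV_DONTNEED`, which only asks the kernel to drop the file's clean pages (best effort). The helper names (`read_file`, `evict_from_page_cache`, `cold_vs_hot`) are hypothetical and not part of DataFusion, ClickHouse, or ClickBench, and the sketch measures only raw IO, not query execution.

```python
import os
import sys
import time


def read_file(path: str, chunk_size: int = 1 << 20) -> int:
    """Read the whole file sequentially and return the number of bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def evict_from_page_cache(path: str) -> bool:
    """Ask the kernel to drop this file's cached pages (best effort).

    Returns False when os.posix_fadvise is unavailable (e.g. macOS, Windows).
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # Dirty pages cannot be dropped, so flush them to disk first.
        os.fsync(fd)
        # offset=0, length=0 means "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True


def cold_vs_hot(path: str) -> dict:
    """Time one read after eviction (cold), then one read from a warm page cache (hot)."""
    evicted = evict_from_page_cache(path)

    start = time.perf_counter()
    cold_bytes = read_file(path)
    cold_s = time.perf_counter() - start

    start = time.perf_counter()
    hot_bytes = read_file(path)
    hot_s = time.perf_counter() - start

    return {
        "evicted": evicted,
        "bytes": cold_bytes,
        "cold_s": cold_s,
        "hot_s": hot_s,
        "same_bytes": cold_bytes == hot_bytes,
    }


if __name__ == "__main__":
    # Usage: python cold_hot.py /path/to/hits.parquet
    print(cold_vs_hot(sys.argv[1]))
```

If the hot read of a Parquet file is much faster than the cold one, a large part of any hot-run speedup in either engine could come from the page cache alone rather than from engine-specific caching.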
