waruto210 commented on issue #13099: URL: https://github.com/apache/datafusion/issues/13099#issuecomment-2514484066
I'm very pleased to see DataFusion achieving such results. However, I encountered some anomalies while trying to reproduce the benchmark, so I'd like to ask for some guidance. Following the scripts in the ClickBench repository, I ran ClickBench on partitioned parquet files. During the cold run phase, DataFusion was about 20% faster than ClickHouse, but in the hot run phase, DataFusion was about 20% slower than ClickHouse. We used a machine with specifications similar to c6a.4xlarge, featuring 16 vCPUs, 32GB of memory, and an SSD with 2GB/s bandwidth. Additionally, we ran ClickBench on a machine with similar specifications but using HDD, and the results were consistent - DataFusion was slower than ClickHouse in the hot run phase. This was quite unexpected, and I'd like to know if there might be some configuration/compilation parameters that could be causing this issue. @alamb I would really appreciate any advice you could give when you have a moment.
