Kontinuation commented on PR #1511:
URL: https://github.com/apache/datafusion-comet/pull/1511#issuecomment-2736412022
I have run TPC-H SF=100 benchmarks with various off-heap size configurations. The results show that:

* `interleave_record_batch` is slower than the main branch when no spilling happens.
* `interleave_record_batch` is faster at running Q10 when off-heap memory is less than 8 GB, while the main branch can be slower than Spark because of excessive spilling.

The following table shows detailed results.

| On-heap size | Off-heap size | Spark 3.5.4 | Comet main | Comet interleave_record_batch |
|--|--|--|--|--|
| 3g | 3g | 1054 s | 551 s | 523 s |
| 3g | 5g | 1050 s | 512 s | 522 s |
| 3g | 8g | 1032 s | 490 s | 492 s |

Comet main can be slower when running Q10 because it suffers from excessive spilling. Q10's shuffle writes batches containing string columns, and the current shuffle writer implementation pre-allocates a lot of space for string array builders, so it consumes a lot of memory even when only a few batches have been ingested. We've already seen this in https://github.com/apache/datafusion-comet/issues/887.

Here is a comparison of the Spark metrics for the CometExchange nodes:

| Comet main | Comet interleave_record_batch |
|--|--|
| <img width="574" alt="comet-main-exchange" src="https://github.com/user-attachments/assets/5850e81e-8153-4be5-a7e9-aa7bd7cf021d" /> | <img width="571" alt="comet-interleave-exchange" src="https://github.com/user-attachments/assets/3c4e30b7-8ab6-469b-a6ce-111efce98e6f" /> |
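To make the string-builder memory argument concrete, here is a toy Rust model contrasting the two buffering strategies. It is a sketch under stated assumptions, not Comet's shuffle writer: `StringBatch`, `eager_reserved_bytes`, and `InterleaveBuffer` are hypothetical names. The eager strategy is simplified to one builder per output partition reserving a full batch worth of string bytes up front. The interleave strategy assumes the branch keeps ingested batches intact, records per-partition `(batch, row)` indices, and gathers rows only when a partition is written out (the role the `interleave_record_batch` kernel presumably plays on the branch).

```rust
/// Hypothetical stand-in for an Arrow RecordBatch with a single Utf8 column.
pub type StringBatch = Vec<String>;

/// Toy model of eager pre-allocation: every output partition gets its own
/// string builder sized for a full batch, whether or not any rows arrive.
pub fn eager_reserved_bytes(num_partitions: usize, batch_size: usize, bytes_per_value: usize) -> usize {
    num_partitions * batch_size * bytes_per_value
}

/// Toy model of interleave-style buffering (assumed design, hypothetical name):
/// keep ingested batches as-is and record which rows belong to each partition.
pub struct InterleaveBuffer {
    batches: Vec<StringBatch>,
    /// indices[p] holds (batch index, row index) pairs for partition p.
    indices: Vec<Vec<(usize, usize)>>,
}

impl InterleaveBuffer {
    pub fn new(num_partitions: usize) -> Self {
        InterleaveBuffer {
            batches: Vec::new(),
            indices: vec![Vec::new(); num_partitions],
        }
    }

    /// Buffer a batch; `partition_of` maps each row's value to a partition id.
    pub fn insert<F: Fn(&str) -> usize>(&mut self, batch: StringBatch, partition_of: F) {
        let batch_idx = self.batches.len();
        for (row_idx, value) in batch.iter().enumerate() {
            let p = partition_of(value.as_str());
            self.indices[p].push((batch_idx, row_idx));
        }
        self.batches.push(batch);
    }

    /// Bytes held by this model: string payloads plus the index pairs.
    /// Memory grows with the data actually ingested, not with partition count.
    pub fn buffered_bytes(&self) -> usize {
        let payload: usize = self.batches.iter().flatten().map(|s| s.len()).sum();
        let index_count: usize = self.indices.iter().map(|v| v.len()).sum();
        payload + index_count * std::mem::size_of::<(usize, usize)>()
    }

    /// Materialize one partition only when it is written out, gathering its
    /// rows from the buffered batches in arrival order.
    pub fn take_partition(&self, p: usize) -> StringBatch {
        self.indices[p]
            .iter()
            .map(|&(b, r)| self.batches[b][r].clone())
            .collect()
    }
}
```

With illustrative parameters of 200 partitions, 8192-row batches, and 16 bytes reserved per string value, `eager_reserved_bytes` comes to 26,214,400 bytes (25 MiB) per string column before a single row arrives, whereas `buffered_bytes` grows only with what has been ingested. The cost of the lazy design is an extra gather each time a partition is flushed, which may be consistent with the branch being slightly slower when nothing spills.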