andygrove commented on issue #1382: URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2674736651
> [@kazuyukitanimura](https://github.com/kazuyukitanimura) I am not sure but I think the slowness comes from CometExchange that is executed after the join with BuildLeft > > this is for original Comet > > CometExchange > > shuffle records written: 65,254,713 > number of spills: 2,619 > shuffle write time total (min, med, max ) > 29.7 s (2 ms, 43 ms, 138 ms ) > number of input batches: 100,000 > records read: 65,254,713 > memory pool time total (min, med, max ) > 1.2 m (25 ms, 113 ms, 183 ms ) > local bytes read total (min, med, max ) > 513.1 MiB (4.3 MiB, 7.7 MiB, 9.5 MiB ) > fetch wait time total (min, med, max ) > 0 ms (0 ms, 0 ms, 0 ms ) > remote bytes read total (min, med, max ) > 3.4 GiB (37.6 MiB, 52.2 MiB, 54.1 MiB ) > repartition time total (min, med, max ) > 50.8 s (21 ms, 53 ms, 315 ms ) > decoding and decompression time total (min, med, max ) > 3.3 m (1.6 s, 2.4 s, 8.6 s ) > local blocks read: 171,154 > spilled bytes: 2,250,148,478,976 > remote blocks read: 1,160,828 > data size total (min, med, max ) > 4.4 GiB (4.5 MiB, 6.8 MiB, 7.2 MiB ) > native shuffle writer time total (min, med, max ) > 4.8 m (100 ms, 341 ms, 1.5 s ) > number of partitions: 2,000 > encoding and compression time total (min, med, max ) > 1.9 m (34 ms, 102 ms, 744 ms ) > remote reqs duration total (min, med, max ) > 1.2 m (339 ms, 639 ms, 2.2 s ) > shuffle bytes written total (min, med, max ) > 3.9 GiB (2.6 MiB, 6.2 MiB, 6.8 MiB ) > > and here is the metric with my change > > ``` > CometExchange > > shuffle records written: 65,254,713 > number of spills: 17,160 > shuffle write time total (min, med, max ) > 3.0 m (2 ms, 243 ms, 878 ms ) > number of input batches: 11,485,874 > records read: 65,254,713 > memory pool time total (min, med, max ) > 8.9 m (36 ms, 782 ms, 1.7 s ) > local bytes read total (min, med, max ) > 748.1 MiB (1611.3 KiB, 7.9 MiB, 9.8 MiB ) > fetch wait time total (min, med, max ) > 0 ms (0 ms, 0 ms, 0 ms ) > remote bytes read total (min, med, max ) > 4.8 GiB (12.7 MiB, 52.0 MiB, 55.9 MiB ) > repartition time total (min, med, max ) > 4.2 m (62 ms, 294 ms, 2.0 s ) > decoding and decompression time total (min, med, max ) > 18.8 m (7.4 s, 9.9 s, 36.4 s ) > local blocks read: 174,515 > spilled bytes: 16,134,291,652,608 > remote blocks read: 1,157,467 > data size total (min, med, max ) > 5.9 GiB (6.0 MiB, 9.1 MiB, 9.6 MiB ) > native shuffle writer time total (min, med, max ) > 22.1 m (152 ms, 1.7 s, 6.7 s ) > number of partitions: 2,000 > encoding and compression time total (min, med, max ) > 3.7 m (21 ms, 244 ms, 2.0 s ) > remote reqs duration total (min, med, max ) > 57.0 s (71 ms, 337 ms, 1.6 s ) > shuffle bytes written total (min, med, max ) > 5.6 GiB (2.1 MiB, 8.7 MiB, 9.2 MiB ) > ``` > > It is weird that most of the metrics including spill size and execution time get 7-8x higher. I don't know why it happens but I am trying to figure out. Spilling 16 TB (`16,134,291,652,608`) is definitely not desirable. I suspect that our spilling is currently too aggressive. Could you try allocating more off-heap memory to avoid spilling and see how the performance looks? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org