ayushi-agarwal opened a new issue, #10214: URL: https://github.com/apache/incubator-gluten/issues/10214
### Description Hi @marin-ma! In one of our workloads we are seeing high deserialize time in columnar exchange node when shuffle data is read. This is hash based shuffle. The partitions were set to 1920 initially, we tried to decrease it to 1000 and then to 500 which decreases this time but still it's significant. What I observerd is that the number of batches become really huge after shuffle read annd number of rows per batch is very low. It improves with decreasing shuffle partitions but if we decrease it too much, we are not able to utilize the parallelism for other operators in the stage after shuffle read like sort, window, join and project with UDF. Wanted to get your input on this issue. This is the screenshot with 1000 partitions. <img width="456" height="984" alt="Image" src="https://github.com/user-attachments/assets/f9ec4eab-9007-4dab-8a25-5be2c9efb2ef" /> This is with 500 partitions <img width="456" height="984" alt="Image" src="https://github.com/user-attachments/assets/1a5634ae-e43d-42c4-8177-87ba1488156f" /> The number of batches reduced from 148 million to 74 million but it's much higher than input batches which is 1.6 million. I tried sort based shuffle also but didn't see any improvement. Could you give some pointers that can help improve its performance? ### Gluten version None -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
