gabotechs commented on PR #22010:
URL: https://github.com/apache/datafusion/pull/22010#issuecomment-4386367667

   > @gabotechs I notice peak memory is quite a bit higher here?
   
   I can imagine how this can happen in cases where the fanout is very large. It 
boils down to the gating mechanism implemented in `RepartitionExec` today:
   
   
https://github.com/apache/datafusion/blob/dcf648255b92a34798871139aeba12d95f8f3032/datafusion/physical-plan/src/repartition/distributor_channels.rs#L21-L36
   
   Before this PR, the batches flowing through there were of size 
`batch_size / output_partitions`, but with this PR, they are of size 
`batch_size`.
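
A rough sketch of that arithmetic, in case it helps. It is not DataFusion code: `rows_per_channel_batch` is a hypothetical helper, the values `8192` and `16` are made-up examples, and it assumes for illustration that one batch ends up buffered in each output channel.

```rust
/// Hypothetical helper (not a DataFusion API): rows in each batch that is
/// pushed into a single output channel.
/// - `full_batches == false` models the old behavior: `batch_size / output_partitions`.
/// - `full_batches == true` models the behavior with this PR: `batch_size`.
fn rows_per_channel_batch(batch_size: usize, output_partitions: usize, full_batches: bool) -> usize {
    if full_batches {
        batch_size
    } else {
        batch_size / output_partitions
    }
}

/// Assumption for illustration only: exactly one batch is buffered per channel.
fn buffered_rows(batch_size: usize, output_partitions: usize, full_batches: bool) -> usize {
    rows_per_channel_batch(batch_size, output_partitions, full_batches) * output_partitions
}

fn main() {
    // Example values, not taken from the benchmark runs.
    let batch_size = 8192;
    let output_partitions = 16;

    let before = buffered_rows(batch_size, output_partitions, false);
    let after = buffered_rows(batch_size, output_partitions, true);

    println!("rows buffered before: {before}");
    println!("rows buffered after:  {after}");
    println!("growth factor:        {}", after / before);
}
```

Under those assumptions the buffered row count grows by a factor of `output_partitions`, which is why a very large fanout would be the case where the difference shows up.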
   
   The memory reporting there seems quite unstable though. For example, these 
other runs show the same peak memory usage:
   - https://github.com/apache/datafusion/pull/22010#issuecomment-4374065644
   - https://github.com/apache/datafusion/pull/22010#issuecomment-4374088542
   - https://github.com/apache/datafusion/pull/22010#issuecomment-4374117175


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

