2010YOUY01 commented on PR #14644:
URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2658837563

   > I had a hard time making DataFusion Comet work on cloud instances with 4GB 
memory per CPU core, partially because DataFusion is very likely to allocates 
more memory than reserved and run into OOM, or run into various kinds of memory 
reservation failures. 
   
   Yes, this feature is quite bug-prone, perhaps we should mark it as 
experimental to prevent someone to use it in production. Thank you so much for 
the efforts.
   
   Here are my thoughts on the changes
   
   > 1. Don't `try_collect` the result of merging all at once. We consume the 
merged batches one after another and reserve memory for each batch. Once the 
reservation fails we switch to "spill mode" and write all future batches into 
the spill file. This resolves the 2X memory allocation problem ("the second 
problem") reported by [Memory account not adding up in SortExec 
#10073](https://github.com/apache/datafusion/issues/10073), as well as this 
comment: [External sorting not working for (maybe only for string columns??) 
#12136 
(comment)](https://github.com/apache/datafusion/issues/12136#issuecomment-2642135559)
   
   > 3. Reserves more memory for ingested batches to leave some room for 
merging. This PR reserves 2X memory for each batch, this works for most of the 
queries in sort-tpch benchmark (all except Q5 and Q8). User still have to 
configure `sort_spill_reservation_bytes` when memory reserved is not big enough 
for merging. I don't think it is a good change but this is the only solution I 
can think of to compensate for the extra memory usage for the row 
representation of sorted columns.
   
   The 2X memory problem is: specifying a query to run under 100M memory, the 
measured physical memory is 200M. (though the reality is even worse than 2X 🤦🏼 
, see https://github.com/apache/datafusion/pull/14142)
   This is caused by, when the first time OOM happens:
   - there are already 1X batches in memory
   - Then it will be sorted and merged at once, meaning original batches should 
hold until all sorted runs are generated (now memory footprint is already 2X)
   - In the merging phase, additional col->row conversion consumes some extra 
memory (2X+)
   
   I can get the high-level idea of the more conservative memory accounting in 
point 3, it is used to also account for merged batches, but I get lost in the 
memory estimation details in the implementation (especially is this 2X 
estimation amplification only for merged batches, or also intermediate `Row`s), 
could you explain with a concrete example how everything is calculated? (for 
example, there is a `ExternalSorter` with 100M memory budget, and it will 
consume 10 batches, each with 20M size, how memory estimation is calculated in 
each step)
   
   > 2. `shrink_to_fit` every sorted batches reduce the memory footprint of 
sorted batches, otherwise sorted string arrays may take 2X the original space 
in the worst case, due to exponential growth of `MutableBuffer` for storing 
variable length binary values. `shrink_to_fit` is a no-op for primitive-type 
columns returned by `take_arrays` since they already have the right capacity, 
and benchmarking showed no significant performance regression for non-primitive 
types such as string arrays, so I think it is a good change. This resolves "the 
first problem" reported by [Memory account not adding up in SortExec 
#10073](https://github.com/apache/datafusion/issues/10073).
   
   I think the buffer resizing mechanism is not doubling each time, the default 
policy will allocate new constant size buffers 
https://docs.rs/arrow-array/54.1.0/src/arrow_array/builder/generic_bytes_view_builder.rs.html#120-122,
 so this change might not help
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to