Re: [PR] GC `StringViewArray` in `CoalesceBatchesStream` [datafusion]

via GitHub Thu, 25 Jul 2024 08:57:27 -0700


alamb commented on PR #11587:
URL: https://github.com/apache/datafusion/pull/11587#issuecomment-2250770960


   > The idea is to calculate an ideal_buffer_size, and if the actual buffer 
size is twice as large, then we do gc.
   We also use the ideal_buffer_size to set the optimal block_size value, so that 
we never waste a single byte.
   
   I think this heuristic sounds good
   
   > Calculating the ideal_buffer_size requires traversing the views, but this is 
actually cheap because the batches are pretty small for low cardinality filters, 
which is the case most of the time. The worst case is that we need to check 8192 
views, which is also not too bad.
   
   I agree
   
   
   Any chance you can add tests for this code showing how the heuristics work 
(perhaps either based on https://github.com/XiangpengHao/datafusion/pull/1 or 
directly merging it in)?
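
For reference, the heuristic quoted above could be sketched roughly as follows. This is only an illustration, not the code in the PR: the names `ideal_buffer_size`, `should_gc`, and `MAX_INLINE_LEN` are hypothetical, the views are modeled as a plain slice of string lengths instead of a real `StringViewArray`, and the strict "greater than 2x" comparison is an assumption about how "twice as large" is meant.

```rust
/// In Arrow's view layout, strings of at most 12 bytes are stored inline in
/// the 16-byte view and take no space in the data buffers.
const MAX_INLINE_LEN: usize = 12;

/// Hypothetical helper: the number of buffer bytes needed if the buffers held
/// only the strings still referenced by the views. One pass over the views,
/// so at most 8192 lengths for a full batch.
fn ideal_buffer_size(view_lens: &[usize]) -> usize {
    view_lens
        .iter()
        .filter(|&&len| len > MAX_INLINE_LEN) // inlined strings need no buffer space
        .sum()
}

/// Hypothetical helper: gc only when the buffers are more than twice the
/// ideal size (assumed reading of the threshold described in the comment).
fn should_gc(view_lens: &[usize], actual_buffer_size: usize) -> bool {
    actual_buffer_size > 2 * ideal_buffer_size(view_lens)
}
```

A test along these lines would presumably feed in a batch whose views reference only a small part of a large buffer and check that gc triggers, plus a tightly packed batch where it does not.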


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


