ctsk commented on PR #16463:
URL: https://github.com/apache/datafusion/pull/16463#issuecomment-2994386331

   @Dandandan I believe that heuristic does not make sense in this 
context. The gc is introduced here mainly to reduce the number of data 
buffers (the length of the data buffer vector) of StringView/ByteView 
arrays, not to save memory.
   
   Sadly, the condition wouldn't even trigger consistently if I used the same 
threshold, because the batches come from a CoalesceBatchesExec, which has 
already applied the same logic (before concatenation, but the ratio of data 
buffer size to referenced size would remain the same).
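To illustrate why a ratio-based check would not fire after concatenation, here is a minimal sketch using a hypothetical, std-only model of a view array (data buffer byte lengths plus `(buffer_index, len)` views). `ViewArrayModel` and `concat_without_gc` are illustrative names, not the arrow-rs API. Concatenating without gc appends the buffers, so the buffer count adds up, while the ratio of the result is a weighted average of the per-batch ratios and therefore cannot exceed the largest of them. If every batch already passed a ratio check upstream, the merged array passes it too.

```rust
/// Hypothetical, simplified model of a StringView/BinaryView array.
/// This is NOT the arrow-rs `GenericByteViewArray` API.
#[derive(Clone, Debug)]
struct ViewArrayModel {
    /// Length in bytes of each data buffer.
    buffer_lens: Vec<usize>,
    /// (buffer_index, len) for each non-inlined view.
    views: Vec<(usize, usize)>,
}

impl ViewArrayModel {
    fn total_buffer_bytes(&self) -> usize {
        self.buffer_lens.iter().sum()
    }

    fn referenced_bytes(&self) -> usize {
        self.views.iter().map(|&(_, len)| len).sum()
    }

    fn num_buffers(&self) -> usize {
        self.buffer_lens.len()
    }

    /// Allocated data buffer bytes divided by bytes referenced by views
    /// (infinite if nothing is referenced).
    fn waste_ratio(&self) -> f64 {
        self.total_buffer_bytes() as f64 / self.referenced_bytes() as f64
    }
}

/// Concatenate without gc: buffers are appended as-is and each view's
/// buffer index is shifted by the number of buffers already collected.
fn concat_without_gc(arrays: &[ViewArrayModel]) -> ViewArrayModel {
    let mut out = ViewArrayModel { buffer_lens: Vec::new(), views: Vec::new() };
    for a in arrays {
        let offset = out.buffer_lens.len();
        out.buffer_lens.extend_from_slice(&a.buffer_lens);
        out.views.extend(a.views.iter().map(|&(idx, len)| (idx + offset, len)));
    }
    out
}

fn main() {
    // Hypothetical batch: one 1000-byte buffer, ten 50-byte views (ratio 2.0).
    let batch = ViewArrayModel { buffer_lens: vec![1000], views: vec![(0, 50); 10] };
    let merged = concat_without_gc(&vec![batch.clone(); 8]);

    println!("per batch: ratio = {:.1}, buffers = {}", batch.waste_ratio(), batch.num_buffers());
    println!("merged:    ratio = {:.1}, buffers = {}", merged.waste_ratio(), merged.num_buffers());
}
```

The ratio is unchanged while the buffer count grows eightfold, which is the quantity the gc here is meant to bound.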
   
   A static threshold on the number of data buffers could make sense, but 
it seems fiddly to me. I've outlined in the associated issue why I prefer 
fixing this in arrow-rs.
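For comparison, a static buffer-count threshold could look roughly like the sketch below. `MAX_DATA_BUFFERS` and `should_gc_by_buffer_count` are hypothetical names, and the cap of 16 is an arbitrary placeholder; choosing that value (and deciding how it interacts with batch size and upstream coalescing) is the fiddly part.

```rust
/// Hypothetical static cap on the number of data buffers in a view array.
/// The value is an arbitrary placeholder, not a recommended setting.
const MAX_DATA_BUFFERS: usize = 16;

/// Returns true if an array with `num_buffers` data buffers should be
/// compacted under a fixed buffer-count threshold.
fn should_gc_by_buffer_count(num_buffers: usize, max_buffers: usize) -> bool {
    num_buffers > max_buffers
}

fn main() {
    // Concatenating `n` upstream batches that each carry `buffers_per_batch`
    // data buffers (without gc) yields n * buffers_per_batch buffers.
    let buffers_per_batch = 2;
    for n in [1usize, 4, 8, 9, 32] {
        let total = n * buffers_per_batch;
        println!(
            "{n:>2} batches -> {total:>2} buffers, gc: {}",
            should_gc_by_buffer_count(total, MAX_DATA_BUFFERS)
        );
    }
}
```

Whether the check fires depends entirely on how many batches happen to be concatenated relative to the chosen constant.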


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

