alamb commented on PR #11587: URL: https://github.com/apache/datafusion/pull/11587#issuecomment-2250770960
> The idea is to calculate an ideal_buffer_size, and if the actual buffer size is twice as large, then we do gc. We also use the ideal_buffer_size to set the optimal block_size value, so that we never waste a single byte.

I think this heuristic sounds good.

> Calculating the ideal_buffer_size needs to traverse the views, but it is actually cheap, as the batches are pretty small for low cardinality filters, which is most cases. The worst case is that we need to check 8192 views, which is also not too bad.

I agree.

Any chance you can add tests for this code showing how the heuristics work (perhaps either based on https://github.com/XiangpengHao/datafusion/pull/1 or directly merging it in)?
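For illustration, the quoted heuristic could be sketched roughly as below. This is a minimal standalone sketch, not the code in the PR. It assumes the Arrow view layout, where each view is a 16-byte value whose low 32 bits hold the string length and strings of at most 12 bytes are stored inline, so they need no buffer space. The function names `ideal_buffer_size` and `should_gc` and the strict `>` comparison are assumptions made for the example.

```rust
/// Strings of at most this many bytes are stored inline in the 16-byte view
/// (per the Arrow view layout), so they occupy no space in the data buffers.
const MAX_INLINE_LEN: u32 = 12;

/// The string length lives in the low 32 bits of each view.
fn view_len(view: u128) -> u32 {
    view as u32
}

/// Hypothetical helper: the number of buffer bytes the views actually need,
/// i.e. the summed lengths of all strings that are not inlined.
fn ideal_buffer_size(views: &[u128]) -> usize {
    views
        .iter()
        .map(|v| view_len(*v))
        .filter(|len| *len > MAX_INLINE_LEN)
        .map(|len| len as usize)
        .sum()
}

/// Hypothetical helper: garbage collect when the buffers currently held are
/// more than twice the ideal size (strictness of the bound is an assumption).
fn should_gc(views: &[u128], actual_buffer_size: usize) -> bool {
    actual_buffer_size > 2 * ideal_buffer_size(views)
}
```

In this sketch the value returned by `ideal_buffer_size` would also serve as the block size for the compacted array, so that a single exactly sized buffer is allocated. The traversal is a single pass over the views, which matches the point that even the 8192-view worst case stays cheap.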
