2010YOUY01 commented on issue #14510:
URL: https://github.com/apache/datafusion/issues/14510#issuecomment-2702809806

   > I am interested in this function.
   > 
   > [Tag memory that is allocated through the buffer manager, and add 
duckdb_memory() function by Mytherin · Pull Request #10496 · 
duckdb/duckdb](https://github.com/duckdb/duckdb/pull/10496) `DuckDB` adds 
memory tags when the buffer manager allocates memory, making it easier to trace 
memory usage.
   > 
   > Does `DataFusion` also need to implement it in a similar way?
   
   No, they're different. I believe DuckDB's memory pool is a textbook buffer 
pool: it spills memory and reads it back automatically on behalf of the 
operators.
   In DataFusion, `MemoryPool` is perhaps a misleading name. It is really a 
'memory tracker': operators use it to estimate whether they have exceeded the 
memory limit. If they have, they must either spill explicitly or fail with a 
user-friendly error message.
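
   To make the tracker-versus-buffer-pool distinction concrete, here is a 
minimal, std-only sketch. The names `Tracker` and `BufferingOperator` are 
hypothetical and are not DataFusion's API; the point is only that the tracker 
counts bytes, while the operator itself decides to spill or to return an error.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical tracker: it only counts bytes. It never allocates,
/// spills, or reads anything back.
pub struct Tracker {
    limit: usize,
    used: AtomicUsize,
}

impl Tracker {
    pub fn new(limit: usize) -> Arc<Self> {
        Arc::new(Self { limit, used: AtomicUsize::new(0) })
    }

    /// Record `bytes` more usage, or refuse if that would exceed the limit.
    pub fn try_grow(&self, bytes: usize) -> Result<(), String> {
        let mut cur = self.used.load(Ordering::Relaxed);
        loop {
            let new = cur
                .checked_add(bytes)
                .filter(|n| *n <= self.limit)
                .ok_or_else(|| {
                    format!(
                        "cannot track {} more bytes: {} of {} bytes already tracked",
                        bytes, cur, self.limit
                    )
                })?;
            match self.used.compare_exchange_weak(
                cur, new, Ordering::Relaxed, Ordering::Relaxed,
            ) {
                Ok(_) => return Ok(()),
                Err(actual) => cur = actual, // another thread raced us; retry
            }
        }
    }

    /// Give back previously tracked bytes.
    pub fn shrink(&self, bytes: usize) {
        self.used.fetch_sub(bytes, Ordering::Relaxed);
    }

    pub fn used(&self) -> usize {
        self.used.load(Ordering::Relaxed)
    }
}

/// Hypothetical operator that buffers batches in memory.
pub struct BufferingOperator {
    tracker: Arc<Tracker>,
    buffered: Vec<Vec<u8>>,
    reserved: usize,
    pub spills: usize,
}

impl BufferingOperator {
    pub fn new(tracker: Arc<Tracker>) -> Self {
        Self { tracker, buffered: Vec::new(), reserved: 0, spills: 0 }
    }

    pub fn push(&mut self, batch: Vec<u8>) -> Result<(), String> {
        let size = batch.len();
        if self.tracker.try_grow(size).is_err() {
            // The operator, not the tracker, chooses to spill.
            self.spill();
            // Retry once. If a single batch still does not fit,
            // surface the error to the caller.
            self.tracker.try_grow(size)?;
        }
        self.reserved += size;
        self.buffered.push(batch);
        Ok(())
    }

    fn spill(&mut self) {
        // A real operator would write `buffered` to disk here.
        self.buffered.clear();
        self.tracker.shrink(self.reserved);
        self.reserved = 0;
        self.spills += 1;
    }

    pub fn buffered_batches(&self) -> usize {
        self.buffered.len()
    }
}
```

   With a 100-byte limit, pushing two 60-byte batches makes the second push 
spill once and then succeed, while a 200-byte batch fails even after spilling. 
A buffer pool in the DuckDB style would instead hide that decision from the 
operator.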
   
   > I have just started to learn `DataFusion`’s memory management and I 
noticed: 
https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/trait.MemoryPool.html
   > 
   > > Rather than tracking all allocations, DataFusion takes a pragmatic 
approach: Intermediate memory used as data streams through the system is not 
accounted (it assumed to be “small”) but the large consumers of memory must 
register and constrain their use. This design trades off the additional code 
complexity of memory tracking with limiting resource usage.
   > 
   > Do we need to trace all memory allocations or just focus on the part 
managed by `MemoryPool`?
   
   As for tracking those small allocations through the internal `MemoryPool`, 
probably not. However, if we can use a system memory profiler, or internal 
metrics from a memory allocator (e.g. mimalloc), it would be great to integrate 
those as well.
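
   As a rough illustration of allocator-level metrics, the sketch below wraps 
Rust's `System` allocator in a counting `GlobalAlloc`. It is a stand-in under 
stated assumptions: `CountingAlloc` and the two helper functions are 
hypothetical names, and this is neither mimalloc's statistics interface nor 
anything DataFusion ships today.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper that counts every allocation made by the process.
pub struct CountingAlloc;

// Monotonic total of bytes ever allocated.
static TOTAL_ALLOCATED: AtomicUsize = AtomicUsize::new(0);
// Bytes currently allocated and not yet freed.
static LIVE: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Delegate the actual allocation to the system allocator.
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            TOTAL_ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
            LIVE.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

// Install the counting allocator for the whole program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Total bytes allocated since process start (never decreases).
pub fn total_allocated_bytes() -> usize {
    TOTAL_ALLOCATED.load(Ordering::Relaxed)
}

/// Bytes currently live according to the counters.
pub fn live_bytes() -> usize {
    LIVE.load(Ordering::Relaxed)
}
```

   Unlike the `MemoryPool` accounting, counters like these see every 
allocation, including the "small" intermediate ones, but they cannot attribute 
bytes to a particular operator without extra tagging.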


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


