2010YOUY01 commented on issue #14510: URL: https://github.com/apache/datafusion/issues/14510#issuecomment-2702809806
> I am interested in this function.
>
> [Tag memory that is allocated through the buffer manager, and add duckdb_memory() function by Mytherin · Pull Request #10496 · duckdb/duckdb](https://github.com/duckdb/duckdb/pull/10496) `DuckDB` adds memory tags when the buffer manager allocates memory, making it easier to trace memory usage.
>
> Does `DataFusion` also need to implement it in a similar way?

No, they're different. I believe DuckDB's memory pool is a textbook buffer pool, which automatically handles spilling memory and reading it back on behalf of the operators. For DataFusion, `MemoryPool` is perhaps a misleading name: it is actually a "memory tracker". Operators have to use this tracker to estimate whether they have exceeded the memory limit. If they have, they explicitly spill, or fail with a user-friendly error message.

> I have just started to learn `DataFusion`’s memory management and I noticed: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/trait.MemoryPool.html
>
> > Rather than tracking all allocations, DataFusion takes a pragmatic approach: Intermediate memory used as data streams through the system is not accounted (it assumed to be “small”) but the large consumers of memory must register and constrain their use. This design trades off the additional code complexity of memory tracking with limiting resource usage.
>
> Do we need to trace all memory allocations or just focus on the part managed by `MemoryPool`?

As for tracking those small allocations through the internal `MemoryPool`: probably not. However, if we can use a system memory profiler, or internal metrics from a memory allocator (e.g. mimalloc), it would be great to integrate those as well.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
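
To make the "memory tracker" description in the comment above concrete, here is a minimal, dependency-free sketch. It is not DataFusion's actual API: `Tracker`, `ResourcesExhausted`, `BufferingOperator`, and `demo` are hypothetical names invented for illustration, and the "spill" only drops the buffered data where a real operator would write it to disk. It shows only the division of labor: the tracker counts the bytes that operators report, and the operator itself decides to spill or to fail.

```rust
use std::fmt;

/// Hypothetical tracker: it only counts bytes that operators report.
/// It never allocates, evicts, or spills anything itself.
pub struct Tracker {
    limit: usize,
    used: usize,
}

/// Hypothetical error returned when a reservation does not fit.
#[derive(Debug, PartialEq)]
pub struct ResourcesExhausted {
    pub requested: usize,
    pub available: usize,
}

impl fmt::Display for ResourcesExhausted {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "cannot reserve {} bytes: only {} bytes left under the memory limit",
            self.requested, self.available
        )
    }
}

impl Tracker {
    pub fn new(limit: usize) -> Self {
        Tracker { limit, used: 0 }
    }

    /// Record `bytes` more usage, or refuse if that would exceed the limit.
    pub fn try_grow(&mut self, bytes: usize) -> Result<(), ResourcesExhausted> {
        let available = self.limit - self.used;
        if bytes > available {
            return Err(ResourcesExhausted { requested: bytes, available });
        }
        self.used += bytes;
        Ok(())
    }

    /// Record that `bytes` have been released by an operator.
    pub fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }

    pub fn used(&self) -> usize {
        self.used
    }
}

/// Toy operator that buffers batches and spills when the tracker refuses.
pub struct BufferingOperator {
    buffered: Vec<Vec<u8>>,
    reserved: usize,
    pub spill_count: usize,
}

impl BufferingOperator {
    pub fn new() -> Self {
        BufferingOperator { buffered: Vec::new(), reserved: 0, spill_count: 0 }
    }

    /// Buffer one batch. On refusal, spill explicitly and retry once;
    /// if the batch still does not fit, surface the error to the caller.
    pub fn push(&mut self, tracker: &mut Tracker, batch: Vec<u8>) -> Result<(), ResourcesExhausted> {
        let size = batch.len();
        if tracker.try_grow(size).is_err() {
            self.spill(tracker);
            tracker.try_grow(size)?;
        }
        self.reserved += size;
        self.buffered.push(batch);
        Ok(())
    }

    /// Stand-in for spilling: a real operator would write to disk here.
    fn spill(&mut self, tracker: &mut Tracker) {
        self.buffered.clear();
        tracker.shrink(self.reserved);
        self.reserved = 0;
        self.spill_count += 1;
    }
}

/// Push five 40-byte batches under a 100-byte limit, then one oversized batch.
/// Returns (spill count, bytes still tracked, whether the oversized push failed).
pub fn demo() -> (usize, usize, bool) {
    let mut tracker = Tracker::new(100);
    let mut op = BufferingOperator::new();
    for _ in 0..5 {
        op.push(&mut tracker, vec![0u8; 40]).unwrap();
    }
    let too_big = op.push(&mut tracker, vec![0u8; 200]).is_err();
    (op.spill_count, tracker.used(), too_big)
}
```

A buffer-pool design would instead hide the spill inside the pool; in this sketch the pool never touches operator data, which is the contrast the comment draws.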
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org