Re: Adding cpp memory profiling to Arrow

2022-07-06 Thread Weston Pace
Memory profiling would be very helpful. Thanks for looking into this. A few thoughts:

* Peak allocation is an important number for many users. One major goal for Acero is to get to a point where it can constrain peak allocation to a preconfigured amount for a single query. We are close but not

Re: Adding cpp memory profiling to Arrow

2022-07-06 Thread Rok Mihevc
I'm also working on exposing jemalloc statistics [1] if you'd want to directly access those.

Rok

[1] https://github.com/apache/arrow/pull/13516

On Wed, Jul 6, 2022 at 11:40 PM Rok Mihevc wrote:
> I'm also working on exposing jemalloc statistics if you'd want to directly
> access those.
>
> Rok

Re: Adding cpp memory profiling to Arrow

2022-07-06 Thread Rok Mihevc
I'm also working on exposing jemalloc statistics if you'd want to directly access those.

Rok

On Wed, Jul 6, 2022 at 10:54 PM Ákos Hadnagy wrote:
> Hi all,
>
> As Will pointed it out, there’s an effort to integrate OTel and Acero, and
> recently I did a few experiments to collect “big allocati

RE: Adding cpp memory profiling to Arrow

2022-07-06 Thread Ákos Hadnagy
Hi all, As Will pointed out, there’s an effort to integrate OTel and Acero, and recently I did a few experiments to collect “big allocations” as events in the OTel traces. I haven’t made it into a PR yet, but if you’re interested, I can brush it up a bit and publish it. Once you have the tra

Re: Adding cpp memory profiling to Arrow

2022-07-06 Thread Will Jones
Hi Ivan, Earlier we added some instructions for profiling memory allocations on the memory pools ("big allocations" as described by Sasha above). Docs are here [1]. If you do come up with some other method, it would be great to document it in an adjacent section :) Another suggestion I heard a while

Re: Adding cpp memory profiling to Arrow

2022-07-06 Thread Sasha Krassovsky
Hi Ivan,

Inside of Acero, we can think of allocations as coming in two classes:

- "Big" allocations, which go through `MemoryPool`, using `Buffer`. These are used for representing columns of input data and hash tables.
- "Small" allocations, which are usually small, local STL containers like st
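
To make the "big allocation" idea above concrete, here is a minimal sketch of a pool wrapper that counts live bytes, remembers the peak, and records one event per allocation. It is only an illustration under stated assumptions: `TrackingPool`, `AllocationEvent`, and `DemoPeak` are hypothetical names invented for this example, not Arrow's `MemoryPool` or any Acero or OTel API, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical record of one allocation that went through the pool.
struct AllocationEvent {
  int64_t size;         // bytes requested by this allocation
  int64_t total_after;  // live bytes in the pool right after it
};

// Hypothetical tracking pool (assumption: single-threaded use only).
// This is NOT Arrow's MemoryPool; it only illustrates the bookkeeping.
class TrackingPool {
 public:
  // Allocates `size` bytes and updates the live and peak counters.
  void* Allocate(int64_t size) {
    void* ptr = std::malloc(static_cast<std::size_t>(size));
    if (ptr == nullptr) return nullptr;  // nothing is counted on failure
    bytes_allocated_ += size;
    if (bytes_allocated_ > max_memory_) max_memory_ = bytes_allocated_;
    events_.push_back({size, bytes_allocated_});
    return ptr;
  }

  // Sized free: the caller passes back the size it originally requested.
  void Free(void* ptr, int64_t size) {
    std::free(ptr);
    bytes_allocated_ -= size;
  }

  int64_t bytes_allocated() const { return bytes_allocated_; }
  int64_t max_memory() const { return max_memory_; }
  const std::vector<AllocationEvent>& events() const { return events_; }

 private:
  int64_t bytes_allocated_ = 0;  // bytes currently live
  int64_t max_memory_ = 0;       // highest value bytes_allocated_ reached
  std::vector<AllocationEvent> events_;
};

// Small usage example: returns the peak seen over a few allocations.
int64_t DemoPeak() {
  TrackingPool pool;
  void* a = pool.Allocate(4096);
  void* b = pool.Allocate(64);    // live bytes peak here
  pool.Free(a, 4096);
  void* c = pool.Allocate(2048);  // stays below the earlier peak
  pool.Free(b, 64);
  pool.Free(c, 2048);
  assert(pool.bytes_allocated() == 0);  // everything was returned
  assert(pool.events().size() == 3);    // one event per allocation
  return pool.max_memory();
}
```

Only allocations routed through the pool are visible to this kind of counter; the "small" allocations made by local STL containers would need a different mechanism.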

Adding cpp memory profiling to Arrow

2022-07-06 Thread Ivan Chau
Hi all, My name is Ivan -- some of you may know me from some of my contributions benchmarking node performance on Acero. Thank you for all the help so far! In addition to my runtime benchmarking, I am interested in pursuing some method of memory profiling to further assess our streaming capab