Would this also explain the lack of allocations, reallocations or frees when creating a pipeline with just a source and a sink?
For example, we do not see logs for a regular source node, a table source node, or a streaming file reader (using RecordBatchFileReader and MakeReaderGenerator to generate input for a regular source node).

-----Original Message-----
From: Weston Pace <weston.p...@gmail.com>
Sent: Monday, July 11, 2022 4:37 PM
To: dev@arrow.apache.org
Subject: Re: cpp Memory Pool Clarification

> Is there anything else I'd need to change?

Maybe try something like this:
https://github.com/westonpace/arrow/commit/15ac0d051136c585cda63297e48f17557808d898

> Beyond that, we should also expect to see some allocations from
> TableSourceNode going through the logging memory pool, even if AsOfJoinNode
> was using the default memory pool instead of the Exec Plan's pool, but I am
> not seeing anything come through...

TableSourceNode wouldn't need to allocate since it runs against memory that's
already been allocated. It might split input into smaller batches, but slicing
tables / arrays is a zero-copy operation that does not require allocating new
buffers.

On Mon, Jul 11, 2022 at 12:46 PM Ivan Chau <ivan.c...@twosigma.com> wrote:
>
> Yeah, this behavior is certainly a bit strange then.
>
> The only alteration I am making is changing the way we create the Execution
> Context in the benchmark file.
>
> Something like:
>
> ```
> auto logging_pool = LoggingMemoryPool(default_memory_pool());
> ExecContext ctx(&logging_pool, ...);
> ```
>
> Is there anything else I'd need to change?
>
> Beyond that, we should also expect to see some allocations from
> TableSourceNode going through the logging memory pool, even if AsOfJoinNode
> was using the default memory pool instead of the Exec Plan's pool, but I am
> not seeing anything come through...
>
> -----Original Message-----
> From: Weston Pace <weston.p...@gmail.com>
> Sent: Monday, July 11, 2022 2:47 PM
> To: dev@arrow.apache.org
> Subject: Re: cpp Memory Pool Clarification
>
> Are you changing the default memory pool to a LoggingMemoryPool?
> Where are you doing this? For a benchmark I think you would need to change
> the implementation in the benchmark file itself.
>
> Similarly, is AsofJoinNode using the default memory pool or the memory pool
> of the exec plan? It should be exclusively using the latter, but it's easy
> sometimes to overlook using the default memory pool. It probably won't make
> too much of a difference at the end of the day, as benchmarks normally
> configure an exec plan to use the default memory pool and so the two pools
> would be the same.
>
> > My expectation is that we would see some pretty sizable calls to Allocate
> > when we begin to read files or to create tables, but that is not evident.
>
> Yes, the materialization step of an asof join uses array builders, and those
> will be allocating buffers from a memory pool.
>
> > 1) To my understanding, only large allocations will call Allocate.
> > Are there allocations (for files, table objects), which despite
> > being of large size, do not call Allocate?
>
> No. There is no size limit for the allocator. Instead, when people were
> talking about "large allocations" and "small allocations" in the previous
> thread it was more of a general concept.
>
> For example, if I create an array builder, add some items to it, and then
> create an array, then this will always use a memory pool for the allocation.
> This will be true even if I create an array with a single element in it (in
> which case the allocation is often padded for alignment purposes).
>
> On the other hand, schemas keep their fields in a std::vector which never
> uses the memory pool for allocation. This is true even if I have 10,000
> columns and the vector's memory is actually quite large.
>
> However, in general, arrays tend to be quite large and schemas tend to be
> quite small.
>
> > 2) How can maximum_peak_memory be nonzero if we have not seen any
> > calls to Allocate/Reallocate/Free?
>
> I don't think that is possible.
>
> On Mon, Jul 11, 2022 at 10:44 AM Ivan Chau <ivan.m.c...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I've been doing some testing with LoggingMemoryPool to benchmark our
> > AsOfJoin implementation
> > <https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/exec/asof_join_node.cc>.
> > Our underlying memory pool for the LoggingMemoryPool is the
> > default_memory_pool (this is process-wide).
> >
> > Curiously enough, I don't see any allocations, reallocations, or
> > frees when we run our benchmarking code. I also see that the
> > max_memory property of the memory pool (which is documented as the
> > peak memory allocation) is nonzero (1.2e9 bytes).
> >
> > My expectation is that we would see some pretty sizable calls to
> > Allocate when we begin to read files or to create tables, but that is not
> > evident.
> >
> > 1) To my understanding, only large allocations will call Allocate.
> > Are there allocations (for files, table objects), which despite
> > being of large size, do not call Allocate?
> >
> > 2) How can maximum_peak_memory be nonzero if we have not seen any
> > calls to Allocate/Reallocate/Free?
> >
> > Thank you!
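
To make Weston's point about array builders concrete, here is a rough,
self-contained sketch (not taken from the thread; it only uses the public
arrow::LoggingMemoryPool, arrow::Int64Builder, and arrow::default_memory_pool()
APIs). Routing a builder through a LoggingMemoryPool makes every
Allocate/Reallocate/Free visible, while max_memory() is simply forwarded to the
wrapped pool:

```
#include <iostream>
#include <memory>

#include <arrow/api.h>

arrow::Status RunExample() {
  // Wrap the process-wide default pool. The wrapper must stay alive for as
  // long as anything allocates from it.
  arrow::LoggingMemoryPool logging_pool(arrow::default_memory_pool());

  // Builders always allocate their buffers from the pool they were given,
  // even for a single element (the allocation is padded for alignment), so
  // this produces logged Allocate calls.
  arrow::Int64Builder builder(&logging_pool);
  ARROW_RETURN_NOT_OK(builder.Append(42));
  std::shared_ptr<arrow::Array> array;
  ARROW_RETURN_NOT_OK(builder.Finish(&array));

  // max_memory() is forwarded to the wrapped pool, i.e. the process-wide
  // default pool, not only to allocations made through the wrapper.
  std::cout << "max_memory: " << logging_pool.max_memory() << std::endl;
  return arrow::Status::OK();
}

int main() {
  arrow::Status st = RunExample();
  if (!st.ok()) {
    std::cerr << st.ToString() << std::endl;
    return 1;
  }
  return 0;
}
```

Because max_memory() is delegated to the underlying default pool, which is
process-wide, it may report a nonzero peak even when no calls were logged
through the wrapper itself.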
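Weston's points about zero-copy slicing and schemas can be illustrated with a
similar sketch (again not from the thread; Array::Slice, arrow::field, and
arrow::schema are standard Arrow C++ APIs). Slices share the parent array's
buffers, and a schema keeps its fields in a std::vector, so neither shows up in
a memory-pool log:

```
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include <arrow/api.h>

int main() {
  // Build a source array once; this does allocate from the default pool.
  std::shared_ptr<arrow::Array> values;
  arrow::Int64Builder builder;  // uses arrow::default_memory_pool()
  for (int64_t i = 0; i < 1000; ++i) {
    if (!builder.Append(i).ok()) return 1;
  }
  if (!builder.Finish(&values).ok()) return 1;

  // Zero-copy: the slice shares the parent's buffers, so no pool activity.
  std::shared_ptr<arrow::Array> slice = values->Slice(100, 200);
  std::cout << "slice length: " << slice->length() << std::endl;

  // Schema fields live in a std::vector, not in an Arrow MemoryPool,
  // no matter how many columns there are.
  std::vector<std::shared_ptr<arrow::Field>> fields;
  for (int i = 0; i < 10000; ++i) {
    fields.push_back(arrow::field("f" + std::to_string(i), arrow::int64()));
  }
  auto schema = arrow::schema(std::move(fields));
  std::cout << "num fields: " << schema->num_fields() << std::endl;
  return 0;
}
```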