[C++] MakeReaderGenerator Behavior using GetCPUThreadPool

2022-07-25 Thread Ivan Chau
Hey all, While investigating the in-order behavior of the SourceNode, we found some interesting observations: 1) The ExecContext should use nullptr for its executor to guarantee any sequential behavior (as discussed previously). We found cases where our File BatchReader was reading out of order w

RE: [C++] ResumeProducing Future Causing Blocking

2022-07-22 Thread Ivan Chau
iew?usp=sharing Proposed Serial: https://drive.google.com/file/d/1JpQiIVaGAL9mrderkid5uf888zGx21fO/view?usp=sharing On Fri, Jul 22, 2022 at 12:32 PM Ivan Chau wrote: > > Hi Weston, > > Not sure if the diagrams came through here -- is there some other place I > need to view them? >

RE: [C++] ResumeProducing Future Causing Blocking

2022-07-22 Thread Ivan Chau
ps://github.com/apache/arrow/pull/12468 I will try and make an example diagram for threaded execution (executor != nullptr) tomorrow and also make some diagrams on how sequencing might be tackled. [1] https://github.com/apache/arrow/pull/12468 [2] https://issues.apache.org/jira/browse/ARROW-16072 [

RE: [C++] ResumeProducing Future Causing Blocking

2022-07-21 Thread Ivan Chau
duleTask, etc.), but I believe this waits for the task to complete, so it causes blocking in the processing. Do you have any suggestions for a temporary workaround? Ivan -Original Message- From: Ivan Chau Sent: Thursday, July 21, 2022 9:28 AM To: dev@arrow.apache.org Subject: RE: [

RE: [C++] ResumeProducing Future Causing Blocking

2022-07-21 Thread Ivan Chau
Producing/ResumeProducing > unfortunately. It is currently not tested anywhere as far as I can tell and > ignored by a lot of nodes (such as HashJoinNode). Michal and I have some work > in progress involving a new scheduler with first-class support for back > pressure. > > Sasha > > > On Jul 20, 2022, at 1:49 PM, Ivan Chau wrote: > > > > backpressure_future_ >

[C++] ResumeProducing Future Causing Blocking

2022-07-20 Thread Ivan Chau
Hi all, I am currently working on writing a manual back-pressure mechanism for AsOfJoin. We are trying a simple version where we maintain a buffer of batches from one of the input sources (our left table source). We want to pause production when we reach a certain number of batches, and resume pro
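The buffering idea described in the snippet above can be sketched in isolation. This is only an illustrative, single-threaded sketch and not Arrow or Acero code: the class name `BackpressureBuffer`, the two thresholds, and the `pause`/`resume` callbacks are all hypothetical, and the resume threshold in particular is an assumption, since the snippet is cut off before it says when production should resume.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical helper (not part of Arrow): a bounded buffer that asks its
// producer to pause once `pause_at` items are queued and to resume once the
// queue has drained to `resume_at` items or fewer. Not thread-safe.
template <typename Batch>
class BackpressureBuffer {
 public:
  BackpressureBuffer(std::size_t pause_at, std::size_t resume_at,
                     std::function<void()> pause, std::function<void()> resume)
      : pause_at_(pause_at),
        resume_at_(resume_at),
        pause_(std::move(pause)),
        resume_(std::move(resume)) {
    // The resume threshold must sit below the pause threshold.
    assert(resume_at_ < pause_at_);
  }

  // Queue a batch; fire the pause callback once when the limit is reached.
  void Push(Batch batch) {
    queue_.push_back(std::move(batch));
    if (!paused_ && queue_.size() >= pause_at_) {
      paused_ = true;
      pause_();
    }
  }

  // Take the oldest batch; fire the resume callback once the queue has
  // drained far enough. Returns false if the queue is empty.
  bool Pop(Batch* out) {
    if (queue_.empty()) return false;
    *out = std::move(queue_.front());
    queue_.pop_front();
    if (paused_ && queue_.size() <= resume_at_) {
      paused_ = false;
      resume_();
    }
    return true;
  }

  bool paused() const { return paused_; }
  std::size_t size() const { return queue_.size(); }

 private:
  std::size_t pause_at_;
  std::size_t resume_at_;
  std::function<void()> pause_;
  std::function<void()> resume_;
  std::deque<Batch> queue_;
  bool paused_ = false;
};
```

Keeping the two thresholds apart avoids toggling the producer on every push and pop. In a real plan the callbacks would have to forward to whatever pause and resume hooks the source node exposes, and access to the queue would need synchronization if batches arrive on multiple threads.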

ExecutionContext, batch ordering clarification

2022-07-19 Thread Ivan Chau
Hi all, I am doing some investigations of the AsOfJoinNode, and consequently have come across some strange behavior when experimenting with the ExecutionContext and in-memory / file streaming source nodes. Our AsOfJoin algorithm requires that the input be in chronological order with respect to on

cpp: Debugging 'plan destruction before finishing'

2022-07-14 Thread Ivan Chau
Hi all, I've been encountering a "plan destruction before finishing" output occurring with the AsOfJoin node, particularly when joining large tables. My execution context is configured with the default memory pool and a nullptr for the executor. I am calling StartAndCollect

RE: cpp Memory Pool Clarification

2022-07-12 Thread Ivan Chau
n a std::vector which never > uses the memory pool for allocation. This is true even if I have 10,000 > columns and the vector's memory is actually quite large. > > However, in general, arrays tend to be quite large and schemas tend to be > quite small. > > > 2) How

RE: cpp Memory Pool Clarification

2022-07-11 Thread Ivan Chau
t seen any > calls to Allocate/Reallocate/Free? I don't think that is possible. On Mon, Jul 11, 2022 at 10:44 AM Ivan Chau wrote: > > Hi all, > > I've been doing some testing with LoggingMemoryPool to benchmark our > AsOfJoin implementation > <https://github.com

cpp Memory Pool Clarification

2022-07-11 Thread Ivan Chau
Hi all, I've been doing some testing with LoggingMemoryPool to benchmark our AsOfJoin implementation. Our underlying memory pool for the LoggingMemoryPool is the default_memory_pool (this is process-wide).

Adding cpp memory profiling to Arrow

2022-07-06 Thread Ivan Chau
Hi all, My name is Ivan -- some of you may know me from some of my contributions benchmarking node performances on Acero. Thank you for all the help so far! In addition to my runtime benchmarking, I am interested in pursuing some method of memory profiling to further assess our streaming capab