Hey all,

While investigating the in-order behavior of the SourceNode, we made some
interesting observations:

1) The ExecContext must use nullptr for its executor to guarantee
sequential behavior (as discussed previously). We found cases where our
File BatchReader was reading out of order with a multi-threaded ExecContext.
2) Ideally, to bound our memory footprint (via bounded queues), we would
like each of our inputs to belong to a single thread, so that if
something blocks, it does not prevent reading the input needed to unblock
it from another source. We found that using MakeReaderGenerator for our
in-memory table sources (the basis for our file reader source node) allows
us to do that by specifying an executor (separate thread pools) as a
parameter. It also suggests the following:
  2a) Even when initialized with arrow::internal::GetCpuThreadPool(), each
source node seems to be dedicated to its own thread. We are not sure why
this would be the case given the shared nature of the pool, or whether it
is just a coincidence.
  2b) Our initial implementation created a separate thread pool with a
capacity of one thread for each of our sources via MakeEternal, which
exhibits the same behavior as 2a.

As an additional question, we added an assertion to check ordering with
DCHECK_GE. I expected it to raise some sort of fatal error when the
condition was false, but this doesn't seem to happen -- is this expected,
or does the behavior depend on the build type? We are currently on release
settings.

Ivan
