Hello everyone,
I have a question regarding the StructType and its corresponding
StructArray in Apache Arrow.
From the documentation, I understand that StructType is categorized as a
NestedType, which is a data type whose full structure depends "on one or
more other child types". This implies th
It sort of depends what your RecordBatchReader is doing under the hood. If
it is NOT giving up the GIL then you should be fine as long as your
processing is slower than your reading. However, if read_next_batch does
not give up the GIL and that's your bottleneck, then your Ray app isn't
going to
I could be wrong, but fundamentally the best approach is for the reader to be
maintained at the "server" ("the distributed database") and each client in the
distributed compute environment to send get requests (either DoGet or some
RPC/REST call to Next()).
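
To make the "reader lives at the server, clients pull" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `FakeReader` stands in for a real RecordBatchReader (it only mimics `read_next_batch` raising StopIteration at end of stream, which is how I understand pyarrow's reader to behave), `ReaderServer` and `client` are hypothetical names, and plain threads stand in for Ray workers or Flight DoGet calls.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeReader:
    """Hypothetical stand-in for a RecordBatchReader: yields batches, then stops."""

    def __init__(self, batches):
        self._it = iter(batches)

    def read_next_batch(self):
        # Signal exhaustion with StopIteration, as assumed for the real reader.
        return next(self._it)


class ReaderServer:
    """Owns the single reader; clients ask it for the next batch."""

    def __init__(self, reader):
        self._reader = reader
        self._lock = threading.Lock()

    def next_batch(self):
        # Serialize access so each batch is handed out exactly once.
        with self._lock:
            try:
                return self._reader.read_next_batch()
            except StopIteration:
                return None  # sentinel: stream exhausted


def client(server, process):
    """Pull batches until the server reports exhaustion."""
    results = []
    while (batch := server.next_batch()) is not None:
        results.append(process(batch))
    return results


def run_demo(num_clients=3):
    # Ten small "batches" of two integers each; each client sums its batches.
    server = ReaderServer(FakeReader([[i, i + 1] for i in range(0, 20, 2)]))
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        futures = [pool.submit(client, server, sum) for _ in range(num_clients)]
        return [f.result() for f in futures]
```

Calling `run_demo()` returns one list of results per client. Which client gets which batch is not deterministic, but every batch is consumed exactly once. In a real deployment the `next_batch` call would be the RPC boundary, so the reader itself never has to be serialized or shipped to the workers.
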
If you don't want to duplicate data, th
I have a distributed database that returns query responses with a
RecordBatchReader.
I'd like to distribute consumption of the query response by iterating the
reader across a distributed compute environment (ray.io), i.e. round-robin
the calls to read_next_batch over different nodes of the cluste