[QUESTION][C++] Understanding a StructType

2025-02-24 Thread Артем Тарасов
Hello everyone, I have a question regarding the StructType and its corresponding StructArray in Apache Arrow. From the documentation, I understand that StructType is categorized as a NestedType, which is a data type whose full structure depends "on one or more other child types". This implies th

Re: distributed processing of RecordBatchReader.read_next_batch()?

2025-02-24 Thread Weston Pace
It sort of depends what your RecordBatchReader is doing under the hood. If it is NOT giving up the GIL then you should be fine as long as your processing is slower than your reading. However, if read_next_batch does not give up the GIL and that's your bottleneck, then your Ray app isn't going to

Re: distributed processing of RecordBatchReader.read_next_batch()?

2025-02-24 Thread Aldrin
I could be wrong, but fundamentally the best approach is for the reader to be maintained at the "server" ("the distributed database") and each client in the distributed compute environment to send get requests (either DoGet or some RPC/REST call to Next()). If you don't want to duplicate data, th

distributed processing of RecordBatchReader.read_next_batch()?

2025-02-24 Thread chris snow
I have a distributed database that returns query responses with a RecordBatchReader. I'd like to distribute consumption of the query response by iterating the reader across a distributed compute environment (ray.io). I.e. round-robin the calls to read_next_batch over different nodes of the cluste
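
For readers skimming this thread, here is a rough sketch of the round-robin idea from the question, written in plain Python so it runs without pyarrow or Ray. Everything in it is an assumption made for illustration: FakeReader is a hypothetical stand-in for a RecordBatchReader (it only mimics read_next_batch() raising StopIteration when exhausted), the node names are made up, and the batches are plain strings. A single owner pulls from the reader and hands each batch to the next worker in turn; it is not the approach any of the replies settle on.

```python
from itertools import cycle


class FakeReader:
    """Hypothetical stand-in for a RecordBatchReader (assumption, not pyarrow).

    read_next_batch() returns the next batch and raises StopIteration
    once the stream is exhausted.
    """

    def __init__(self, batches):
        self._it = iter(batches)

    def read_next_batch(self):
        return next(self._it)


def round_robin(reader, workers):
    """Read batches on one owner and assign them to workers in turn."""
    assigned = {worker: [] for worker in workers}
    targets = cycle(workers)
    while True:
        try:
            batch = reader.read_next_batch()
        except StopIteration:
            break  # the reader has no more batches
        assigned[next(targets)].append(batch)
    return assigned


# Made-up batches and node names, purely for illustration.
result = round_robin(
    FakeReader(["b0", "b1", "b2", "b3", "b4"]),
    ["node-a", "node-b"],
)
print(result)
```

Note that the reader itself stays in one place here; only the batches move. In a real cluster the append would be replaced by whatever call ships a batch to a remote worker.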