I could be wrong, but fundamentally the best approach is for the reader to be maintained at the "server" (the distributed database) and for each client in the distributed compute environment to send get requests (either DoGet or some RPC/REST call to Next()). If you don't want to duplicate data, the reader should exist in exactly one place. If you want minimal overhead, each client should make one request (or as few requests as possible) for some chunk of data.

How that is implemented will depend on your frameworks (ray.io, and how your database hands results to the reader). My guess is that you would have a main client managed by the ray.io framework, held in some distributed shared memory (or acting as the "server" the reader actually lives at), and each worker would have a client or wrapper that sends requests to that main client. A rough sketch of that pattern follows at the end of this message.

I'm not sure if that's helpful for you, but since it's a framework-dependent question, I'd recommend asking the developers of those frameworks (though maybe they watch this mailing list).

Sent from Proton Mail for iOS

On Mon, Feb 24, 2025 at 14:38, chris snow <chsnow...@gmail.com> wrote:

> I have a distributed database that returns query responses with a
> RecordBatchReader. I'd like to distribute consumption of the query
> response by iterating the reader across a distributed compute
> environment (ray.io), i.e. round-robin the calls to read_next_batch
> over different nodes of the cluster. Is this possible? Ideally with
> pyarrow, but I would be interested in other languages.
>
> Thanks!
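To make the idea above concrete, here is a minimal sketch, assuming Ray and pyarrow are installed. The BatchServer actor, its next_batch method, and the in-memory table standing in for the database query response are all hypothetical names I made up for illustration, not part of any existing API. The point is only that the RecordBatchReader lives inside a single Ray actor and each worker pulls one batch per request:

    import ray
    import pyarrow as pa

    @ray.remote
    class BatchServer:
        """Owns the single RecordBatchReader so no batch is served twice."""

        def __init__(self):
            # Hypothetical stand-in: in practice you would run your
            # database query here, so the reader is created (and stays)
            # inside this one actor process.
            table = pa.table({"x": list(range(1000))})
            self._reader = pa.RecordBatchReader.from_batches(
                table.schema, table.to_batches(max_chunksize=100)
            )

        def next_batch(self):
            # One RecordBatch per call; None signals the stream is done.
            try:
                return self._reader.read_next_batch()
            except StopIteration:
                return None

    @ray.remote
    def consume(server):
        # Each worker pulls the next available chunk until the stream
        # ends, spreading batches across workers without duplicating
        # the reader.
        rows = 0
        while True:
            batch = ray.get(server.next_batch.remote())
            if batch is None:
                return rows
            rows += batch.num_rows

    if __name__ == "__main__":
        ray.init()
        server = BatchServer.remote()
        totals = ray.get([consume.remote(server) for _ in range(4)])
        print(sum(totals))  # 1000 rows total, split across the workers

Because Ray actor methods execute one at a time by default, concurrent workers can't race on read_next_batch, and each batch is handed out exactly once. The same shape works with Arrow Flight in place of the actor: the server keeps the reader and workers issue DoGet calls instead of actor method calls.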