Hi Li,

It depends on how you expect everything to fit together, which in turn depends on what the application is. For instance, you could have the application code do everything up through DoGet to get a reader, then create a SourceNode from that reader and continue from there.

Otherwise, I would think the way to go would be to create a node directly from a FlightDescriptor (which would contain the URL/parameters in your example). I think that would fit into Arrow Dataset, under ARROW-10524 [1]: I'd equate GetFlightInfo to dataset discovery, and each FlightEndpoint in the FlightInfo to a Fragment. As a bonus, there's already good integration between Dataset and Acero, so this should naturally do things like read the FlightEndpoints in parallel, with readahead, and so on.

That means you'd start with the FlightDescriptor and create a Dataset from it, which calls GetFlightInfo under the hood. (There's a minor catch here: this assumes the service that returns the FlightInfo can embed an accurate schema into it. If that's not true, there'll have to be some finagling with various ways of getting the actual schema, depending on what exactly your service supports.) Once you have a Dataset, you can create an ExecPlan and proceed like normal.

Of course, if you then want to get things into Python, R, Substrait, etc., that requires some more work, especially for Substrait, where I'm not sure how best to encode a custom source like that.

[1]: https://issues.apache.org/jira/browse/ARROW-10524

-David

On Wed, Aug 31, 2022, at 17:09, Li Jin wrote:
> Hello!
>
> I have recently started to look into integrating Flight RPC with Acero
> source/sink node.
>
> In Flight, the life cycle of a "read" request looks sth like:
>
> - User specifies a URL (e.g. my_storage://my_path) and parameter (e.g.,
> begin = "20220101", end = "20220201")
> - Client issue GetFlightInfo and get FlightInfo from server
> - Client issue DoGet with the FlightInfo and get a stream reader
> - Client calls Next until stream is exhausted
>
> My question is, how does the above life cycle fit in an Acero node? In
> other words, what are the proper places in Acero node lifecycle to issue
> the corresponding flight RPC?
>
> Appreciate any thoughts,
> Li