My initial inclination is towards #3 but I'd be curious what others think.
In the case of #3, I wonder if it makes sense to then pull the Schema off
the GetFlightInfo response...

On Fri, Jun 28, 2019 at 10:57 AM Ryan Murray <rym...@dremio.com> wrote:

> Hi All,
>
> I have been working on building an Arrow Flight source for Spark. The goal
> here is for Spark to be able to use a group of Arrow Flight endpoints to
> pull a dataset into Spark in parallel.
>
> I am unsure of the best model for the Spark <-> Flight conversation and
> wanted to get your opinion on the best way to go.
>
> I am breaking up the query from Spark to Flight into 3 parts:
> 1) Get the schema using GetFlightInfo. This is needed to do further lazy
> operations in Spark.
> 2) Get the endpoints by calling GetFlightInfo a 2nd time with a different
> argument. This returns the list of endpoints on the parallel Flight server.
> The endpoints are not available until the data is ready to be fetched, so
> this call happens after the schema lookup but must finish before DoGet is
> called.
> 3) Call DoGet on every endpoint from step 2.
>
> I think I have to do each step. However, I don't like having to call
> GetFlightInfo twice; it doesn't seem very elegant. I see a few options:
> 1) Live with calling GetFlightInfo twice, with a custom bytes cmd to
> differentiate the purpose of each call.
> 2) Add an argument to GetFlightInfo to tell it that it is being called only
> for the schema.
> 3) Add another RPC endpoint, i.e. GetSchema(FlightDescriptor), to return
> just the Schema in question.
> 4) Use DoAction and wrap the expected FlightInfo in a Result.
>
> I am aware that option 4 is probably the least disruptive, but I'm also not
> a fan as (to me) it implies performing an action on the server side. Options
> 2 and 3 are larger changes and I am reluctant to make them unless there is a
> consensus here. None of these are great options, and I am wondering what
> everyone thinks the best approach might be, particularly as I think this is
> likely to come up in more applications than just Spark.
>
> Best,
> Ryan
>
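
For illustration, the difference between option 1 and option 3 in the quoted
message can be sketched with a toy in-memory model. This is only a sketch
under stated assumptions: ToyFlightServer, its methods, the "schema:" and
"plan:" command prefixes, and the two read_with_* helpers are hypothetical
names invented here, not the Arrow Flight API, and the schema is reduced to a
tuple of column names. The endpoints are also read sequentially, whereas
Spark would fetch them in parallel.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FlightInfo:
    schema: Tuple[str, ...]     # column names stand in for an Arrow schema
    endpoints: Tuple[str, ...]  # endpoint locations; empty if not requested


class ToyFlightServer:
    """Hypothetical stand-in for a Flight service (not the real API)."""

    def __init__(self, schema: List[str], endpoints: List[str]) -> None:
        self._schema = tuple(schema)
        self._endpoints = tuple(endpoints)
        self.calls: List[str] = []  # RPC names in call order

    def get_flight_info(self, cmd: bytes) -> FlightInfo:
        # Option 1: a custom command prefix selects the purpose of the call.
        self.calls.append("GetFlightInfo")
        if cmd.startswith(b"schema:"):
            return FlightInfo(self._schema, ())
        return FlightInfo(self._schema, self._endpoints)

    def get_schema(self, cmd: bytes) -> Tuple[str, ...]:
        # Option 3: a dedicated RPC that returns only the schema.
        self.calls.append("GetSchema")
        return self._schema

    def do_get(self, endpoint: str) -> List[Dict[str, str]]:
        # Return one fake row per endpoint so the result is easy to inspect.
        self.calls.append("DoGet")
        return [{name: f"{endpoint}/{name}" for name in self._schema}]


def read_with_option_1(server: ToyFlightServer, query: bytes):
    # Two GetFlightInfo calls, told apart only by the command bytes.
    schema = server.get_flight_info(b"schema:" + query).schema
    info = server.get_flight_info(b"plan:" + query)
    rows = [row for ep in info.endpoints for row in server.do_get(ep)]
    return schema, rows


def read_with_option_3(server: ToyFlightServer, query: bytes):
    # Schema comes from its own RPC; GetFlightInfo is called once.
    schema = server.get_schema(query)
    info = server.get_flight_info(query)
    rows = [row for ep in info.endpoints for row in server.do_get(ep)]
    return schema, rows
```

Both helpers return the same schema and rows. The difference is that with
option 3 the purpose of each call is visible in the RPC name, so the client
and server do not need to agree on a private encoding of the command bytes.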
