hi Vinay -- it would be ideal to discuss here on the mailing list if possible for the benefit of others reading, particularly if it impacts Flight development

- Wes

On Fri, Oct 25, 2019 at 8:07 AM Vinay Kesarwani <vnkesarw...@gmail.com> wrote:
>
> Hi Ryan,
>
> Thanks for your quick response.
>
> I am aligned with your references and would like to discuss further to take
> it forward.
>
> Thanks,
> Vinay
>
> On Fri, Oct 18, 2019 at 11:51 PM Ryan Murray <rym...@dremio.com> wrote:
>
> > Hey Vinay,
> >
> > This Spark source might be of interest [1]. We had discussed the
> > possibility of it being moved into Arrow proper as a contrib module when
> > more stable.
> >
> > This is doing something similar to what you are suggesting: talking to a
> > cluster of Flight servers from Spark. This deals more with the client side
> > and less with the server side however. It talks to a single Flight
> > 'coordinator' and uses getSchema/getFlightInfo to tell the coordinator it
> > wants a particular dataset. The coordinator then gives a list of flight
> > tickets with portions of the required datasets. A client can a) ask for the
> > entire dataset from the coordinator b) iterate serially through the tickets
> > and assemble the whole dataset on the client side or (in the case of the
> > Spark connector) fetch tickets in parallel.
> >
> > I think the server side as you described above doesn't yet exist in a
> > standalone form although the spark connector was developed in conjunction
> > with [2] as the server. This is however highly dependent on the
> > implementation details of the Dremio engine as it is taking care of the
> > coordination between the flight workers. The idea is identical to yours
> > however: a coordinator engine, a distributed store for engine meta, worker
> > engines which create/serve the Arrow buffers.
> >
> > Would be happy to discuss further if you are interested in working on this
> > stuff!
> >
> > Best,
> > Ryan
> >
> > [1] https://github.com/rymurr/flight-spark-source
> > [2] https://github.com/dremio-hub/dremio-flight-connector
> >
> > On Fri, Oct 18, 2019 at 3:05 PM Vinay Kesarwani <vnkesarw...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I am trying to establish the following architecture.
> > >
> > > My approach for Flight horizontal scaling is to launch:
> > > 1-An Apache Flight server on each node
> > > 2-One node declared as coordinator
> > > 3-Publish coordinator info to a shared service [zookeeper]
> > > 4-Launch worker node --> get coordinator node info from [zookeeper]
> > > 5-Worker publishes its info to [zookeeper] to be consumed by others
> > >
> > > Client connects to coordinator:
> > > 1-Calls getFlightInfo(desc)
> > > 2-Here the coordinator node overrides getFlightInfo()
> > > 3-The getFlightInfo() method internally gets worker info from zookeeper
> > > based on the descriptor
> > > 4-Client consumes data from each endpoint iteratively OR in
> > > parallel [not sure how]
> > > -->getData()
> > >
> > > PutData()
> > > 5-Client calls putdata() to put data on different nodes in a flight stream
> > > 6-Iterate through the endpoints and match the worker node IP
> > > 7-If the worker IP matches the endpoint, the worker puts data in that
> > > node's flight server.
> > > 8-On putting any new or updated stream, the worker node info is updated in
> > > zookeeper
> > > 9-In case the worker IP doesn't match the endpoint, we need to put data on
> > > any other worker node and publish the info in zookeeper.
> > >
> > > [in future distributed-client and distributed end point] example: spark
> > > workers to Apache arrow flight cluster
> > >
> > > [image: image]
> > > <https://user-images.githubusercontent.com/6141965/67092386-b0012c00-f1cc-11e9-9ce2-d657001a85f7.png>
> > >
> > > Just wanted to discuss if any PR is in progress for horizontal scaling in
> > > Arrow flight, or any design doc is under discussion.
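For readers following the thread, the coordinator/worker flow discussed above (the coordinator answers getFlightInfo with a list of endpoints, the client fetches them in parallel, and a put publishes a new endpoint) can be sketched roughly as below. This is only an illustrative, in-memory simulation in plain Python and does not use the Arrow Flight API. The names Endpoint, REGISTRY, WORKER_DATA, get_flight_info, do_get, do_put and fetch_parallel, the dataset name "sales" and the worker addresses are all assumptions made up for the example; the dict REGISTRY merely stands in for the shared service (ZooKeeper in the proposal).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical stand-in for a Flight endpoint: a ticket plus a worker location."""
    ticket: str
    location: str


# Stand-in for the shared service (ZooKeeper in the thread): dataset -> endpoints.
REGISTRY = {
    "sales": [
        Endpoint("sales/part-0", "worker-a:47470"),
        Endpoint("sales/part-1", "worker-b:47470"),
        Endpoint("sales/part-2", "worker-c:47470"),
    ],
}

# Stand-in for the data each worker would serve for a given ticket.
WORKER_DATA = {
    "sales/part-0": [1, 2, 3],
    "sales/part-1": [4, 5],
    "sales/part-2": [6],
}


def get_flight_info(descriptor):
    """Coordinator side: look up which endpoints hold parts of the dataset."""
    return list(REGISTRY[descriptor])


def do_get(endpoint):
    """Worker side: return the rows behind one ticket. A real client would
    open a connection to endpoint.location here instead of reading a dict."""
    return WORKER_DATA[endpoint.ticket]


def do_put(descriptor, ticket, rows, location):
    """Worker side of a put: store the rows, then publish the new endpoint
    so that later get_flight_info calls can see it."""
    WORKER_DATA[ticket] = list(rows)
    REGISTRY.setdefault(descriptor, []).append(Endpoint(ticket, location))


def fetch_parallel(descriptor, max_workers=4):
    """Client side: fetch every endpoint concurrently, keeping endpoint order."""
    endpoints = get_flight_info(descriptor)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(do_get, endpoints))  # map preserves input order
    return [row for part in parts for row in part]


rows = fetch_parallel("sales")
```

A thread pool is used here only because it is the simplest way to show the "fetch tickets in parallel" option; the serial option from the thread is the same loop without the pool. In a real deployment the registry lookup and the per-ticket fetch would be network calls, and failure handling for unreachable workers would be needed.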