Hi Vinay -- it would be ideal to discuss this here on the mailing list if
possible, for the benefit of others reading, particularly if it impacts
Flight development.

- Wes

On Fri, Oct 25, 2019 at 8:07 AM Vinay Kesarwani <vnkesarw...@gmail.com> wrote:
>
> Hi Ryan,
>
> Thanks for your quick response.
>
> I am aligned with your references and would like to discuss further to take
> it forward.
>
> Thanks,
> Vinay
>
> On Fri, Oct 18, 2019 at 11:51 PM Ryan Murray <rym...@dremio.com> wrote:
>
> > Hey Vinay,
> >
> > This Spark source might be of interest [1]. We had discussed the
> > possibility of it being moved into Arrow proper as a contrib module when
> > more stable.
> >
> > This is doing something similar to what you are suggesting: talking to a
> > cluster of Flight servers from Spark. However, it deals more with the
> > client side and less with the server side. It talks to a single Flight
> > 'coordinator' and uses getSchema/getFlightInfo to tell the coordinator
> > which dataset it wants. The coordinator then returns a list of Flight
> > tickets, each covering a portion of the requested dataset. A client can
> > a) ask for the entire dataset from the coordinator, b) iterate serially
> > through the tickets and assemble the whole dataset on the client side, or
> > c) (in the case of the Spark connector) fetch tickets in parallel.
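
The serial and parallel options above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the Spark connector's actual code. The names `fetch_serial`, `fetch_parallel`, and `fake_do_get` are hypothetical, and the "tickets" are plain row ranges standing in for real Flight tickets. With a real Flight client, the `fetch` callable would presumably wrap a DoGet call for one endpoint's ticket.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, TypeVar

E = TypeVar("E")  # endpoint/ticket type
B = TypeVar("B")  # type of the data returned for one ticket


def fetch_serial(endpoints: Iterable[E], fetch: Callable[[E], B]) -> List[B]:
    """Option (b): walk the tickets one at a time, in order."""
    return [fetch(ep) for ep in endpoints]


def fetch_parallel(
    endpoints: Sequence[E], fetch: Callable[[E], B], max_workers: int = 4
) -> List[B]:
    """Option (c): fetch all tickets concurrently, keeping ticket order."""
    if not endpoints:
        return []
    workers = min(max_workers, len(endpoints))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(fetch, endpoints))


# Hypothetical stand-in for a Flight stream: each "ticket" names a row range.
def fake_do_get(ticket: range) -> List[int]:
    return list(ticket)


tickets = [range(0, 3), range(3, 6), range(6, 9)]
parts = fetch_parallel(tickets, fake_do_get)
# Assemble the whole dataset on the client side.
rows = [row for part in parts for row in part]
```

Threads are a reasonable fit here only because each fetch is assumed to be I/O bound; a Spark connector would instead hand one ticket to each task.
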
> >
> > I think the server side as you described it doesn't yet exist in a
> > standalone form, although the Spark connector was developed in conjunction
> > with [2] as the server. That server is, however, highly dependent on the
> > implementation details of the Dremio engine, as the engine takes care of
> > the coordination between the Flight workers. The idea is identical to
> > yours, however: a coordinator engine, a distributed store for engine
> > metadata, and worker engines which create/serve the Arrow buffers.
> >
> > Would be happy to discuss further if you are interested in working on this
> > stuff!
> >
> > Best,
> > Ryan
> >
> > [1] https://github.com/rymurr/flight-spark-source
> > [2] https://github.com/dremio-hub/dremio-flight-connector
> >
> > On Fri, Oct 18, 2019 at 3:05 PM Vinay Kesarwani <vnkesarw...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I am trying to establish the following architecture.
> > >
> > > My approach for Flight horizontal scaling is to:
> > > 1-Launch an Apache Arrow Flight server on each node
> > > 2-Declare one node as coordinator
> > > 3-Publish coordinator info to a shared service [zookeeper]
> > > 4-Launch worker node --> get coordinator node info from [zookeeper]
> > > 5-Worker publishes its info to [zookeeper] to be consumed by others
> > >
> > > Client connects to coordinator:
> > > 1-Calls getFlightInfo(desc)
> > > 2-Here the coordinator node overrides getFlightInfo()
> > > 3-getFlightInfo() internally gets worker info from zookeeper based on the
> > > descriptor
> > > 4-Client consumes data from each endpoint iteratively OR in
> > > parallel [not sure how]
> > > -->getData()
> > >
> > > PutData()
> > > 5-Client calls putdata() to put data on different nodes in a Flight stream
> > > 6-Iterate through the endpoints and match the worker node IP
> > > 7-If the worker IP matches the endpoint, the worker puts the data in that
> > > node's Flight server
> > > 8-On putting any new/updated stream, the worker node info is updated in
> > > zookeeper
> > > 9-If the worker IP doesn't match the endpoint, we need to put the data in
> > > any other worker node and publish the info in zookeeper
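
The registration, lookup, and put steps above can be sketched with an in-memory stand-in for the shared service. This is a rough illustration under stated assumptions, not an existing Arrow Flight API: `Registry`, `get_flight_info`, and `put_data` are hypothetical names, the addresses are made up, and the least-loaded fallback in step 9 is one possible policy that the proposal leaves open. A real deployment would keep this state in ZooKeeper and serve it from the coordinator's getFlightInfo() override.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Registry:
    """In-memory stand-in for the shared service (ZooKeeper in the proposal)."""

    coordinator: Optional[str] = None
    workers: List[str] = field(default_factory=list)
    # descriptor -> addresses of the workers holding a stream for it
    streams: Dict[str, List[str]] = field(default_factory=dict)

    def register_worker(self, address: str) -> None:
        """A worker publishes its own info at launch."""
        if address not in self.workers:
            self.workers.append(address)

    def record_stream(self, descriptor: str, address: str) -> None:
        """Publish that `address` now holds a stream for `descriptor`."""
        holders = self.streams.setdefault(descriptor, [])
        if address not in holders:
            holders.append(address)


def get_flight_info(registry: Registry, descriptor: str) -> List[str]:
    """Coordinator side: return the worker endpoints for a descriptor."""
    return list(registry.streams.get(descriptor, []))


def put_data(registry: Registry, descriptor: str, preferred: Optional[str] = None) -> str:
    """Choose a worker for a new stream and publish the placement."""
    if not registry.workers:
        raise RuntimeError("no workers registered")
    if preferred in registry.workers:
        target = preferred  # the endpoint matches a known worker
    else:
        # Assumed fallback policy: the worker holding the fewest streams.
        load = {w: 0 for w in registry.workers}
        for holders in registry.streams.values():
            for w in holders:
                if w in load:
                    load[w] += 1
        target = min(registry.workers, key=lambda w: load[w])
    registry.record_stream(descriptor, target)
    return target


registry = Registry(coordinator="10.0.0.1:8815")
registry.register_worker("10.0.0.2:8815")
registry.register_worker("10.0.0.3:8815")
first = put_data(registry, "sales/2019", preferred="10.0.0.2:8815")
second = put_data(registry, "sales/2019", preferred="10.0.0.9:8815")  # unknown worker
endpoints = get_flight_info(registry, "sales/2019")
```

The client would then fetch from each address in `endpoints`, either one at a time or in parallel.
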
> > >
> > > [In future: distributed client and distributed endpoint] Example: Spark
> > > workers to Apache Arrow Flight cluster
> > >
> > > [image: image]
> > > <https://user-images.githubusercontent.com/6141965/67092386-b0012c00-f1cc-11e9-9ce2-d657001a85f7.png>
> > >
> > > Just wanted to ask whether any PR is in progress for horizontal scaling
> > > in Arrow Flight, or whether any design doc is under discussion.
> > >
> >
> >
> > --
> >
> > Ryan Murray  | Principal Consulting Engineer
> >
> > +447540852009 | rym...@dremio.com
> >
> > <https://www.dremio.com/>
> > Check out our GitHub <https://www.github.com/dremio>, join our community
> > site <https://community.dremio.com/> & Download Dremio
> > <https://www.dremio.com/download>
> >
