Horizontal scaling design suggestion: Apache arrow flight

Vinay Kesarwani Fri, 18 Oct 2019 07:06:05 -0700

Hi,

I am trying to establish the following architecture.


My approach to horizontally scaling Flight is as follows:
1-Launch an Apache Arrow Flight server on each node
2-Declare one node as the coordinator
3-Publish the coordinator's info to a shared service [zookeeper]
4-Launch each worker node --> it gets the coordinator node's info from [zookeeper]
5-Each worker publishes its own info to [zookeeper] to be consumed by others
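
The registration steps above can be sketched roughly as follows. This is only an illustration: InMemoryRegistry is a hypothetical stand-in for zookeeper, and the paths (/flight/coordinator, /flight/workers/<id>), function names and addresses are assumptions, not anything from an existing Flight or zookeeper API.

```python
class InMemoryRegistry:
    """Hypothetical stand-in for zookeeper: a flat path -> value store."""

    def __init__(self):
        self._nodes = {}

    def publish(self, path, value):
        self._nodes[path] = value

    def get(self, path):
        return self._nodes.get(path)

    def children(self, prefix):
        # Return the names directly under a prefix, sorted for stable output.
        prefix = prefix.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self._nodes if p.startswith(prefix))


def start_coordinator(registry, location):
    # Step 3: the coordinator publishes its own location.
    registry.publish("/flight/coordinator", location)


def start_worker(registry, worker_id, location):
    # Step 4: a worker first looks up the coordinator.
    coordinator = registry.get("/flight/coordinator")
    if coordinator is None:
        raise RuntimeError("coordinator has not registered yet")
    # Step 5: the worker publishes its own location for others to consume.
    registry.publish(f"/flight/workers/{worker_id}", location)
    return coordinator


registry = InMemoryRegistry()
start_coordinator(registry, "grpc://10.0.0.1:8815")
start_worker(registry, "w1", "grpc://10.0.0.2:8815")
start_worker(registry, "w2", "grpc://10.0.0.3:8815")
```

With a real zookeeper, ephemeral nodes would probably be a natural fit for the worker entries, so that a worker's registration disappears when its session ends.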

Client connects to the coordinator:
1-Client calls getFlightInfo(desc)
2-The coordinator node overrides getFlightInfo()
3-getFlightInfo() internally looks up the worker info for the descriptor
in zookeeper
4-Client consumes data from each endpoint, either iteratively or in
parallel [not sure how]
-->getData()
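
On the "in parallel" question in step 4: since getFlightInfo() returns a list of endpoints and each endpoint can be read independently, one option is to fetch them concurrently from a thread pool. Below is a minimal sketch of that idea in plain Python. PLACEMENT, WORKER_DATA, fetch() and read_all() are hypothetical names, and fetch() merely stands in for the real per-endpoint get call against a worker's Flight server.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placement map the coordinator would read from zookeeper:
# descriptor -> list of (worker location, ticket).
PLACEMENT = {
    "sales": [
        ("grpc://10.0.0.2:8815", "sales/part-0"),
        ("grpc://10.0.0.3:8815", "sales/part-1"),
    ],
}

# Hypothetical per-worker storage, standing in for each worker's Flight server.
WORKER_DATA = {
    ("grpc://10.0.0.2:8815", "sales/part-0"): [1, 2, 3],
    ("grpc://10.0.0.3:8815", "sales/part-1"): [4, 5],
}


def get_flight_info(descriptor):
    """Coordinator side: return the endpoints that hold this descriptor."""
    if descriptor not in PLACEMENT:
        raise KeyError(f"unknown descriptor: {descriptor}")
    return list(PLACEMENT[descriptor])


def fetch(endpoint):
    """Client side: stand-in for reading one endpoint's stream."""
    return WORKER_DATA[endpoint]


def read_all(descriptor, max_workers=4):
    endpoints = get_flight_info(descriptor)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in endpoint order even though the
        # fetches themselves run concurrently.
        parts = list(pool.map(fetch, endpoints))
    return [row for part in parts for row in part]


rows = read_all("sales")
```

The iterative variant is the same loop without the pool; the parallel version only helps when the endpoints live on different workers or the reads are I/O bound.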

PutData()
5-Client calls putdata() to put data on different nodes as a flight stream
6-Iterate through the endpoints and match each one against the worker node IP
7-If the worker IP matches the endpoint, the worker puts the data in that
node's flight server.
8-On putting any new or updated stream, the worker node info is updated in
zookeeper
9-If the worker IP doesn't match the endpoint, we need to put the data on
another worker node and publish that info in zookeeper.
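
The put-side routing in steps 6 to 9 could look roughly like the sketch below. route_put() and its arguments are hypothetical, the placement dict again stands in for what would be stored in zookeeper, and picking the least-loaded worker as the fallback is purely my assumption; the steps above only say "any other worker node".

```python
def route_put(descriptor, local_location, placement, workers):
    """Pick the worker that should receive a put and record the placement.

    placement: dict of descriptor -> list of worker locations (registry stand-in)
    workers:   list of all registered worker locations
    """
    if not workers:
        raise RuntimeError("no workers registered")
    known = placement.get(descriptor, [])
    if local_location in known:
        # Steps 6-7: an endpoint matches this worker, so write locally.
        target = local_location
    else:
        # Step 9: no match, so fall back to another worker. Choosing the one
        # holding the fewest descriptors is an assumption for illustration.
        load = {w: 0 for w in workers}
        for locations in placement.values():
            for w in locations:
                if w in load:
                    load[w] += 1
        target = min(workers, key=lambda w: (load[w], w))
    # Step 8: publish the updated placement.
    if target not in known:
        placement.setdefault(descriptor, []).append(target)
    return target


placement = {"sales": ["grpc://10.0.0.2:8815"]}
workers = ["grpc://10.0.0.2:8815", "grpc://10.0.0.3:8815"]

local_hit = route_put("sales", "grpc://10.0.0.2:8815", placement, workers)
fallback = route_put("orders", "grpc://10.0.0.9:8815", placement, workers)
```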

[In future: distributed client and distributed endpoints] Example: Spark
workers to an Apache Arrow Flight cluster

[image: image]
<https://user-images.githubusercontent.com/6141965/67092386-b0012c00-f1cc-11e9-9ce2-d657001a85f7.png>

I wanted to ask whether any PR is in progress for horizontal scaling in
Arrow Flight, or whether any design doc is under discussion.
