Hi community members,

We are planning to build an Arrow Flight server on top of data stored in S3.


*Detailed Use Case:*


The requirement is to sync data from HDFS to short-term storage, which is
S3 in our case: essentially a DataSync service between storage systems.


I have already built the service using the Apache Pekko / Akka HDFS and S3
connectors, and the data is in sync between HDFS and S3.


Now comes the data-reading part for end users. The data is stored as
Parquet in Cloudian S3 (Cloudian-managed S3, not AWS), our short-term
storage. We want to build a Data-as-a-Service layer on top of the data in S3
and expose API endpoints for clients to query. The data will be short-lived,
covering weeks to months (at most 3 months), and use cases vary from team
to team.


So we felt an Apache Arrow Flight SQL server would be best suited for our
use case, with the client sending a FlightDescriptor object that wraps the
SQL query.


We tried parsing the query ourselves, querying S3 with the AWS S3 SDK, and
returning the data, but the issue is that we would end up building our own
query parser, which is a big task.
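To make the concern concrete, here is a hypothetical, heavily simplified sketch (Python, standard library only) of the kind of hand-rolled translation this approach requires. The bucket name, the `<table>/<column>=<value>/` key layout, and the names `plan` and `S3Request` are assumptions for illustration only; the actual S3 listing and Parquet reading via the SDK are left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical hand-rolled "parser": it only understands
#   SELECT <cols|*> FROM <table> [WHERE <col> = '<value>']
# Anything else (range predicates, OR, joins, aggregates) is rejected.
_QUERY_RE = re.compile(
    r"^\s*SELECT\s+(?P<cols>\*|[\w\s,]+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>[^']*)')?\s*;?\s*$",
    re.IGNORECASE,
)


@dataclass
class S3Request:
    """What the server would hand to the S3 client: where to list, what to keep."""
    bucket: str
    prefix: str                   # key prefix to list Parquet objects under
    columns: Optional[List[str]]  # None means "all columns"


def plan(sql: str, bucket: str = "datasync-short-term") -> S3Request:
    """Translate a tiny SQL subset into an S3 key prefix, assuming a
    hypothetical layout of <table>/<column>=<value>/part-*.parquet."""
    m = _QUERY_RE.match(sql)
    if m is None:
        raise ValueError(f"unsupported query: {sql!r}")
    cols = m.group("cols").strip()
    columns = None if cols == "*" else [c.strip() for c in cols.split(",")]
    prefix = f"{m.group('table')}/"
    if m.group("col"):
        # The WHERE column is assumed to be a partition column, so an
        # equality filter on it becomes part of the key prefix.
        prefix += f"{m.group('col')}={m.group('val')}/"
    return S3Request(bucket=bucket, prefix=prefix, columns=columns)


if __name__ == "__main__":
    print(plan("SELECT id, ts FROM events WHERE dt = '2024-01-01'"))
```

Every construct outside this subset (range filters, OR, joins, aggregates) would need more parsing and planning code on top of the Parquet reading and filtering itself, which is the effort we would like to avoid.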

Is there another approach we could try?


Thanks,

Susmit
