If your clients are sending full SQL queries to be executed, and you need to 
execute them against S3 on the server, why not consider something like Apache 
DataFusion or DuckDB to implement that part instead of building the query 
parser/engine yourself? (There are probably already examples floating around 
of wrapping both of these projects in Flight SQL.)
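
To make the suggestion concrete, here is a minimal sketch in Python of handing the client's SQL straight to DuckDB instead of parsing it yourself. It assumes the `duckdb` package with its `httpfs` extension, and that your Cloudian endpoint accepts path-style S3 requests. The helper names (`s3_settings`, `run_query`), the endpoint, the credentials and the bucket path are all hypothetical placeholders, not anything from your setup.

```python
"""Sketch: let DuckDB parse and run client SQL over Parquet in S3-compatible storage."""


def s3_settings(endpoint, access_key, secret_key, use_ssl=True):
    """Build DuckDB SET statements for an S3-compatible (non-AWS) endpoint.

    `endpoint` is host[:port] without a scheme. All values are placeholders
    supplied by the caller.
    """
    def quote(value):
        # Escape single quotes so the value is a valid SQL string literal.
        return "'" + str(value).replace("'", "''") + "'"

    return [
        f"SET s3_endpoint = {quote(endpoint)}",
        # Assumption: the store expects path-style URLs (common for non-AWS S3).
        "SET s3_url_style = 'path'",
        f"SET s3_use_ssl = {'true' if use_ssl else 'false'}",
        f"SET s3_access_key_id = {quote(access_key)}",
        f"SET s3_secret_access_key = {quote(secret_key)}",
    ]


def run_query(sql, settings):
    """Execute `sql` with DuckDB and return the result as a pyarrow Table."""
    # Imported lazily so s3_settings() stays usable without duckdb installed.
    import duckdb

    con = duckdb.connect()  # in-memory database
    con.execute("INSTALL httpfs")  # extension that provides s3:// access
    con.execute("LOAD httpfs")
    for statement in settings:
        con.execute(statement)
    # DuckDB does the parsing, planning and Parquet scanning.
    return con.execute(sql).fetch_arrow_table()


# Example call (hypothetical endpoint, credentials and bucket):
# settings = s3_settings("s3.example.internal:443", "ACCESS_KEY", "SECRET_KEY")
# table = run_query(
#     "SELECT count(*) FROM read_parquet('s3://my-bucket/events/*.parquet')",
#     settings,
# )
```

The Arrow table that comes back is what a Flight SQL handler would stream to the client as record batches. In a real service you would also want to restrict what SQL clients may run, since this sketch executes whatever it is given.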

On Wed, Oct 16, 2024, at 21:38, Susmit Sarkar wrote:
> Hi Community Members
>
>
> We are planning to build an Arrow Flight server on top of data stored in S3.
>
>
> *Detailed Use Case:*
>
>
> The requirement is to sync data from HDFS to short-term storage, which is
> S3 in our case. Basically, it is a DataSync service between storage systems.
>
>
> I have already built the service using the Apache Pekko / Akka HDFS and S3
> connectors, and the data is in sync between HDFS and S3.
>
>
> Now comes the data-reading part for end users. The data is stored as
> Parquet in Cloudian S3 (Cloudian-managed S3, not AWS) short-term storage.
> We want to build a Data as a Service layer on top of the data in S3 and
> expose API endpoints for clients to query. The data is short-lived: it may
> be kept for weeks or months (3 months at most), and use cases vary from
> team to team.
>
>
> So we felt an Apache Arrow Flight SQL server would be best suited to our
> use case, with the client sending a FlightDescriptor object that wraps the
> SQL query.
>
>
> We parse the query, query S3 using the AWS S3 SDKs, and return the data,
> but the issue is that we will end up building our own query parser, which
> is a much bigger task.
>
> Is there any other approach we can try?
>
>
> Thanks,
>
> Susmit
