Thanks, David and Felipe, for your help. I will definitely try this out and
keep you folks updated.

Do you folks think DuckDB and DataFusion (the latter being similar to Spark
SQL) would be overkill?
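
As I understand the suggestion, the server would hand the client's SQL text
to an embedded engine instead of parsing it itself. Below is a minimal sketch
of that idea, assuming Python and using the standard-library sqlite3 module
purely as a stand-in for DuckDB or DataFusion. The `events` table and the
`execute_query` helper are hypothetical names for illustration, not part of
any Flight SQL API.

```python
import sqlite3


def execute_query(conn, sql):
    """Forward the client's SQL text to the embedded engine unchanged.

    The engine does the parsing and planning; the server writes no parser.
    """
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


# Hypothetical in-memory table standing in for the Parquet data in S3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (team TEXT, bytes INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 10), ("b", 20), ("a", 5)],
)

# The SQL string is what a client would send to the server.
columns, rows = execute_query(
    conn,
    "SELECT team, SUM(bytes) AS total FROM events GROUP BY team ORDER BY team",
)
```

In a real server the SQL string would arrive in the Flight SQL request, the
engine would read the Parquet files from S3, and the result would go back to
the client as Arrow record batches. None of that is shown here.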

Thanks,
Susmit

On Wed, Oct 16, 2024 at 8:25 PM Felipe Oliveira Carvalho <felipe...@gmail.com> wrote:

> Hi Susmit,
>
> For an example of what David Li is proposing, you can take a look at this
> project (https://github.com/voltrondata/sqlflite). It's a Flight SQL server
> (in C++ though) that can forward queries to either SQLite or DuckDB.
>
> --
> Felipe
>
> On Wed, Oct 16, 2024 at 10:22 AM David Li <lidav...@apache.org> wrote:
>
> > If your clients are sending full SQL queries to be executed, and you need
> > to execute them against S3 on the server, why not consider something like
> > Apache DataFusion or DuckDB to implement that part instead of building the
> > query parser/engine yourself? (There are probably already examples of
> > wrapping both these projects in Flight SQL floating around.)
> >
> > On Wed, Oct 16, 2024, at 21:38, Susmit Sarkar wrote:
> > > Hi Community Members
> > >
> > >
> > > We are planning to build an Arrow Flight server on top of data stored
> > > in S3.
> > >
> > >
> > > *Detailed Use Case:*
> > >
> > >
> > > The requirement is to sync data from HDFS to short-term storage, which
> > > is S3 in our case. Basically, it is a DataSync service between storage
> > > systems.
> > >
> > >
> > > I have already built the service using the Apache Pekko / Akka HDFS and
> > > S3 connectors, and the data is in sync between HDFS and S3.
> > >
> > >
> > > Now comes the data reading part for end users. The data is stored as
> > > Parquet in Cloudian S3 (Cloudian-managed S3, not AWS) short-term
> > > storage. We want to build a Data as a Service layer on top of the data
> > > in S3 and expose API endpoints for clients to query. The data will be
> > > short-lived: it may be kept for weeks or months (3 months at most), and
> > > use cases vary from team to team.
> > >
> > >
> > > So we felt an Apache Arrow Flight SQL server would be best suited to
> > > our use case, with the client sending a FlightDescriptor object that
> > > wraps the SQL query.
> > >
> > >
> > > We parse the query, query S3 using the AWS S3 SDKs, and return the
> > > data, but the issue is that we will end up building our own query
> > > parser, which is a bigger task.
> > >
> > > Is there any other approach we can try out?
> > >
> > >
> > > Thanks,
> > >
> > > Susmit
> >
>
