Re: Query of Arrow Flight SQL with S3 as a storage for parquet files

2024-11-18 Thread David Li
Hi Susmit, You can pass headers: see the documentation [1]. [1]: https://arrow.apache.org/adbc/current/python/api/adbc_driver_flightsql.html#adbc_driver_flightsql.ConnectionOptions.RPC_CALL_HEADER_PREFIX -David On Mon, Nov 18, 2024, at 16:16, Susmit Sarkar wrote: > Hi Community Members, > > We
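For readers following David's pointer, here is a minimal sketch of passing per-connection call headers through the ADBC Flight SQL driver in Python. It assumes the string value of `ConnectionOptions.RPC_CALL_HEADER_PREFIX` is `adbc.flight.sql.rpc.call_header.` (check this against the linked documentation), and the header names, URI, and helper function names are hypothetical, not taken from the thread.

```python
from typing import Dict, Mapping

# Assumption: this literal matches
# adbc_driver_flightsql.ConnectionOptions.RPC_CALL_HEADER_PREFIX.value;
# verify it against the driver documentation before relying on it.
RPC_CALL_HEADER_PREFIX = "adbc.flight.sql.rpc.call_header."


def header_options(headers: Mapping[str, str]) -> Dict[str, str]:
    """Turn plain header names into ADBC connection option keys.

    gRPC metadata keys are lowercase, so names are lowercased here.
    """
    return {
        f"{RPC_CALL_HEADER_PREFIX}{name.lower()}": value
        for name, value in headers.items()
    }


def connect_with_headers(uri: str, headers: Mapping[str, str]):
    """Open a Flight SQL connection that sends `headers` on each call.

    Hypothetical helper: the import is deferred so the option-building
    code above can be used without the driver installed.
    """
    import adbc_driver_flightsql.dbapi as flightsql

    return flightsql.connect(uri, conn_kwargs=header_options(headers))


if __name__ == "__main__":
    # Hypothetical header name; a real server defines what it expects.
    print(header_options({"X-Storage-Token": "example-token"}))
```

A client would then call something like `connect_with_headers("grpc://localhost:31337", {"x-storage-token": "..."})`, and the server side would read the header from the incoming call metadata. How the server maps that header to S3 credentials is up to the server implementation and is not specified in this thread.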

Re: Query of Arrow Flight SQL with S3 as a storage for parquet files

2024-11-17 Thread Susmit Sarkar
Hi Community Members, We went ahead with DuckDB. I have a very basic question: with this setup, can we use Arrow ADBC to interact with the Flight SQL server, which internally is a wrapper on top of DuckDB, to query the data from S3 and stream it back to the client? For every client request the credentials i m

Re: Query of Arrow Flight SQL with S3 as a storage for parquet files

2024-10-16 Thread Susmit Sarkar
Thank you, I will keep you posted in the same thread. Regards, Susmit On Wed, Oct 16, 2024 at 9:45 PM Weston Pace wrote: > > Do you folks believe Duckdb and Datafusion (latter being similar to spark > sql) will be an overkill? > > No, I don't believe it would be overkill. > > I also wouldn't compare e

Re: Query of Arrow Flight SQL with S3 as a storage for parquet files

2024-10-16 Thread Weston Pace
> Do you folks believe Duckdb and Datafusion (latter being similar to spark sql) will be an overkill? No, I don't believe it would be overkill. I also wouldn't compare either one to Spark SQL. Spark SQL is meant to be a distributed query engine that typically requires a cluster of some sort to o

Re: Query of Arrow Flight SQL with S3 as a storage for parquet files

2024-10-16 Thread Susmit Sarkar
Thanks David and Felipe for your help, I will definitely try it out and keep you folks updated. Do you folks believe Duckdb and Datafusion (latter being similar to spark sql) will be an overkill? Thanks, Susmit On Wed, Oct 16, 2024 at 8:25 PM Felipe Oliveira Carvalho < felipe...@gmail.com> wrote:

Re: Query of Arrow Flight SQL with S3 as a storage for parquet files

2024-10-16 Thread Felipe Oliveira Carvalho
Hi Susmit, For an example of what David Li is proposing, you can take a look at this project (https://github.com/voltrondata/sqlflite). It's a Flight SQL server (in C++ though) that can forward queries to either SQLite or DuckDB. -- Felipe On Wed, Oct 16, 2024 at 10:22 AM David Li wrote: > If

Re: Query of Arrow Flight SQL with S3 as a storage for parquet files

2024-10-16 Thread David Li
If your clients are sending full SQL queries to be executed, and you need to execute them against S3 on the server, why not consider something like Apache DataFusion or DuckDB to implement that part instead of building the query parser/engine yourself? (There are probably already examples of wra

Query of Arrow Flight SQL with S3 as a storage for parquet files

2024-10-16 Thread Susmit Sarkar
Hi Community Members, We are planning to build an Arrow Flight server on top of data stored in S3. *Detailed Use Case:* The requirement is that we need to sync data from HDFS to short-term storage, S3 in our case: basically a DataSync service between cloud storages. I have already built the servic