Hi Susmit,

You can pass headers: see the documentation [1].
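
A minimal sketch of the header approach, assuming the Python `adbc_driver_flightsql` package is installed on the client. The header names `x-s3-access-key` and `x-s3-secret-key` are hypothetical, and the server is assumed to read the credentials from the gRPC call headers (for example in a middleware) rather than from the ticket. The prefix string is written out here only so the helper can be imported without the driver; in real code, prefer the `ConnectionOptions.RPC_CALL_HEADER_PREFIX` constant from [1].

```python
from typing import Dict

# Assumed to match adbc_driver_flightsql.ConnectionOptions.RPC_CALL_HEADER_PREFIX.value;
# prefer the constant from the driver in real code.
RPC_CALL_HEADER_PREFIX = "adbc.flight.sql.rpc.call_header."


def header_options(headers: Dict[str, str]) -> Dict[str, str]:
    """Turn plain header names into ADBC option keys.

    gRPC metadata keys must be lowercase, so names are lowercased here.
    """
    return {RPC_CALL_HEADER_PREFIX + name.lower(): value
            for name, value in headers.items()}


def connect_with_headers(uri: str, headers: Dict[str, str]):
    """Open a Flight SQL connection that sends `headers` on every RPC call."""
    # Imported lazily so header_options() can be used without the driver.
    import adbc_driver_flightsql.dbapi as flightsql

    return flightsql.connect(uri, conn_kwargs=header_options(headers))


# Hypothetical usage (URI, header names, and table name are placeholders):
#
#   conn = connect_with_headers(
#       "grpc+tls://localhost:31337",
#       {"x-s3-access-key": "<access key>", "x-s3-secret-key": "<secret key>"},
#   )
#   with conn.cursor() as cur:
#       cur.execute("SELECT * FROM my_table LIMIT 10")
#       table = cur.fetch_arrow_table()
```

Since the headers would carry secrets, a TLS URI (`grpc+tls://`) is probably the safer choice here.
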
[1]: https://arrow.apache.org/adbc/current/python/api/adbc_driver_flightsql.html#adbc_driver_flightsql.ConnectionOptions.RPC_CALL_HEADER_PREFIX

-David

On Mon, Nov 18, 2024, at 16:16, Susmit Sarkar wrote:
> Hi Community Members,
>
> We went ahead with duckdb, i have a very basic query, with this setup can
> we use Arrow ADBC, to interact with the flight sql server which internally
> is a wrapper on top of DuckDB to query the data from s3 and stream back to
> client
>
> For every client request the credentials i mean the access and secret keys
> are passed as part of the doget API ticket information, is it possible to
> pass the same with ADBC to flight server?
>
> *data_stream: FlightStreamReader = client.do_get(ticket)*
>
>
> Thanks,
>
> Susmit
>
> On Wed, Oct 16, 2024 at 10:43 PM Susmit Sarkar <susmitsir...@gmail.com>
> wrote:
>
>> Thank you, will keep posted in the same thread
>>
>> Regards,
>> Susmit
>>
>> On Wed, Oct 16, 2024 at 9:45 PM Weston Pace <weston.p...@gmail.com> wrote:
>>
>>> > Do you folks believe Duckdb and Datafusion (latter being similar to
>>> > spark sql) will be an overkill?
>>>
>>> No, I don't believe it would be overkill.
>>>
>>> I also wouldn't compare either one to Spark SQL. Spark SQL is meant to be
>>> a distributed query engine that typically requires a cluster of some sort
>>> to operate at full performance. A distributed query engine would probably
>>> be overkill for your situation.
>>>
>>> Both DuckDb and Datafusion are meant to be lightweight, embeddable, single
>>> node (i.e. not distributed) query engine libraries. These are probably a
>>> good fit for your use case.
>>>
>>> -Weston
>>>
>>> On Wed, Oct 16, 2024 at 8:17 AM Susmit Sarkar <susmitsir...@gmail.com>
>>> wrote:
>>>
>>> > Thanks David and Felipe for your help, I will definitely try out and
>>> > keep you folks updated.
>>> >
>>> > Do you folks believe Duckdb and Datafusion (latter being similar to
>>> > spark sql) will be an overkill?
>>> >
>>> > Thanks,
>>> > Susmit
>>> >
>>> > On Wed, Oct 16, 2024 at 8:25 PM Felipe Oliveira Carvalho <
>>> > felipe...@gmail.com> wrote:
>>> >
>>> > > Hi Susmit,
>>> > >
>>> > > For an example of what David Li is proposing, you can take a look at
>>> > > this project (https://github.com/voltrondata/sqlflite). It's a Flight
>>> > > SQL server (in C++ though) that can forward queries to either SQLite
>>> > > or DuckDB.
>>> > >
>>> > > --
>>> > > Felipe
>>> > >
>>> > > On Wed, Oct 16, 2024 at 10:22 AM David Li <lidav...@apache.org>
>>> > > wrote:
>>> > >
>>> > > > If your clients are sending full SQL queries to be executed, and you
>>> > > > need to execute them against S3 on the server, why not consider
>>> > > > something like Apache DataFusion or DuckDB to implement that part
>>> > > > instead of building the query parser/engine yourself? (There are
>>> > > > probably already examples of wrapping both these projects in Flight
>>> > > > SQL floating around.)
>>> > > >
>>> > > > On Wed, Oct 16, 2024, at 21:38, Susmit Sarkar wrote:
>>> > > > > Hi Community Members
>>> > > > >
>>> > > > >
>>> > > > > We are planning to build an Arrow flight server on top of data
>>> > > > > lying in s3.
>>> > > > >
>>> > > > >
>>> > > > > *Detailed Use Case:*
>>> > > > >
>>> > > > >
>>> > > > > The requirement is we need to sync data from HDFS to a short term
>>> > > > > storage S3 is our case. Basically a DataSync Service between cloud
>>> > > > > storages
>>> > > > >
>>> > > > >
>>> > > > > I have already built the service using Apache Pekko / Akka HDFS &
>>> > > > > S3 connectors, and data is in sync with HDFS & S3.
>>> > > > >
>>> > > > >
>>> > > > > Now comes the data reading part for end users. The data is stored
>>> > > > > in Cloudian s3 (Cloudian managed S3 not AWS) short term storage in
>>> > > > > parquet.
>>> > > > > We want to build a Data as a Service on top of the data lying in
>>> > > > > S3 and expose API endpoints for clients to query. The data lying
>>> > > > > will be short term, data may be of week or months (max 3 months)
>>> > > > > use-cases varies from teams to teams.
>>> > > > >
>>> > > > >
>>> > > > > So we felt Apache Sql Flight Server will be the best suited for
>>> > > > > our use case and the client should send a FlightDescriptor object
>>> > > > > wrapped with the sql query.
>>> > > > >
>>> > > > >
>>> > > > > We parsed the query and query s3 using the aws s3 sdks, and return
>>> > > > > the data, but the issue is we will end up building our own query
>>> > > > > parser, which is a bigger task.
>>> > > > >
>>> > > > > Is there any other approach we can try out ?
>>> > > > >
>>> > > > >
>>> > > > > Thanks,
>>> > > > >
>>> > > > > Susmit