So this is now both a Flight SQL producer and consumer for Spark? That is very 
cool.

A couple things I was wondering about:

- How do you think this compares to the Spark Connect proposal? [1]
- Have you considered ADBC [2] instead of Flight SQL for the DataSourceV2 
implementation? While still under development, the hope is to unify things like 
Flight SQL, arrow-jdbc, etc. under a single umbrella.
- Lastly, where do you see this progressing from here on out? Do you hope to 
upstream into Spark?

[1]: https://databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html
[2]: https://github.com/apache/arrow-adbc

-David

On Sat, Jul 23, 2022, at 21:44, Gavin Ray wrote:
> This sounds pretty darn nifty!
> I don't have much of value to offer, but the idea sounds like a great one
> to me =)
>
> On Sat, Jul 23, 2022 at 5:18 PM Tornike Gurgenidze <togur...@freeuni.edu.ge>
> wrote:
>
>> David, thank you for the reply.
>>
>> I recently managed to find the time to get back to the repo. I thought I
>> would post the status update for anyone interested.
>>
>> The project started out as just a FlightSql implementation, but I ended up
>> splitting it into smaller components:
>>
>> 1. SparkFlightManager - a lower-level utility class that makes it easier
>> to develop Spark-backed FlightServers. It is supposed to take care of
>> FlightServer cluster management, distribution of Spark query results to
>> the FlightServer nodes, service discovery and so on, permitting a
>> developer to focus on just expressing the intended business logic in
>> Spark. There's a reference FlightServer implementation
>> (https://github.com/tokoko/SparkFlightSql/blob/main/src/main/scala/com/tokoko/spark/flight/example/SparkParquetFlightProducer.scala)
>> that illustrates how a simple parquet reader server can be implemented
>> using SparkFlightManager.
>>
>> 2. SparkFlightSql - SparkFlightSqlProducer class that relies on
>> SparkFlightManager for most of the technical stuff and focuses on simply
>> mapping Spark Catalog API metadata to the FlightSql specification.
>>
>> 3. FlightSql DataSourceV2 - pretty self-explanatory, there's now also the
>> beginnings of a DataSourceV2 implementation supporting BATCH_READ.
>>
>> Once again, if anyone's interested enough to contribute or maybe has a use
>> case for SparkFlightManager, please feel free to reach out.
>> --
>> Tornike
>>
>> On Sun, May 29, 2022 at 5:26 AM David Li <lidav...@apache.org> wrote:
>>
>> > Hi Tornike,
>> >
>> > I'll have to take a closer look later when I can get back in front of a
>> > real computer but I just want to say that this is super awesome, and
>> > thank you for sharing!
>> >
>> > I think we've kicked around the idea of "contrib" projects in the past.
>> > Maybe this can be the impetus to take up that idea? Regardless I want to
>> > say that if you have any questions or feedback about Arrow and Flight SQL
>> > please feel free to post it here.
>> >
>> > -David
>> >
>> > On Sat, May 28, 2022, at 18:48, Tornike Gurgenidze wrote:
>> > > Hi,
>> > >
>> > > I'm not sure this is the right place to be posting this, so I apologize
>> > > in advance.
>> > >
>> > > Recently I started a PoC for Arrow Flight SQL Server with Spark backend
>> > > (https://github.com/tokoko/SparkFlightSql). The main goal is to create a
>> > > SparkThriftServer alternative that will benefit from FlightSql protocol
>> > > and will also be distributed in nature, i.e. query results won't have to
>> > > pass through a single server.
>> > >
>> > > I thought it might be interesting for those of you who are also
>> > > familiar with Spark. I don't have much experience with Arrow, so I
>> > > would appreciate any sort of involvement from the Arrow community.
>> > >
>> > > Regards,
>> > > Tornike
>>
