So this is now both a Flight SQL producer and consumer for Spark? That is very cool.
A couple things I was wondering about:

- How do you think this compares to the Spark Connect proposal? [1]
- Have you considered ADBC [2] instead of Flight SQL for the DataSourceV2
  implementation? While still under development, the hope is to unify things
  like Flight SQL, arrow-jdbc, etc. under a single umbrella.
- Lastly, where do you see this progressing from here on out? Do you hope to
  upstream into Spark?

[1]: https://databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html
[2]: https://github.com/apache/arrow-adbc

-David

On Sat, Jul 23, 2022, at 21:44, Gavin Ray wrote:
> This sounds pretty darn nifty!
> I don't have much of value to offer, but the idea sounds like a great one
> to me =)
>
> On Sat, Jul 23, 2022 at 5:18 PM Tornike Gurgenidze <togur...@freeuni.edu.ge>
> wrote:
>
>> David, thank you for the reply.
>>
>> I recently managed to find the time to get back to the repo. I thought I
>> would post the status update for anyone interested.
>>
>> The project started out as just FlightSql implementation, but I ended up
>> splitting it into smaller components:
>>
>> 1. SparkFlightManager - a lower-level, more of a utility class, that
>> enables easier development of Spark-backed FlightServers. It is supposed to
>> take care of FlightServer cluster management, distribution of Spark query
>> results to the FlightServer nodes, service discovery and so on, permitting
>> a developer to focus on just expressing the intended business logic in
>> Spark. There's a reference FlightServer implementation
>> (https://github.com/tokoko/SparkFlightSql/blob/main/src/main/scala/com/tokoko/spark/flight/example/SparkParquetFlightProducer.scala)
>> that illustrates how a simple parquet reader server can be implemented
>> using SparkFlightManager.
>>
>> 2. SparkFlightSql - SparkFlightSqlProducer class that relies on
>> SparkFlightManager for most of the technical stuff and focuses on simply
>> mapping Spark Catalog API metadata to the FlightSql specification.
>>
>> 3. FlightSql DataSourceV2 - pretty self-explanatory, there's now also the
>> beginnings of a DataSourceV2 implementation supporting BATCH_READ.
>>
>> Once again, if anyone's interested enough to contribute or maybe has a use
>> case for SparkFlightManager, please feel free to reach out.
>> --
>> Tornike
>>
>> On Sun, May 29, 2022 at 5:26 AM David Li <lidav...@apache.org> wrote:
>>
>> > Hi Tornike,
>> >
>> > I'll have to take a closer look later when I can get back in front of a
>> > real computer but I just want to say that this is super awesome, and
>> > thank you for sharing!
>> >
>> > I think we've kicked around the idea of "contrib" projects in the past.
>> > Maybe this can be the impetus to take up that idea? Regardless I want to
>> > say that if you have any questions or feedback about Arrow and Flight SQL
>> > please feel free to post it here.
>> >
>> > -David
>> >
>> > On Sat, May 28, 2022, at 18:48, Tornike Gurgenidze wrote:
>> > > Hi,
>> > >
>> > > I'm not sure this is the right place to be posting this, so I apologize
>> > > in advance.
>> > >
>> > > Recently I started a PoC for Arrow Flight SQL Server with Spark backend
>> > > (https://github.com/tokoko/SparkFlightSql). The main goal is to create a
>> > > SparkThriftServer alternative that will benefit from FlightSql protocol
>> > > and will also be distributed in nature, i.e. query results won't have to
>> > > pass through a single server.
>> > >
>> > > I thought it might be interesting for those of you who are also familiar
>> > > with Spark. I don't have much of an experience with Arrow, so I would
>> > > appreciate any sort of involvement from Arrow community.
>> > >
>> > > Regards,
>> > > Tornike