A couple of related (possibly useful?) pointers here:

- Dask-sql [1] uses Calcite in a Python context. There might be some good
  stuff to leverage there.
- I'm working on compiling Calcite as a GraalVM shared native library [2] as
  part of Substrait [3], with the goal of ultimately having a friendly C
  binding [4] for use in non-JVM worlds. This connects to work being done by
  others to support tools like Arrow and Velox [5] as Substrait targets (thus
  completing the path from a C interface to native execution via Calcite).

[1] https://github.com/dask-contrib/dask-sql
[2] https://issues.apache.org/jira/browse/CALCITE-4786
[3] https://github.com/substrait-io/substrait/pull/120
[4] https://github.com/jacques-n/substrait/pull/3
[5] https://github.com/oap-project/gazelle-jni/tree/velox_dev

On Mon, Jan 31, 2022 at 3:32 PM Nicola Vitucci <nicola.vitu...@gmail.com> wrote:

> Hi Eugen, Michael, Gavin,
>
> Thank you very much for your input. In response to your suggestions:
>
> - Phoenix client: I saw it but decided not to use it because it does not
>   seem very active or up to date (its Avatica version is 1.10, while the
>   latest is 1.20). I may still give it a try though.
> - Arrow Flight: I think it can be very useful, especially, as Michael
>   mentioned, if it were integrated with Avatica as a transport; at the
>   moment, though, it is not.
>
> I am basically looking for a solution that is (relatively) easy and ready
> to implement, easy to keep up to date, and reasonably performant. Although
> it incurs some overhead, a solution based on Python + Java seems to me the
> most reasonable for the time being. Do you have any other suggestions or
> recommendations?
>
> Thanks again,
>
> Nicola
>
> On Mon, Jan 31, 2022 at 5:04 PM Michael Mior <mm...@apache.org> wrote:
>
> > Flight is definitely another consideration for the future. Personally I
> > think it would be most interesting to integrate Flight with Avatica as an
> > alternative transport. But it would certainly also be useful to allow the
> > Arrow adapter to connect to any Flight endpoint.
> >
> > --
> > Michael Mior
> > mm...@apache.org
> >
> > On Mon, Jan 31, 2022 at 10:00 AM Gavin Ray <ray.gavi...@gmail.com> wrote:
> >
> > > This is really interesting stuff you've done in the example notebooks.
> > >
> > > Nicola & Michael, I wonder if you could benefit from the
> > > recently-released Arrow Flight SQL?
> > >
> > > https://www.dremio.com/subsurface/arrow-flight-and-arrow-flight-sql-accelerating-data-movement/
> > >
> > > I have asked Jacques about this a bit -- it's meant to be a
> > > standardization for communicating SQL queries and metadata with Arrow.
> > > I'm not intimately familiar with it, but it seems like it could be a
> > > good base to build a Calcite backend for Arrow from?
> > >
> > > They have a pretty thorough Java example in the repository:
> > >
> > > https://github.com/apache/arrow/blob/968e6ea488c939c0e1f2bfe339a5a9ed1aed603e/java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/example/FlightSqlExample.java#L169-L180
> > >
> > > On Mon, Jan 31, 2022 at 8:47 AM Michael Mior <mm...@apache.org> wrote:
> > >
> > > > You may want to keep an eye on CALCITE-2040
> > > > (https://issues.apache.org/jira/browse/CALCITE-2040). I have a
> > > > student who is working on a Calcite adapter for Apache Arrow. We're
> > > > basically hung up waiting on the Arrow team to release a compatible
> > > > JAR. This still won't fully solve your problem though, as the first
> > > > version of the adapter is only capable of reading from Arrow files.
> > > > However, the goal is eventually to allow passing a memory reference
> > > > into the adapter so that it would be possible to make use of Arrow
> > > > data which is constructed in-memory elsewhere.
> > > > --
> > > > Michael Mior
> > > > mm...@apache.org
> > > >
> > > > On Sun, Jan 30, 2022 at 5:36 PM Nicola Vitucci <nicola.vitu...@gmail.com> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > What would be the best way to use Calcite with Python?
> > > > > I've come up with two potential solutions:
> > > > >
> > > > > - using the jaydebeapi package, to connect via the JDBC driver
> > > > >   directly from a JVM created via jpype;
> > > > > - using Apache Arrow via the pyarrow package, to connect in basically
> > > > >   the same way but creating Arrow objects with JdbcToArrowUtils (and
> > > > >   optionally converting them to Pandas).
> > > > >
> > > > > Although the former is more straightforward, the latter achieves
> > > > > better performance (see [1] for instance), since that is exactly
> > > > > what Arrow is for. I've created two Jupyter notebooks [2] showing
> > > > > each solution. What would you recommend? Is there an even better
> > > > > approach?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Nicola
> > > > >
> > > > > [1] https://uwekorn.com/2020/12/30/fast-jdbc-revisited.html
> > > > > [2] https://github.com/nvitucci/calcite-sparql/tree/v0.0.2/examples/python
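
For readers of the archive, the first option in the original message (jaydebeapi talking to the Calcite JDBC driver) might look roughly like the sketch below. This is not taken from the notebooks linked in the thread. The helper names `calcite_jdbc_url` and `connect_calcite`, the model file path, and the jar list are hypothetical and only there for illustration; the driver class `org.apache.calcite.jdbc.Driver` and the `jdbc:calcite:model=` URL form are the ones Calcite documents.

```python
from typing import Sequence


def calcite_jdbc_url(model_path: str) -> str:
    """Build a Calcite JDBC URL pointing at a JSON model file (hypothetical helper)."""
    return f"jdbc:calcite:model={model_path}"


def connect_calcite(model_path: str, jar_paths: Sequence[str]):
    """Open a DB-API connection to Calcite through jaydebeapi.

    jaydebeapi starts a JVM via jpype, so the jars passed here are assumed to
    contain Calcite and all of its dependencies.
    """
    # Imported lazily so the URL helper above can be used without a JVM installed.
    import jaydebeapi

    return jaydebeapi.connect(
        "org.apache.calcite.jdbc.Driver",  # Calcite's JDBC driver class
        calcite_jdbc_url(model_path),
        jars=list(jar_paths),
    )
```

With a JVM and the jars available, `connect_calcite("model.json", ["calcite-with-deps.jar"])` (both file names are placeholders) would return a connection whose cursor can run SQL against the schemas defined in the model. The second option in the message would replace the DB-API cursor with Arrow's JdbcToArrowUtils on the Java side.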