Re: Using Calcite with Python

Jacques Nadeau Mon, 31 Jan 2022 16:58:40 -0800

A couple of related (possibly useful?) pointers here:

   - dask-sql [1] uses Calcite in a Python context. There might be some good
   stuff to leverage there.
   - I'm working on compiling Calcite as a GraalVM shared native library
   [2] as part of Substrait [3], with the goal of ultimately having a friendly
   C binding [4] for use in non-JVM worlds. This connects to work being done
   by others to support tools like Arrow and Velox [5] as Substrait targets
   (thus completing the path from a C interface to native execution via
   Calcite).



[1] https://github.com/dask-contrib/dask-sql
[2] https://issues.apache.org/jira/browse/CALCITE-4786
[3] https://github.com/substrait-io/substrait/pull/120
[4] https://github.com/jacques-n/substrait/pull/3
[5] https://github.com/oap-project/gazelle-jni/tree/velox_dev

On Mon, Jan 31, 2022 at 3:32 PM Nicola Vitucci <[email protected]>
wrote:

> Hi Eugen, Michael, Gavin,
>
> Thank you very much for your input. In response to your suggestions:
>
> - Phoenix client: I saw it but decided not to use it because it does not
> seem very active or up to date (its Avatica version is 1.10, while the
> latest is 1.20). I may still give it a try though.
> - Arrow Flight: I think it could be very useful, especially if, as Michael
> mentioned, it were integrated with Avatica as a transport; at the moment,
> though, it is not.
>
> I am basically looking for a (relatively) easy and ready to implement, easy
> to keep up to date, and reasonably performant solution. Although it incurs
> some overhead, a solution based on Python + Java seems to me the most
> reasonable for the time being. Do you have any other suggestions or
> recommendations?
>
> Thanks again,
>
> Nicola
>
>
>
> On Mon, Jan 31, 2022 at 17:04, Michael Mior <[email protected]> wrote:
>
> > Flight is definitely another consideration for the future. Personally I
> > think it would be most interesting to integrate Flight with Avatica as an
> > alternative transport. But it would certainly also be useful to allow the
> > Arrow adapter to connect to any Flight endpoint.
> >
> > --
> > Michael Mior
> > [email protected]
> >
> >
> > On Mon, Jan 31, 2022 at 10:00, Gavin Ray <[email protected]> wrote:
> >
> > > This is really interesting stuff you've done in the example notebooks
> > >
> > > Nicola & Michael, I wonder if you could benefit from the
> > > recently-released Arrow Flight SQL?
> > >
> > > https://www.dremio.com/subsurface/arrow-flight-and-arrow-flight-sql-accelerating-data-movement/
> > >
> > > I have asked Jacques about this a bit -- it's meant to be a
> > > standardization for communicating SQL queries and metadata with Arrow.
> > > I'm not intimately familiar with it, but it seems like it could be a
> > > good base to build a Calcite backend for Arrow from?
> > >
> > > They have a pretty thorough Java example in the repository:
> > >
> > > https://github.com/apache/arrow/blob/968e6ea488c939c0e1f2bfe339a5a9ed1aed603e/java/flight/flight-sql/src/test/java/org/apache/arrow/flight/sql/example/FlightSqlExample.java#L169-L180
> > >
> > > On Mon, Jan 31, 2022 at 8:47 AM Michael Mior <[email protected]> wrote:
> > >
> > > > You may want to keep an eye on CALCITE-2040 (
> > > > https://issues.apache.org/jira/browse/CALCITE-2040). I have a student
> > > > who is working on a Calcite adapter for Apache Arrow. We're basically
> > > > hung up waiting on the Arrow team to release a compatible JAR. This
> > > > still won't fully solve your problem though as the first version of
> > > > the adapter is only capable of reading from Arrow files. However, the
> > > > goal is eventually to allow passing a memory reference into the
> > > > adapter so that it would be possible to make use of Arrow data which
> > > > is constructed in-memory elsewhere.
> > > > --
> > > > Michael Mior
> > > > [email protected]
> > > >
> > > >
> > > > On Sun, Jan 30, 2022 at 17:36, Nicola Vitucci <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > What would be the best way to use Calcite with Python? I've come up
> > > > > with two potential solutions:
> > > > >
> > > > > - using the jaydebeapi package, to connect via the JDBC driver
> > > > > directly from a JVM created via jpype;
> > > > > - using Apache Arrow via the pyarrow package, to connect in basically
> > > > > the same way but creating Arrow objects with JdbcToArrowUtils (and
> > > > > optionally converting them to Pandas).
> > > > >
> > > > > Although the former is more straightforward, the latter achieves
> > > > > better performance (see [1] for instance), since that is exactly what
> > > > > Arrow is for. I've created two Jupyter notebooks [2] showing each
> > > > > solution. What would you recommend? Is there an even better approach?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Nicola
> > > > >
> > > > > [1] https://uwekorn.com/2020/12/30/fast-jdbc-revisited.html
> > > > > [2] https://github.com/nvitucci/calcite-sparql/tree/v0.0.2/examples/python
> > > > >
> > > >
> > >
> >
>
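
For readers following the thread, the first option in the original question
(jaydebeapi over the Calcite JDBC driver) can be sketched roughly as below.
This is a minimal sketch, not code from the notebooks: the helper names
calcite_jdbc_url and run_query, the model file, and the jar path are
illustrative assumptions, and actually running a query needs a JVM, the
jaydebeapi package, and a Calcite jar with its dependencies.

```python
from typing import Any, List, Sequence, Tuple

# Class name of Calcite's JDBC driver.
CALCITE_DRIVER = "org.apache.calcite.jdbc.Driver"


def calcite_jdbc_url(model_path: str) -> str:
    """Build a Calcite JDBC URL that points at a model file (hypothetical helper)."""
    return f"jdbc:calcite:model={model_path}"


def run_query(sql: str, model_path: str, jars: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Run one query through the Calcite JDBC driver and return all rows.

    Hypothetical helper; `jars` must list a Calcite jar plus its dependencies.
    """
    # Imported lazily so the module loads even where jaydebeapi is not installed.
    import jaydebeapi

    conn = jaydebeapi.connect(
        CALCITE_DRIVER,
        calcite_jdbc_url(model_path),
        jars=list(jars),
    )
    try:
        cursor = conn.cursor()
        try:
            cursor.execute(sql)
            return cursor.fetchall()
        finally:
            cursor.close()
    finally:
        conn.close()


# Example call (paths and table name are placeholders, not real files):
# rows = run_query("SELECT * FROM my_table", "model.json", ["calcite-with-deps.jar"])
```

The second option in the question follows the same connection pattern but
hands the JDBC result set to Arrow's JdbcToArrowUtils instead of fetching
rows through the Python cursor; it is not sketched here.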
