Re: JDBC Adapter for Apache-Arrow

Jacques Nadeau Tue, 09 Jan 2018 19:41:03 -0800

We have some stuff in Dremio that we've planned on open sourcing but
haven't yet done so. We should try to get that out for others to consume.


On Jan 7, 2018 11:49 AM, "Uwe L. Korn" <uw...@xhochy.com> wrote:

> Has anyone made progress on the JDBC adapter yet?
>
> I recently came across a lot of projects with good JDBC drivers but not so
> good drivers in Python. Having an Arrow-JDBC adapter would make these query
> engines much more useful to the Python community. Being an Arrow committer
> and one of the turbodbc authors, I have quite some knowledge in this area
> but my Java is a bit rusty and I have never dealt with JDBC, so I'm looking
> for someone to collaborate on this feature.
>
> Also, this might be my ultimate chance to start contributing to the Java
> part of Apache Arrow.
>
> Uwe
>
> > Am 07.11.2017 um 20:01 schrieb Julian Hyde <jh...@apache.org>:
> >
> > I have logged https://issues.apache.org/jira/browse/CALCITE-2040 (I
> > logged it within Calcite because it makes more sense as an Arrow
> > adapter within Calcite than as a Calcite adapter within Arrow).
> >
> > Note the last paragraph about
> > https://issues.apache.org/jira/browse/CALCITE-2025 and bioinformatics
> > file formats. Readers for these formats would be useful extensions to
> > Arrow regardless of whether the data was ultimately going to be
> > queried using SQL. (Contributions welcome!) Calcite's bio adapter
> > would build upon the Arrow readers in two respects:  (1) to read
> > metadata from these files (e.g. are there any extra fields?) and (2)
> > to push down processing (filters, projects) into the reader.
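[Editorial sketch, not part of the original message.] The two roles described above, reading metadata and pushing filters and projections into the reader, can be illustrated with a toy in-memory reader. Everything here is hypothetical: `PushdownReaderSketch`, its methods, and the sample records are invented for illustration and are neither Calcite's nor Arrow's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical reader over records that may carry extra, optional fields.
final class PushdownReaderSketch {
    private final List<Map<String, Object>> records;

    PushdownReaderSketch(List<Map<String, Object>> records) {
        this.records = records;
    }

    // (1) Metadata: every field name that occurs in any record, sorted.
    Set<String> fieldNames() {
        Set<String> names = new TreeSet<>();
        for (Map<String, Object> record : records) {
            names.addAll(record.keySet());
        }
        return names;
    }

    // (2) Pushdown: the reader itself applies the filter and keeps only the
    // projected fields, so unwanted rows and columns never reach the caller.
    List<Map<String, Object>> scan(List<String> projection,
                                   Predicate<Map<String, Object>> filter) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> record : records) {
            if (!filter.test(record)) {
                continue;
            }
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String field : projection) {
                projected.put(field, record.get(field));
            }
            out.add(projected);
        }
        return out;
    }

    public static void main(String[] args) {
        PushdownReaderSketch reader = new PushdownReaderSketch(List.of(
                Map.of("id", "r1", "length", 100),
                Map.of("id", "r2", "length", 250, "quality", 30)));
        System.out.println(reader.fieldNames());
        System.out.println(reader.scan(List.of("id"),
                r -> (Integer) r.get("length") > 200));
    }
}
```

A real reader would decode a file format rather than hold maps in memory; the point is only that the query layer hands the projection and the predicate down instead of filtering afterwards.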
> >
> > Julian
> >
> >
> > On Tue, Nov 7, 2017 at 10:21 AM, Atul Dambalkar
> > <atul.dambal...@xoriant.com> wrote:
> >> Hi,
> >>
> >> Don't mean to interrupt the current discussion threads. But, based on
> the discussions so far on the JDBC Adapter piece, are we in a position to
> create a JIRA ticket for this as well as the other piece about adding
> support for creating Arrow objects directly from JDBC drivers? If yes, I can
> certainly go ahead and create a JIRA for the JDBC Adapter work.
> >>
> >> Julian, would you like to create the JIRA for the other item that you
> proposed?
> >>
> >> -Atul
> >>
> >> -----Original Message-----
> >> From: Atul Dambalkar
> >> Sent: Thursday, November 02, 2017 2:59 PM
> >> To: dev@arrow.apache.org
> >> Subject: RE: JDBC Adapter for Apache-Arrow
> >>
> >> I also like the approach of adding an interface and making it part of
> Arrow, so any specific JDBC driver can implement that interface to directly
> expose Arrow objects without having to create JDBC objects in the first
> place. One such implementation could be for Avatica itself, as Julian was
> suggesting earlier.
> >>
> >> -----Original Message-----
> >> From: Julian Hyde [mailto:jh...@apache.org]
> >> Sent: Tuesday, October 31, 2017 4:28 PM
> >> To: dev@arrow.apache.org
> >> Subject: Re: JDBC Adapter for Apache-Arrow
> >>
> >> Yeah, I agree, it should be an interface defined as part of Arrow. Not
> driver-specific.
> >>
> >>> On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com>
> wrote:
> >>>
> >>> I really like Julian's idea of unwrapping Arrow objects out of the
> >>> JDBC ResultSet, but I wonder if the unwrap class has to be specific to
> >>> the driver and if an interface can be designed to be used by multiple
> drivers:
> >>> for drivers based on Arrow, it means you could totally skip the
> >>> serialization/deserialization from/to JDBC records.
> >>> If such an interface exists, I would propose to add it to the Arrow
> >>> project, with Arrow product/projects in charge of adding support for
> >>> it in their own JDBC driver.
> >>>
> >>> Laurent
> >>>
> >>> On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar
> >>> <atul.dambal...@xoriant.com>
> >>> wrote:
> >>>
> >>>> Thanks for your thoughts, Julian. I think adding support for Arrow
> >>>> objects to the Avatica Remote Driver (AvaticaToArrowConverter) can
> >>>> certainly be taken up as another activity. And you are right, we will
> >>>> have to look at each specific JDBC driver to really optimize it
> individually.
> >>>>
> >>>> I would be curious if there are any further inputs/comments from
> >>>> other Dev folks, on the JDBC adapter aspect.
> >>>>
> >>>> -Atul
> >>>>
> >>>> -----Original Message-----
> >>>> From: Julian Hyde [mailto:jh...@apache.org]
> >>>> Sent: Tuesday, October 31, 2017 11:12 AM
> >>>> To: dev@arrow.apache.org
> >>>> Subject: Re: JDBC Adapter for Apache-Arrow
> >>>>
> >>>> Sorry I didn’t read your email thoroughly enough. I was talking about
> >>>> the inverse (JDBC reading from Arrow) whereas you are talking about
> >>>> Arrow reading from JDBC. Your proposal makes perfect sense.
> >>>>
> >>>> JDBC is quite a chatty interface (a call for every column of every
> >>>> row, plus an occasional call to find out whether values are null, and
> >>>> objects such as strings and timestamps each become a Java heap object),
> >>>> so for specific JDBC drivers it may be possible to optimize. For
> >>>> example, the Avatica remote driver receives row sets in an RPC
> >>>> response in protobuf format. It may be useful if the JDBC driver were
> >>>> able to expose a direct path from protobuf to Arrow.
> >>>> "ResultSet.unwrap(AvaticaToArrowConverter.class)"
> >>>> might be one way to achieve this.
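[Editorial sketch, not part of the original message.] The unwrap idea relies on `java.sql.Wrapper`, which `java.sql.ResultSet` extends. The sketch below shows the pattern on a stand-in class that implements only `Wrapper`. `ColumnarBatchSource`, `SketchResultSet`, and `UnwrapSketch` are hypothetical names; the thread never defines what `AvaticaToArrowConverter` or a shared Arrow interface would look like, and plain `long[]` columns stand in for Arrow vectors.

```java
import java.sql.SQLException;
import java.sql.Wrapper;
import java.util.List;
import java.util.Optional;

// Hypothetical driver-neutral interface for handing out columnar batches.
interface ColumnarBatchSource {
    // Returns one array per column for the next batch; empty when exhausted.
    List<long[]> nextBatch();
}

// Stand-in for a driver's ResultSet. Only the Wrapper part is implemented.
final class SketchResultSet implements Wrapper, ColumnarBatchSource {
    private final List<long[]> columns;
    private boolean consumed = false;

    SketchResultSet(List<long[]> columns) {
        this.columns = columns;
    }

    @Override
    public List<long[]> nextBatch() {
        if (consumed) {
            return List.of();
        }
        consumed = true;
        return columns;
    }

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        if (iface.isInstance(this)) {
            return iface.cast(this);
        }
        throw new SQLException("Not a wrapper for " + iface.getName());
    }

    @Override
    public boolean isWrapperFor(Class<?> iface) {
        return iface.isInstance(this);
    }
}

final class UnwrapSketch {
    // Takes the columnar path when the driver offers it. An empty result
    // tells the caller to fall back to row-by-row getXxx() calls.
    static Optional<List<long[]>> readColumnar(Wrapper resultSet) throws SQLException {
        if (resultSet.isWrapperFor(ColumnarBatchSource.class)) {
            return Optional.of(resultSet.unwrap(ColumnarBatchSource.class).nextBatch());
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws SQLException {
        Wrapper rs = new SketchResultSet(List.of(new long[] {1, 2, 3}));
        readColumnar(rs).ifPresent(batch ->
                System.out.println("columns=" + batch.size() + ", rows=" + batch.get(0).length));
    }
}
```

Checking `isWrapperFor` before `unwrap` lets the same client code work against drivers that do not support the columnar path.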
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On Oct 31, 2017, at 10:41 AM, Atul Dambalkar
> >>>>> <atul.dambal...@xoriant.com>
> >>>> wrote:
> >>>>>
> >>>>> Hi Julian,
> >>>>>
> >>>>> Thanks for your response. If I understand correctly (looking at
> >>>>> other
> >>>> adapters), a Calcite-Arrow adapter would provide a SQL front end for
> >>>> in-memory Arrow data objects/structures. So from that perspective,
> >>>> are you suggesting building the Calcite-Arrow adapter?
> >>>>>
> >>>>> In this case, what we are proposing is a mechanism for
> >>>>> upstream
> >>>> apps to get/create Arrow objects/structures from a
> >>>> relational database. This would also mean converting row-like data
> >>>> from a SQL database to columnar Arrow data structures. The utility
> >>>> could perhaps use JDBC's metadata features to figure out the
> >>>> underlying DB schema and define the Arrow columnar schema. Also, the
> >>>> underlying database in this case would be any relational DB and hence
> >>>> would be persisted to disk, but the Arrow objects, being in-memory,
> can be ephemeral.
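[Editorial sketch, not part of the original message.] The two steps described above, deriving a columnar schema from JDBC metadata and turning rows into columns, might look roughly like this. It assumes the type codes come from `ResultSetMetaData.getColumnType(i)` and the row values from `ResultSet.getObject(i)`; here both are supplied by hand so the sketch runs without a database. The type names such as "int32" are illustrative labels, not Arrow's actual type classes, and `RowsToColumnsSketch` is a hypothetical name.

```java
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;

final class RowsToColumnsSketch {
    // Maps a java.sql.Types code to an illustrative columnar type label.
    // Only a handful of types are covered; a real adapter would need many more.
    static String columnarTypeFor(int jdbcType) {
        switch (jdbcType) {
            case Types.INTEGER: return "int32";
            case Types.BIGINT:  return "int64";
            case Types.DOUBLE:  return "float64";
            case Types.VARCHAR: return "utf8";
            case Types.BOOLEAN: return "bool";
            default:
                throw new IllegalArgumentException("Unmapped JDBC type code " + jdbcType);
        }
    }

    // Pivots row-wise data (one Object[] per row) into one list per column.
    // SQL NULLs are carried through as Java nulls.
    static List<List<Object>> pivot(List<Object[]> rows, int columnCount) {
        List<List<Object>> columns = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            columns.add(new ArrayList<>(rows.size()));
        }
        for (Object[] row : rows) {
            for (int c = 0; c < columnCount; c++) {
                columns.get(c).add(row[c]);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Stand-ins for what the JDBC metadata and result set would report.
        int[] jdbcTypes = {Types.INTEGER, Types.VARCHAR};
        List<Object[]> rows = List.of(new Object[] {1, "a"}, new Object[] {2, null});

        List<List<Object>> columns = pivot(rows, jdbcTypes.length);
        for (int c = 0; c < jdbcTypes.length; c++) {
            System.out.println(columnarTypeFor(jdbcTypes[c]) + ": " + columns.get(c));
        }
    }
}
```

Boxed `Object` columns keep the sketch short; an actual Arrow adapter would write into typed, off-heap vectors with validity bitmaps instead.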
> >>>>>
> >>>>> Please correct me if I am missing anything.
> >>>>>
> >>>>> -Atul
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Julian Hyde [mailto:jhyde.apa...@gmail.com]
> >>>>> Sent: Monday, October 30, 2017 7:50 PM
> >>>>> To: dev@arrow.apache.org
> >>>>> Subject: Re: JDBC Adapter for Apache-Arrow
> >>>>>
> >>>>> How about writing an Arrow adapter for Calcite? I think it amounts
> >>>>> to
> >>>> the same thing - you would inherit Calcite’s SQL parser and Avatica
> >>>> JDBC stack.
> >>>>>
> >>>>> Would this database be ephemeral (i.e. would the data go away when
> >>>>> you
> >>>> close the connection)? If not, how would you know where to load the
> >>>> data from?
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar
> >>>>>> <atul.dambal...@xoriant.com>
> >>>> wrote:
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I wanted to open up a conversation here regarding developing a
> >>>> Java-based JDBC Adapter for Apache Arrow. I had a preliminary
> >>>> discussion with Wes McKinney and Siddharth Teotia on this a couple
> >>>> of weeks ago.
> >>>>>>
> >>>>>> Basically, at a high level (over-simplified), this adapter/API will
> >>>>>> allow
> >>>> upstream apps to query RDBMS data over JDBC and get the JDBC objects
> >>>> converted to Arrow in-memory (JVM) objects/structures. The upstream
> >>>> utility can then work with Arrow objects/structures with the usual
> >>>> performance benefits. The utility will be very similar to the C++
> >>>> implementation of "Convert a vector of row-wise data into an Arrow
> >>>> table" as described here -
> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html.
> >>>>>>
> >>>>>> How useful would this adapter be, and which other Apache projects
> >>>>>> would
> >>>> benefit from it? Based on the usability, we can open a JIRA for this
> >>>> activity and start looking into the implementation details.
> >>>>>>
> >>>>>> Regards,
> >>>>>> -Atul Dambalkar
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>
>
>
