We have some stuff in Dremio that we've planned on open sourcing but haven't yet done so. We should try to get that out for others to consume.

On Jan 7, 2018 11:49 AM, "Uwe L. Korn" <uw...@xhochy.com> wrote:

> Has anyone made progress on the JDBC adapter yet?
>
> I recently came across a lot of projects with good JDBC drivers but not so
> good drivers in Python. Having an Arrow-JDBC adaptor would make these query
> engines much more useful to the Python community. Being an Arrow committer
> and one of the turbodbc authors, I have quite some knowledge in this area,
> but my Java is a bit rusty and I have never dealt with JDBC, so I'm looking
> for someone to collaborate on this feature.
>
> This might also be my ultimate chance to start contributing to the Java
> part of Apache Arrow.
>
> Uwe
>
> > On 07.11.2017 at 20:01, Julian Hyde <jh...@apache.org> wrote:
> >
> > I have logged https://issues.apache.org/jira/browse/CALCITE-2040 (I
> > logged it within Calcite because it makes more sense as an Arrow
> > adapter within Calcite than as a Calcite adapter within Arrow).
> >
> > Note the last paragraph about
> > https://issues.apache.org/jira/browse/CALCITE-2025 and bioinformatics
> > file formats. Readers for these formats would be useful extensions to
> > Arrow regardless of whether the data was ultimately going to be
> > queried using SQL. (Contributions welcome!) Calcite's bio adapter
> > would build upon the Arrow readers in two respects: (1) to read
> > metadata from these files (e.g. are there any extra fields?) and (2)
> > to push down processing (filters, projects) into the reader.
> >
> > Julian
> >
> >
> > On Tue, Nov 7, 2017 at 10:21 AM, Atul Dambalkar
> > <atul.dambal...@xoriant.com> wrote:
> >> Hi,
> >>
> >> I don't mean to interrupt the current discussion threads, but based on
> >> the discussions so far on the JDBC Adapter piece, are we in a position
> >> to create a JIRA ticket for this, as well as for the other piece about
> >> adding support for creating Arrow objects directly from JDBC drivers?
> >> If yes, I can certainly go ahead and create a JIRA for the JDBC Adapter
> >> work.
> >>
> >> Julian, would you like to create the JIRA for the other item that you
> >> proposed?
> >>
> >> -Atul
> >>
> >> -----Original Message-----
> >> From: Atul Dambalkar
> >> Sent: Thursday, November 02, 2017 2:59 PM
> >> To: dev@arrow.apache.org
> >> Subject: RE: JDBC Adapter for Apache-Arrow
> >>
> >> I also like the approach of adding an interface and making it part of
> >> Arrow, so any specific JDBC driver can implement that interface to
> >> directly expose Arrow objects without having to create JDBC objects in
> >> the first place. One such implementation could be for Avatica itself,
> >> as Julian was suggesting earlier.
> >>
> >> -----Original Message-----
> >> From: Julian Hyde [mailto:jh...@apache.org]
> >> Sent: Tuesday, October 31, 2017 4:28 PM
> >> To: dev@arrow.apache.org
> >> Subject: Re: JDBC Adapter for Apache-Arrow
> >>
> >> Yeah, I agree, it should be an interface defined as part of Arrow. Not
> >> driver-specific.
> >>
> >>> On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com>
> >>> wrote:
> >>>
> >>> I really like Julian's idea of unwrapping Arrow objects out of the
> >>> JDBC ResultSet, but I wonder if the unwrap class has to be specific to
> >>> the driver and if an interface can be designed to be used by multiple
> >>> drivers: for drivers based on Arrow, it means you could totally skip
> >>> the serialization/deserialization from/to JDBC records.
> >>> If such an interface exists, I would propose to add it to the Arrow
> >>> project, with Arrow product/projects in charge of adding support for
> >>> it in their own JDBC driver.
> >>>
> >>> Laurent
> >>>
> >>> On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar
> >>> <atul.dambal...@xoriant.com>
> >>> wrote:
> >>>
> >>>> Thanks for your thoughts, Julian. I think adding support for Arrow
> >>>> objects to the Avatica Remote Driver (AvaticaToArrowConverter) can
> >>>> certainly be taken up as another activity. And you are right, we will
> >>>> have to look at each specific JDBC driver to really optimize it
> >>>> individually.
> >>>>
> >>>> I would be curious if there are any further inputs/comments from
> >>>> other dev folks on the JDBC adapter aspect.
> >>>>
> >>>> -Atul
> >>>>
> >>>> -----Original Message-----
> >>>> From: Julian Hyde [mailto:jh...@apache.org]
> >>>> Sent: Tuesday, October 31, 2017 11:12 AM
> >>>> To: dev@arrow.apache.org
> >>>> Subject: Re: JDBC Adapter for Apache-Arrow
> >>>>
> >>>> Sorry I didn't read your email thoroughly enough. I was talking about
> >>>> the inverse (JDBC reading from Arrow) whereas you are talking about
> >>>> Arrow reading from JDBC. Your proposal makes perfect sense.
> >>>>
> >>>> JDBC is quite a chatty interface (a call for every column of every
> >>>> row, plus an occasional call to find out whether values are null, and
> >>>> objects such as strings and timestamps become a Java heap object) so
> >>>> for specific JDBC drivers it may be possible to optimize. For
> >>>> example, the Avatica remote driver receives row sets in an RPC
> >>>> response in protobuf format. It may be useful if the JDBC driver were
> >>>> able to expose a direct path from protobuf to Arrow.
> >>>> "ResultSet.unwrap(AvaticaToArrowConverter.class)"
> >>>> might be one way to achieve this.
> >>>>
> >>>> Julian
> >>>>
> >>>>
> >>>>> On Oct 31, 2017, at 10:41 AM, Atul Dambalkar
> >>>>> <atul.dambal...@xoriant.com> wrote:
> >>>>>
> >>>>> Hi Julian,
> >>>>>
> >>>>> Thanks for your response. If I understand correctly (looking at
> >>>>> other adapters), a Calcite-Arrow adapter would provide a SQL front
> >>>>> end for in-memory Arrow data objects/structures. So from that
> >>>>> perspective, are you suggesting building the Calcite-Arrow adapter?
> >>>>>
> >>>>> In this case, what we are proposing is a mechanism for upstream
> >>>>> apps to be able to get/create Arrow objects/structures from a
> >>>>> relational database. This would also mean converting row-like data
> >>>>> from a SQL database to columnar Arrow data structures. The utility
> >>>>> could perhaps make use of JDBC's metadata features to figure out
> >>>>> the underlying DB schema and define the Arrow columnar schema. Also,
> >>>>> the underlying database in this case would be any relational DB and
> >>>>> hence would be persisted to disk, but the Arrow objects, being
> >>>>> in-memory, can be ephemeral.
> >>>>>
> >>>>> Please correct me if I am missing anything.
> >>>>>
> >>>>> -Atul
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Julian Hyde [mailto:jhyde.apa...@gmail.com]
> >>>>> Sent: Monday, October 30, 2017 7:50 PM
> >>>>> To: dev@arrow.apache.org
> >>>>> Subject: Re: JDBC Adapter for Apache-Arrow
> >>>>>
> >>>>> How about writing an Arrow adapter for Calcite? I think it amounts
> >>>>> to the same thing - you would inherit Calcite's SQL parser and
> >>>>> Avatica JDBC stack.
> >>>>>
> >>>>> Would this database be ephemeral (i.e. would the data go away when
> >>>>> you close the connection)? If not, how would you know where to load
> >>>>> the data from?
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>>> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar
> >>>>>> <atul.dambal...@xoriant.com> wrote:
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I wanted to open up a conversation here regarding developing a
> >>>>>> Java-based JDBC Adapter for Apache Arrow. I had a preliminary
> >>>>>> discussion with Wes McKinney and Siddharth Teotia on this a couple
> >>>>>> of weeks ago.
> >>>>>>
> >>>>>> Basically, at a high level (over-simplified), this adapter/API will
> >>>>>> allow upstream apps to query RDBMS data over JDBC and get the JDBC
> >>>>>> objects converted to Arrow in-memory (JVM) objects/structures. The
> >>>>>> upstream utility can then work with Arrow objects/structures with
> >>>>>> the usual performance benefits. The utility will be very similar to
> >>>>>> the C++ implementation of "Convert a vector of row-wise data into
> >>>>>> an Arrow table" as described here -
> >>>>>> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html.
> >>>>>>
> >>>>>> How useful would this adapter be, and which other Apache projects
> >>>>>> would benefit from it? Based on the usability we can open a JIRA
> >>>>>> for this activity and start looking into the implementation
> >>>>>> details.
> >>>>>>
> >>>>>> Regards,
> >>>>>> -Atul Dambalkar
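
For anyone skimming the thread, here is a rough sketch of the row-wise to columnar conversion discussed above. It is only an illustration under stated assumptions: it does not use the Arrow Java API, and the names RowToColumnSketch, Column and pivot are made up for this example. In a real adapter the column names would presumably come from ResultSetMetaData.getColumnLabel and the cell values from ResultSet.getObject; here the rows are a plain in-memory list so the code runs without a database. Each column keeps its values contiguously together with a validity bitmap marking the non-null slots, which is roughly the shape a columnar vector has.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch: pivots row-wise records into column-wise buffers. Not the Arrow API. */
final class RowToColumnSketch {

    /** One column: values stored contiguously plus a validity bitmap (set bit = non-null). */
    static final class Column {
        final String name;
        final Object[] values;
        final BitSet validity;

        Column(String name, int rowCount) {
            this.name = name;
            this.values = new Object[rowCount];
            this.validity = new BitSet(rowCount);
        }
    }

    /** Walks the rows once and scatters every cell into its column. */
    static List<Column> pivot(String[] columnNames, List<Object[]> rows) {
        List<Column> columns = new ArrayList<>();
        for (String name : columnNames) {
            columns.add(new Column(name, rows.size()));
        }
        for (int r = 0; r < rows.size(); r++) {
            Object[] row = rows.get(r);
            if (row.length != columnNames.length) {
                throw new IllegalArgumentException("row " + r + " has " + row.length
                        + " cells, expected " + columnNames.length);
            }
            for (int c = 0; c < row.length; c++) {
                if (row[c] != null) {
                    columns.get(c).values[r] = row[c];
                    columns.get(c).validity.set(r); // mark slot r as non-null
                }
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Stand-in for what a JDBC query would return, one Object[] per row.
        String[] names = {"id", "name"};
        List<Object[]> rows = Arrays.asList(
                new Object[] {1, "a"},
                new Object[] {2, null},
                new Object[] {3, "c"});
        for (Column col : pivot(names, rows)) {
            System.out.println(col.name + " = " + Arrays.toString(col.values)
                    + ", valid = " + col.validity);
        }
    }
}
```

The per-cell loop is the "chatty" part Julian mentions: one access per column per row. A driver-specific path that hands over whole column buffers would skip that loop entirely.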