On Tue, Oct 31, 2017 at 4:28 PM, Julian Hyde <jh...@apache.org> wrote:

Yeah, I agree, it should be an interface defined as part of Arrow, not driver-specific.
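[Editor's sketch, for illustration only: one possible shape for the driver-agnostic contract discussed in this thread. The interface name ArrowBatchReader and its method are hypothetical; nothing like this existed in Arrow at the time. ResultSet.unwrap() and isWrapperFor() are the standard JDBC 4.0 Wrapper mechanism a driver could use to expose it.]

import org.apache.arrow.vector.VectorSchemaRoot;

/**
 * Hypothetical driver-agnostic contract that a JDBC driver could expose
 * through ResultSet.unwrap(), letting callers pull Arrow record batches
 * directly instead of walking the result set cell by cell.
 */
public interface ArrowBatchReader extends AutoCloseable {
  /** Returns the next batch of rows as Arrow vectors, or null when the result set is exhausted. */
  VectorSchemaRoot nextBatch() throws Exception;
}

// Possible usage from application code, relying only on the standard JDBC Wrapper methods:
//
//   try (Statement stmt = connection.createStatement();
//        ResultSet rs = stmt.executeQuery("SELECT * FROM orders")) {
//     if (rs.isWrapperFor(ArrowBatchReader.class)) {
//       try (ArrowBatchReader reader = rs.unwrap(ArrowBatchReader.class)) {
//         for (VectorSchemaRoot batch = reader.nextBatch();
//              batch != null;
//              batch = reader.nextBatch()) {
//           // consume columnar data directly; no per-cell JDBC calls
//         }
//       }
//     }
//   }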
On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com> wrote:

I really like Julian's idea of unwrapping Arrow objects out of the JDBC ResultSet, but I wonder whether the unwrap class has to be specific to the driver, or whether an interface can be designed to be used by multiple drivers: for drivers based on Arrow, it means you could totally skip the serialization/deserialization from/to JDBC records.

If such an interface exists, I would propose to add it to the Arrow project, with Arrow products/projects in charge of adding support for it in their own JDBC drivers.

Laurent

On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar <atul.dambal...@xoriant.com> wrote:

Thanks for your thoughts, Julian. I think adding support for Arrow objects for the Avatica Remote Driver (AvaticaToArrowConverter) can certainly be taken up as another activity. And you are right, we will have to look at each specific JDBC driver to really optimize it individually.

I would be curious whether there are any further inputs/comments from other dev folks on the JDBC adapter aspect.

-Atul

-----Original Message-----
From: Julian Hyde [mailto:jh...@apache.org]
Sent: Tuesday, October 31, 2017 11:12 AM
To: dev@arrow.apache.org
Subject: Re: JDBC Adapter for Apache-Arrow

Sorry, I didn't read your email thoroughly enough. I was talking about the inverse (JDBC reading from Arrow), whereas you are talking about Arrow reading from JDBC. Your proposal makes perfect sense.

JDBC is quite a chatty interface (a call for every column of every row, plus an occasional call to find out whether values are null, and objects such as strings and timestamps become Java heap objects), so for specific JDBC drivers it may be possible to optimize. For example, the Avatica remote driver receives row sets in an RPC response in protobuf format. It may be useful if the JDBC driver were able to expose a direct path from protobuf to Arrow. "ResultSet.unwrap(AvaticaToArrowConverter.class)" might be one way to achieve this.

Julian

On Oct 31, 2017, at 10:41 AM, Atul Dambalkar <atul.dambal...@xoriant.com> wrote:

Hi Julian,

Thanks for your response. If I understand correctly (looking at other adapters), a Calcite-Arrow adapter would provide a SQL front end for in-memory Arrow data objects/structures. So from that perspective, are you suggesting building the Calcite-Arrow adapter?

In this case, what we are saying is to provide a mechanism for upstream apps to be able to get/create Arrow objects/structures from a relational database. This would also mean converting row-like data from a SQL database to columnar Arrow data structures. The utility can make use of JDBC's MetaData features to figure out the underlying DB schema and define the Arrow columnar schema. Also, the underlying database in this case would be any relational DB and hence would be persisted to disk, but the Arrow objects, being in-memory, can be ephemeral.

Please correct me if I am missing anything.

-Atul
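[Editor's sketch: to make the MetaData idea above concrete, here is a minimal mapping from java.sql.ResultSetMetaData to an Arrow Schema. The class name JdbcSchemaMapper is illustrative, only a handful of java.sql.Types values are covered, and a real adapter would need the full type matrix (decimals, dates, timestamps, binary, and so on).]

import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.List;

import org.apache.arrow.vector.types.FloatingPointPrecision;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.Schema;

/** Illustrative JDBC-to-Arrow schema mapping; only a few SQL types are handled. */
public final class JdbcSchemaMapper {

  public static Schema toArrowSchema(ResultSetMetaData md) throws SQLException {
    List<Field> fields = new ArrayList<>();
    for (int i = 1; i <= md.getColumnCount(); i++) {       // JDBC columns are 1-based
      ArrowType type;
      switch (md.getColumnType(i)) {
        case Types.INTEGER:
          type = new ArrowType.Int(32, true);              // 32-bit signed integer
          break;
        case Types.BIGINT:
          type = new ArrowType.Int(64, true);
          break;
        case Types.FLOAT:
        case Types.DOUBLE:
          type = new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE);
          break;
        case Types.BOOLEAN:
          type = new ArrowType.Bool();
          break;
        case Types.CHAR:
        case Types.VARCHAR:
        default:                                           // fall back to strings for everything else
          type = new ArrowType.Utf8();
          break;
      }
      fields.add(Field.nullable(md.getColumnLabel(i), type));
    }
    return new Schema(fields);
  }

  private JdbcSchemaMapper() {}
}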
-----Original Message-----
From: Julian Hyde [mailto:jhyde.apa...@gmail.com]
Sent: Monday, October 30, 2017 7:50 PM
To: dev@arrow.apache.org
Subject: Re: JDBC Adapter for Apache-Arrow

How about writing an Arrow adapter for Calcite? I think it amounts to the same thing - you would inherit Calcite's SQL parser and Avatica JDBC stack.

Would this database be ephemeral (i.e. would the data go away when you close the connection)? If not, how would you know where to load the data from?

Julian

On Oct 30, 2017, at 6:17 PM, Atul Dambalkar <atul.dambal...@xoriant.com> wrote:

Hi all,

I wanted to open up a conversation here regarding developing a Java-based JDBC adapter for Apache Arrow. I have had a preliminary discussion with Wes McKinney and Siddharth Teotia on this a couple of weeks earlier.

Basically, at a high level (over-simplified), this adapter/API will allow upstream apps to query RDBMS data over JDBC and get the JDBC objects converted to Arrow in-memory (JVM) objects/structures. The upstream utility can then work with the Arrow objects/structures with the usual performance benefits. The utility will be very much similar to the C++ implementation of "Convert a vector of row-wise data into an Arrow table" as described here - https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html.

How useful would this adapter be, and which other Apache projects would benefit from it? Based on the usability, we can open a JIRA for this activity and start looking into the implementation details.

Regards,
-Atul Dambalkar
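[Editor's sketch: a rough Java analogue of the C++ row-wise conversion tutorial referenced above. It assumes a VectorSchemaRoot has already been created from the JDBC metadata (for example via a schema mapping like the earlier sketch) and handles only integer and string columns; batching, memory limits, and the remaining JDBC types are left out. The class name JdbcRowCopier is illustrative.]

import java.nio.charset.StandardCharsets;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;

/** Illustrative row-to-column copy; only INT- and VARCHAR-backed vectors are handled. */
public final class JdbcRowCopier {

  public static void copy(ResultSet rs, VectorSchemaRoot root) throws SQLException {
    root.allocateNew();                                   // allocate initial buffers for all vectors
    int row = 0;
    while (rs.next()) {
      for (int col = 0; col < root.getFieldVectors().size(); col++) {
        FieldVector vector = root.getFieldVectors().get(col);
        int jdbcCol = col + 1;                            // JDBC columns are 1-based
        if (vector instanceof IntVector) {
          int value = rs.getInt(jdbcCol);
          if (rs.wasNull()) {
            ((IntVector) vector).setNull(row);
          } else {
            ((IntVector) vector).setSafe(row, value);     // setSafe grows buffers as needed
          }
        } else if (vector instanceof VarCharVector) {
          String value = rs.getString(jdbcCol);
          if (value == null) {
            ((VarCharVector) vector).setNull(row);
          } else {
            ((VarCharVector) vector).setSafe(row, value.getBytes(StandardCharsets.UTF_8));
          }
        } // other vector types omitted in this sketch
      }
      row++;
    }
    root.setRowCount(row);                                // make the populated rows visible to readers
  }

  private JdbcRowCopier() {}
}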