Yeah, I agree, it should be an interface defined as part of Arrow. Not driver-specific.
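To make the idea concrete, here is a minimal sketch of what the consumer side could look like. All names here are made up for illustration (ArrowBatchIterator is not an actual Arrow or Avatica API); a real interface would hand back Arrow VectorSchemaRoot batches. A dynamic proxy stands in for an Arrow-native driver's ResultSet, since the point is just the unwrap/isWrapperFor contract:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical Arrow-defined unwrap target. In practice nextBatch() would
// return an Arrow VectorSchemaRoot rather than Object.
interface ArrowBatchIterator {
    Object nextBatch();
}

public class UnwrapSketch {
    // Build a stand-in ResultSet (via a dynamic proxy) whose unwrap() hands
    // back an ArrowBatchIterator, mimicking what an Arrow-native JDBC driver
    // could do to let callers bypass row-by-row JDBC access.
    public static ResultSet arrowCapableResultSet(ArrowBatchIterator batches) {
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "isWrapperFor":
                    return ArrowBatchIterator.class.equals(args[0]);
                case "unwrap":
                    if (ArrowBatchIterator.class.equals(args[0])) {
                        return batches;
                    }
                    throw new SQLException("not a wrapper for " + args[0]);
                default:
                    // Only the Wrapper contract matters for this sketch.
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (ResultSet) Proxy.newProxyInstance(
                ResultSet.class.getClassLoader(),
                new Class<?>[] {ResultSet.class}, handler);
    }

    public static void main(String[] args) throws Exception {
        ResultSet rs = arrowCapableResultSet(() -> "batch-0");
        // Consumers probe for the Arrow fast path and fall back to plain
        // JDBC row access when the driver does not support it.
        if (rs.isWrapperFor(ArrowBatchIterator.class)) {
            System.out.println(rs.unwrap(ArrowBatchIterator.class).nextBatch());
        }
    }
}
```

The interface itself would live in the Arrow project, with each Arrow-based product implementing it in its own JDBC driver, which is exactly the split Laurent proposes below.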
> On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com> wrote:
>
> I really like Julian's idea of unwrapping Arrow objects out of the JDBC
> ResultSet, but I wonder whether the unwrap class has to be specific to the
> driver, and whether an interface can be designed to be used by multiple
> drivers: for drivers based on Arrow, it means you could skip the
> serialization/deserialization from/to JDBC records entirely.
> If such an interface exists, I would propose adding it to the Arrow
> project, with Arrow products/projects in charge of adding support for it
> in their own JDBC drivers.
>
> Laurent
>
> On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar <atul.dambal...@xoriant.com>
> wrote:
>
>> Thanks for your thoughts, Julian. I think adding support for Arrow objects
>> to the Avatica remote driver (AvaticaToArrowConverter) can certainly be
>> taken up as a separate activity. And you are right, we will have to look
>> at each specific JDBC driver to really optimize it individually.
>>
>> I would be curious whether there are any further inputs/comments from
>> other dev folks on the JDBC adapter aspect.
>>
>> -Atul
>>
>> -----Original Message-----
>> From: Julian Hyde [mailto:jh...@apache.org]
>> Sent: Tuesday, October 31, 2017 11:12 AM
>> To: dev@arrow.apache.org
>> Subject: Re: JDBC Adapter for Apache-Arrow
>>
>> Sorry, I didn't read your email thoroughly enough. I was talking about the
>> inverse (JDBC reading from Arrow), whereas you are talking about Arrow
>> reading from JDBC. Your proposal makes perfect sense.
>>
>> JDBC is quite a chatty interface (a call for every column of every row,
>> plus an occasional call to find out whether values are null, and objects
>> such as strings and timestamps become Java heap objects), so for specific
>> JDBC drivers it may be possible to optimize. For example, the Avatica
>> remote driver receives row sets in an RPC response in protobuf format. It
>> may be useful if the JDBC driver were able to expose a direct path from
>> protobuf to Arrow.
>> "ResultSet.unwrap(AvaticaToArrowConverter.class)" might be one way to
>> achieve this.
>>
>> Julian
>>
>>> On Oct 31, 2017, at 10:41 AM, Atul Dambalkar <atul.dambal...@xoriant.com>
>>> wrote:
>>>
>>> Hi Julian,
>>>
>>> Thanks for your response. If I understand correctly (looking at other
>>> adapters), a Calcite-Arrow adapter would provide a SQL front end for
>>> in-memory Arrow data objects/structures. So from that perspective, are
>>> you suggesting building the Calcite-Arrow adapter?
>>>
>>> In this case, what we are proposing is a mechanism for upstream apps to
>>> get/create Arrow objects/structures from a relational database. This
>>> would also mean converting row-wise data from a SQL database to columnar
>>> Arrow data structures. The utility can make use of JDBC's metadata
>>> features to figure out the underlying DB schema and define the Arrow
>>> columnar schema. Also, the underlying database in this case would be any
>>> relational DB and hence persisted to disk, but the Arrow objects, being
>>> in-memory, can be ephemeral.
>>>
>>> Please correct me if I am missing anything.
>>>
>>> -Atul
>>>
>>> -----Original Message-----
>>> From: Julian Hyde [mailto:jhyde.apa...@gmail.com]
>>> Sent: Monday, October 30, 2017 7:50 PM
>>> To: dev@arrow.apache.org
>>> Subject: Re: JDBC Adapter for Apache-Arrow
>>>
>>> How about writing an Arrow adapter for Calcite? I think it amounts to
>>> the same thing - you would inherit Calcite's SQL parser and Avatica JDBC
>>> stack.
>>>
>>> Would this database be ephemeral (i.e. would the data go away when you
>>> close the connection)? If not, how would you know where to load the data
>>> from?
>>>
>>> Julian
>>>
>>>> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar <atul.dambal...@xoriant.com>
>>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I wanted to open up a conversation here regarding developing a
>>>> Java-based JDBC adapter for Apache Arrow.
>>>> I had a preliminary discussion with Wes McKinney and Siddharth Teotia
>>>> on this a couple of weeks ago.
>>>>
>>>> Basically, at a high level (over-simplified), this adapter/API will
>>>> allow upstream apps to query RDBMS data over JDBC and get the JDBC
>>>> objects converted to Arrow in-memory (JVM) objects/structures. The
>>>> upstream utility can then work with Arrow objects/structures with the
>>>> usual performance benefits. The utility will be very similar to the C++
>>>> implementation of "Convert a vector of row-wise data into an Arrow
>>>> table" as described here -
>>>> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html.
>>>>
>>>> How useful would this adapter be, and which other Apache projects would
>>>> benefit from it? Based on the usability, we can open a JIRA for this
>>>> activity and start looking into the implementation details.
>>>>
>>>> Regards,
>>>> -Atul Dambalkar
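To illustrate the core transformation Atul's proposal describes, below is a minimal Java sketch of the row-to-column pivot. It is deliberately simplified: a real adapter would derive the schema via JDBC's ResultSetMetaData and populate typed Arrow vectors (e.g. IntVector, VarCharVector) instead of plain Java lists.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: shows just the row-to-column pivot at the heart of a
// JDBC-to-Arrow adapter, using Java lists in place of Arrow vectors.
public class RowToColumn {
    public static List<List<Object>> transpose(List<Object[]> rows, int columnCount) {
        List<List<Object>> columns = new ArrayList<>();
        for (int c = 0; c < columnCount; c++) {
            columns.add(new ArrayList<>());
        }
        // JDBC hands us one row at a time; appending each cell to its column
        // produces the columnar layout Arrow vectors want.
        for (Object[] row : rows) {
            for (int c = 0; c < columnCount; c++) {
                columns.get(c).add(row[c]); // null cells would map to Arrow null slots
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Object[]> rows = List.of(
                new Object[] {1, "alice"},
                new Object[] {2, "bob"});
        System.out.println(transpose(rows, 2)); // prints [[1, 2], [alice, bob]]
    }
}
```

In the real adapter the `rows` input would come from iterating a ResultSet (`rs.next()` / `rs.getObject(i)`), which is where the per-cell chattiness Julian mentions above comes from; batching into Arrow structures amortizes that cost for downstream consumers.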