I also like the approach of adding an interface and making it part of Arrow, so any specific JDBC driver can implement that interface to directly expose Arrow objects without having to create JDBC objects in the first place. One such implementation could be for Avatica itself, which is what Julian was suggesting earlier.
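To make the idea concrete, here is a minimal sketch of what such a driver-neutral interface might look like, built on the standard `java.sql.Wrapper`/`unwrap` mechanism the thread discusses. The names `ArrowBatchProvider`, `ArrowBatch`, and `FakeArrowBackedResultSet` are hypothetical placeholders for illustration only, not actual Arrow or Avatica API:

```java
import java.sql.SQLException;
import java.sql.Wrapper;

// Hypothetical driver-neutral interface that Arrow could define; any JDBC
// driver whose data is already in Arrow format could implement it to hand
// columnar batches to the caller without materializing JDBC row objects.
interface ArrowBatch {
    int rowCount();
}

interface ArrowBatchProvider {
    ArrowBatch currentBatch();
}

// A driver's result set would implement java.sql.Wrapper (ResultSet extends
// Wrapper), so callers reach the fast path via rs.unwrap(...). This fake
// class stands in for a real Arrow-backed driver result set.
class FakeArrowBackedResultSet implements Wrapper, ArrowBatchProvider {
    @Override
    public boolean isWrapperFor(Class<?> iface) {
        return iface.isInstance(this);
    }

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        if (!iface.isInstance(this)) {
            throw new SQLException("not a wrapper for " + iface.getName());
        }
        return iface.cast(this);
    }

    @Override
    public ArrowBatch currentBatch() {
        return () -> 3; // placeholder batch of 3 rows
    }
}

public class UnwrapSketch {
    public static void main(String[] args) throws SQLException {
        Wrapper rs = new FakeArrowBackedResultSet();
        // Caller-side pattern: try the Arrow fast path, fall back to
        // row-at-a-time JDBC access if the driver does not support it.
        if (rs.isWrapperFor(ArrowBatchProvider.class)) {
            ArrowBatch batch = rs.unwrap(ArrowBatchProvider.class).currentBatch();
            System.out.println("rows=" + batch.rowCount());
        }
    }
}
```

A driver not backed by Arrow simply never implements the interface, and `isWrapperFor` returns false, so callers degrade gracefully to ordinary JDBC access.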
-----Original Message-----
From: Julian Hyde [mailto:jh...@apache.org]
Sent: Tuesday, October 31, 2017 4:28 PM
To: dev@arrow.apache.org
Subject: Re: JDBC Adapter for Apache-Arrow

Yeah, I agree, it should be an interface defined as part of Arrow. Not driver-specific.

> On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com> wrote:
>
> I really like Julian's idea of unwrapping Arrow objects out of the
> JDBC ResultSet, but I wonder if the unwrap class has to be specific to
> the driver and if an interface can be designed to be used by multiple
> drivers: for drivers based on Arrow, it means you could totally skip
> the serialization/deserialization from/to JDBC records.
> If such an interface exists, I would propose to add it to the Arrow
> project, with Arrow products/projects in charge of adding support for
> it in their own JDBC drivers.
>
> Laurent
>
> On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar
> <atul.dambal...@xoriant.com> wrote:
>
>> Thanks for your thoughts Julian. I think adding support for Arrow
>> objects to the Avatica remote driver (AvaticaToArrowConverter) can
>> certainly be taken up as another activity. And you are right, we will
>> have to look at each specific JDBC driver to really optimize it
>> individually.
>>
>> I would be curious if there are any further inputs/comments from
>> other dev folks on the JDBC adapter aspect.
>>
>> -Atul
>>
>> -----Original Message-----
>> From: Julian Hyde [mailto:jh...@apache.org]
>> Sent: Tuesday, October 31, 2017 11:12 AM
>> To: dev@arrow.apache.org
>> Subject: Re: JDBC Adapter for Apache-Arrow
>>
>> Sorry, I didn't read your email thoroughly enough. I was talking about
>> the inverse (JDBC reading from Arrow) whereas you are talking about
>> Arrow reading from JDBC. Your proposal makes perfect sense.
>>
>> JDBC is quite a chatty interface (a call for every column of every
>> row, plus an occasional call to find out whether values are null, and
>> objects such as strings and timestamps become Java heap objects), so
>> for specific JDBC drivers it may be possible to optimize. For
>> example, the Avatica remote driver receives row sets in an RPC
>> response in protobuf format. It may be useful if the JDBC driver were
>> able to expose a direct path from protobuf to Arrow.
>> "ResultSet.unwrap(AvaticaToArrowConverter.class)"
>> might be one way to achieve this.
>>
>> Julian
>>
>>> On Oct 31, 2017, at 10:41 AM, Atul Dambalkar
>>> <atul.dambal...@xoriant.com> wrote:
>>>
>>> Hi Julian,
>>>
>>> Thanks for your response. If I understand correctly (looking at
>>> other adapters), a Calcite-Arrow adapter would provide a SQL front
>>> end for in-memory Arrow data objects/structures. So from that
>>> perspective, are you suggesting building the Calcite-Arrow adapter?
>>>
>>> In this case, what we are proposing is a mechanism for upstream
>>> apps to be able to get/create Arrow objects/structures from a
>>> relational database. This would also mean converting row-like data
>>> from a SQL database to columnar Arrow data structures. The utility
>>> can make use of JDBC's metadata features to figure out the
>>> underlying DB schema and define the Arrow columnar schema. Also,
>>> the underlying database in this case would be any relational DB and
>>> hence would be persisted to disk, but the Arrow objects, being
>>> in-memory, can be ephemeral.
>>>
>>> Please correct me if I am missing anything.
>>>
>>> -Atul
>>>
>>> -----Original Message-----
>>> From: Julian Hyde [mailto:jhyde.apa...@gmail.com]
>>> Sent: Monday, October 30, 2017 7:50 PM
>>> To: dev@arrow.apache.org
>>> Subject: Re: JDBC Adapter for Apache-Arrow
>>>
>>> How about writing an Arrow adapter for Calcite?
>>> I think it amounts to the same thing - you would inherit Calcite's
>>> SQL parser and Avatica JDBC stack.
>>>
>>> Would this database be ephemeral (i.e. would the data go away when
>>> you close the connection)? If not, how would you know where to load
>>> the data from?
>>>
>>> Julian
>>>
>>>> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar
>>>> <atul.dambal...@xoriant.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I wanted to open up a conversation here regarding developing a
>>>> Java-based JDBC adapter for Apache Arrow. I have had a preliminary
>>>> discussion with Wes McKinney and Siddharth Teotia on this a couple
>>>> of weeks earlier.
>>>>
>>>> Basically, at a high level (over-simplified), this adapter/API will
>>>> allow upstream apps to query RDBMS data over JDBC and get the JDBC
>>>> objects converted to Arrow in-memory (JVM) objects/structures. The
>>>> upstream utility can then work with Arrow objects/structures with
>>>> the usual performance benefits. The utility will be very similar to
>>>> the C++ implementation of "Convert a vector of row-wise data into
>>>> an Arrow table" as described here -
>>>> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
>>>>
>>>> How useful would this adapter be, and which other Apache projects
>>>> would benefit from it? Based on the usability we can open a JIRA
>>>> for this activity and start looking into the implementation details.
>>>>
>>>> Regards,
>>>> -Atul Dambalkar
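The row-wise-to-columnar conversion at the heart of the original proposal can be sketched as follows. Plain Java arrays stand in for Arrow ValueVectors here so the example runs without the arrow-vector dependency; the class and method names are illustrative only, and in real adapter code the rows would come from `ResultSet.next()` and the schema from `ResultSetMetaData`:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the row-to-column pivot the proposed adapter would perform:
// JDBC hands back one row at a time, while Arrow wants one contiguous
// buffer per column.
public class RowToColumnSketch {

    // Pivot rows of (id, name) into two columnar buffers and summarize them.
    static String pivot(List<Object[]> rows) {
        int[] ids = new int[rows.size()];
        String[] names = new String[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            ids[i] = (Integer) rows.get(i)[0];  // ResultSet.getInt(1) in real code
            names[i] = (String) rows.get(i)[1]; // ResultSet.getString(2) in real code
        }
        return Arrays.toString(ids) + " " + Arrays.toString(names);
    }

    public static void main(String[] args) {
        // Pretend these rows were read one at a time via ResultSet.next().
        List<Object[]> rows = Arrays.asList(
                new Object[] {1, "alice"},
                new Object[] {2, "bob"},
                new Object[] {3, "carol"});
        System.out.println(pivot(rows)); // [1, 2, 3] [alice, bob, carol]
    }
}
```

The real adapter would additionally track nulls per column (Arrow's validity bitmaps) and flush fixed-size batches rather than buffering the whole result set, but the pivot above is the essential shape of the conversion.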