http://lmgtfy.com/?q=unsubscribe+apache+arrow
> On Oct 31, 2017, at 5:20 PM, 丁锦祥 <vence...@gmail.com> wrote:
>
> unsubscribe
>
> On Tue, Oct 31, 2017 at 4:28 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> Yeah, I agree, it should be an interface defined as part of Arrow, not
>> driver-specific.
>>
>>> On Oct 31, 2017, at 1:37 PM, Laurent Goujon <laur...@dremio.com> wrote:
>>>
>>> I really like Julian's idea of unwrapping Arrow objects out of the JDBC
>>> ResultSet, but I wonder whether the unwrap class has to be specific to
>>> the driver, or whether an interface can be designed to be used by
>>> multiple drivers: for drivers based on Arrow, it means you could
>>> entirely skip the serialization/deserialization from/to JDBC records.
>>> If such an interface exists, I would propose adding it to the Arrow
>>> project, with Arrow products/projects in charge of adding support for
>>> it in their own JDBC drivers.
>>>
>>> Laurent
>>>
>>> On Tue, Oct 31, 2017 at 1:18 PM, Atul Dambalkar <atul.dambal...@xoriant.com> wrote:
>>>
>>>> Thanks for your thoughts, Julian. I think adding support for Arrow
>>>> objects to the Avatica Remote Driver (AvaticaToArrowConverter) can
>>>> certainly be taken up as a separate activity. And you are right, we
>>>> will have to look at specific JDBC drivers to optimize them
>>>> individually.
>>>>
>>>> I would be curious whether there are any further inputs/comments from
>>>> other dev folks on the JDBC adapter aspect.
>>>>
>>>> -Atul
>>>>
>>>> -----Original Message-----
>>>> From: Julian Hyde [mailto:jh...@apache.org]
>>>> Sent: Tuesday, October 31, 2017 11:12 AM
>>>> To: dev@arrow.apache.org
>>>> Subject: Re: JDBC Adapter for Apache-Arrow
>>>>
>>>> Sorry, I didn't read your email thoroughly enough. I was talking about
>>>> the inverse (JDBC reading from Arrow), whereas you are talking about
>>>> Arrow reading from JDBC. Your proposal makes perfect sense.
>>>>
>>>> JDBC is quite a chatty interface (a call for every column of every
>>>> row, plus an occasional call to find out whether values are null, and
>>>> objects such as strings and timestamps become Java heap objects), so
>>>> for specific JDBC drivers it may be possible to optimize. For example,
>>>> the Avatica remote driver receives row sets in an RPC response in
>>>> protobuf format. It may be useful if the JDBC driver were able to
>>>> expose a direct path from protobuf to Arrow.
>>>> "ResultSet.unwrap(AvaticaToArrowConverter.class)" might be one way to
>>>> achieve this.
>>>>
>>>> Julian
>>>>
>>>>> On Oct 31, 2017, at 10:41 AM, Atul Dambalkar <atul.dambal...@xoriant.com> wrote:
>>>>>
>>>>> Hi Julian,
>>>>>
>>>>> Thanks for your response. If I understand correctly (looking at the
>>>>> other adapters), a Calcite-Arrow adapter would provide a SQL front
>>>>> end for in-memory Arrow data objects/structures. From that
>>>>> perspective, are you suggesting building the Calcite-Arrow adapter?
>>>>>
>>>>> In this case, what we are proposing is a mechanism for upstream apps
>>>>> to get/create Arrow objects/structures from a relational database.
>>>>> This would also mean converting row-oriented data from a SQL database
>>>>> into columnar Arrow data structures. The utility can make use of
>>>>> JDBC's metadata features to figure out the underlying DB schema and
>>>>> define the Arrow columnar schema. Also, the underlying database in
>>>>> this case would be any relational DB and hence persisted to disk,
>>>>> but the Arrow objects, being in-memory, can be ephemeral.
>>>>>
>>>>> Please correct me if I am missing anything.
>>>>>
>>>>> -Atul
>>>>>
>>>>> -----Original Message-----
>>>>> From: Julian Hyde [mailto:jhyde.apa...@gmail.com]
>>>>> Sent: Monday, October 30, 2017 7:50 PM
>>>>> To: dev@arrow.apache.org
>>>>> Subject: Re: JDBC Adapter for Apache-Arrow
>>>>>
>>>>> How about writing an Arrow adapter for Calcite?
>>>>> I think it amounts to the same thing - you would inherit Calcite's
>>>>> SQL parser and Avatica JDBC stack.
>>>>>
>>>>> Would this database be ephemeral (i.e., would the data go away when
>>>>> you close the connection)? If not, how would you know where to load
>>>>> the data from?
>>>>>
>>>>> Julian
>>>>>
>>>>>> On Oct 30, 2017, at 6:17 PM, Atul Dambalkar <atul.dambal...@xoriant.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I wanted to open a conversation here about developing a Java-based
>>>>>> JDBC adapter for Apache Arrow. I had a preliminary discussion with
>>>>>> Wes McKinney and Siddharth Teotia on this a couple of weeks ago.
>>>>>>
>>>>>> Basically, at a high level (over-simplified), this adapter/API will
>>>>>> allow upstream apps to query RDBMS data over JDBC and get the JDBC
>>>>>> objects converted to Arrow in-memory (JVM) objects/structures. The
>>>>>> upstream utility can then work with the Arrow objects/structures
>>>>>> with the usual performance benefits. The utility will be very
>>>>>> similar to the C++ implementation of "Convert a vector of row-wise
>>>>>> data into an Arrow table" as described here:
>>>>>> https://arrow.apache.org/docs/cpp/md_tutorials_row_wise_conversion.html
>>>>>>
>>>>>> How useful would this adapter be, and which other Apache projects
>>>>>> would benefit from it? Based on the usability, we can open a JIRA
>>>>>> for this activity and start looking into the implementation details.
>>>>>>
>>>>>> Regards,
>>>>>> -Atul Dambalkar
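The core of Atul's proposal is transposing the row-wise records a JDBC ResultSet delivers into columnar structures. The following is a minimal sketch of that transposition in plain Java; a real adapter would write into Arrow ValueVectors rather than Java lists, and the class and method names here are illustrative only, not part of any existing API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: gather row-wise records (as a JDBC ResultSet would deliver them,
// one Object[] per row) into columnar form, one List<Object> per column.
// A real implementation would populate Arrow ValueVectors instead.
class RowToColumnar {

    static List<List<Object>> transpose(List<Object[]> rows, int numColumns) {
        List<List<Object>> columns = new ArrayList<>();
        for (int c = 0; c < numColumns; c++) {
            columns.add(new ArrayList<>());
        }
        for (Object[] row : rows) {
            for (int c = 0; c < numColumns; c++) {
                // A SQL NULL stays null; Arrow would record it in a validity bitmap.
                columns.get(c).add(row[c]);
            }
        }
        return columns;
    }
}
```

In an actual adapter, the driver loop would be `while (rs.next()) { ... rs.getObject(i) ... }`, appending each value to the vector for column i, which is exactly the chatty per-column, per-row access pattern Julian describes.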
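Atul also suggests using JDBC's metadata features to derive the Arrow columnar schema. A sketch of the type-mapping step, using java.sql.Types codes as ResultSetMetaData.getColumnType would report them: the Arrow-side names are rendered as plain strings, and the particular correspondences chosen are a plausible assumption for illustration, not an official JDBC-to-Arrow mapping.

```java
import java.sql.Types;

// Sketch: map JDBC type codes (from ResultSetMetaData.getColumnType) to
// Arrow type descriptions. A real implementation would build
// org.apache.arrow.vector.types.pojo.Schema; the mappings below are an
// illustrative assumption, not a specified standard.
class JdbcTypeMapper {

    static String toArrowType(int jdbcType) {
        switch (jdbcType) {
            case Types.INTEGER:   return "Int(32, signed)";
            case Types.BIGINT:    return "Int(64, signed)";
            case Types.DOUBLE:    return "FloatingPoint(DOUBLE)";
            case Types.VARCHAR:   return "Utf8";
            case Types.TIMESTAMP: return "Timestamp(MILLISECOND)";
            case Types.BOOLEAN:   return "Bool";
            default:              return "Binary"; // conservative fallback
        }
    }
}
```

Column names and nullability would come from the same ResultSetMetaData object (getColumnName, isNullable), so one pass over the metadata yields the full Arrow schema before any rows are read.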
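The "ResultSet.unwrap(...)" idea Julian raises and Laurent generalizes builds on the java.sql.Wrapper contract, which ResultSet already extends. The toy below shows how that contract lets a caller reach a driver's Arrow-capable internals through a shared interface; ArrowBatchProvider and its method are invented names standing in for whatever interface the Arrow project might define, and the class here is a stand-in rather than a real ResultSet implementation.

```java
import java.sql.Wrapper;

// Hypothetical interface the Arrow project could define, so that any
// Arrow-backed driver can expose its batches without JDBC serialization.
// The name and method are invented for this sketch.
interface ArrowBatchProvider {
    String nextBatchDescription();
}

// Toy stand-in for a driver's ResultSet implementation, showing only the
// java.sql.Wrapper mechanics that ResultSet.unwrap(...) relies on.
class ArrowBackedResult implements Wrapper, ArrowBatchProvider {

    @Override
    public String nextBatchDescription() {
        return "protobuf row set exposed as an Arrow batch";
    }

    @Override
    public boolean isWrapperFor(Class<?> iface) {
        return iface.isInstance(this);
    }

    @Override
    public <T> T unwrap(Class<T> iface) {
        if (isWrapperFor(iface)) {
            return iface.cast(this);
        }
        // Narrowed to an unchecked exception for this sketch; a real driver
        // would throw SQLException per the Wrapper contract.
        throw new IllegalArgumentException("Not a wrapper for " + iface.getName());
    }
}
```

Because the interface, not the converter class, is the unwrap target, a caller can write `rs.unwrap(ArrowBatchProvider.class)` against any conforming driver, which is exactly the driver-neutral design the thread converges on.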