Still would be good to join. We can also do an additional one in March to give people more time.
On Thu, Feb 1, 2018 at 3:59 PM, Russell Spitzer <russell.spit...@gmail.com> wrote:

> I can try to do a quick scratch implementation to see how the connector
> fits in, but we are in the middle of release land so I don't have the
> amount of time I really need to think about this. I'd be glad to join any
> hangout to discuss everything though.
>
> On Thu, Feb 1, 2018 at 11:15 AM Ryan Blue <rb...@netflix.com> wrote:
>
>> We don't mind updating Iceberg when the API improves. We are fully aware
>> that this is a very early implementation and will change. My hope is that
>> the community is receptive to our suggestions.
>>
>> A good example of an area with friction is filter and projection
>> push-down. The implementation for DSv2 isn't based on what the other read
>> paths do; it is brand new and mostly untested. I don't really understand
>> why DSv2 introduced a new code path, when reusing existing code for this
>> ended up being smaller and works for more cases (see my comments on
>> #20476 <https://github.com/apache/spark/pull/20476>). I understand
>> wanting to fix parts of push-down, just not why it is a good idea to mix
>> that substantial change into an unrelated API update. This is one area
>> where, I hope, our suggestion to get DSv2 working well and redesign
>> push-down as a parallel effort is heard.
>>
>> I also see a few areas where the integration of DSv2 conflicts with what
>> I understand to be design principles of the catalyst optimizer. The fact
>> that it should use immutable nodes in plans is mostly settled, but there
>> are other examples. The approach of the new push-down implementation
>> fights against the principle of small rules that don't need to process
>> the entire plan tree. I think this makes the component brittle, and I'd
>> like to understand the rationale for going with this design. I'd love to
>> see a design document that covers why this is a necessary choice (but
>> again, separately).
>>
>> rb
>>
>> On Thu, Feb 1, 2018 at 9:10 AM, Felix Cheung <felixcheun...@hotmail.com>
>> wrote:
>>
>>> +1 hangout
>>>
>>> ------------------------------
>>> *From:* Xiao Li <gatorsm...@gmail.com>
>>> *Sent:* Wednesday, January 31, 2018 10:46:26 PM
>>> *To:* Ryan Blue
>>> *Cc:* Reynold Xin; dev; Wenchen Fan; Russell Spitzer
>>> *Subject:* Re: data source v2 online meetup
>>>
>>> Hi, Ryan,
>>>
>>> Wow, your Iceberg already uses the data source V2 API! That is pretty
>>> cool! I am just afraid these new APIs are not stable. We might deprecate
>>> or change some data source v2 APIs in the next version (2.4). Sorry for
>>> the inconvenience it might introduce.
>>>
>>> Thanks for your feedback, as always,
>>>
>>> Xiao
>>>
>>> 2018-01-31 15:54 GMT-08:00 Ryan Blue <rb...@netflix.com.invalid>:
>>>
>>>> Thanks for suggesting this; I think it's a great idea. I'll definitely
>>>> attend and can talk about the changes that we've made to DataSourceV2
>>>> to enable our new table format, Iceberg
>>>> <https://github.com/Netflix/iceberg#about-iceberg>.
>>>>
>>>> On Wed, Jan 31, 2018 at 2:35 PM, Reynold Xin <r...@databricks.com>
>>>> wrote:
>>>>
>>>>> The data source v2 API is one of the larger main changes in Spark 2.3,
>>>>> and whatever has already been committed is only the first version;
>>>>> we'd need more work post-2.3 to improve and stabilize it.
>>>>>
>>>>> I think at this point we should stop making changes to it in
>>>>> branch-2.3, and instead focus on using the existing API and getting
>>>>> feedback for 2.4. Would people be interested in doing an online
>>>>> hangout to discuss this, perhaps in the month of Feb?
>>>>>
>>>>> It'd be more productive if people attending the hangout have tried the
>>>>> API by implementing some new sources or porting an existing source
>>>>> over.
>>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
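[Editor's note: for readers following the push-down discussion above, the shape of the contract Ryan describes can be sketched as follows. This is a simplified, self-contained stand-in written for illustration; the trait and filter types below are hypothetical and do not reproduce the real `org.apache.spark.sql.sources.v2.reader` interfaces. The idea is that a reader is offered candidate filters and returns the residual ones it cannot evaluate, which Spark must then apply after the scan.]

```scala
// Simplified stand-in for Spark's Filter hierarchy (illustration only).
sealed trait Filter
case class EqualTo(attribute: String, value: Any) extends Filter
case class GreaterThan(attribute: String, value: Any) extends Filter

// Simplified stand-in for a DSv2-style push-down mix-in: pushFilters
// receives candidate predicates and returns the residual subset the
// source could NOT handle; pushedFilters reports what was accepted.
trait SupportsPushDownFilters {
  def pushFilters(filters: Array[Filter]): Array[Filter]
  def pushedFilters: Array[Filter]
}

// A toy reader that can only evaluate equality predicates at the source.
class SimpleReader extends SupportsPushDownFilters {
  private var pushed: Array[Filter] = Array.empty

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    val (supported, residual) = filters.partition {
      case _: EqualTo => true   // pretend the source can evaluate equality
      case _          => false  // everything else stays with the engine
    }
    pushed = supported
    residual // the engine must still apply these after the scan
  }

  override def pushedFilters: Array[Filter] = pushed
}

object PushDownDemo extends App {
  val reader = new SimpleReader
  val residual =
    reader.pushFilters(Array(EqualTo("id", 1), GreaterThan("ts", 100)))
  println(s"pushed:   ${reader.pushedFilters.mkString(", ")}")
  println(s"residual: ${residual.mkString(", ")}")
}
```

The split-and-report shape (accepted vs. residual filters) is the part of the design under debate in the thread: the question is not whether such a contract exists, but how the optimizer rules that drive it should be structured.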