I can try to do a quick scratch implementation to see how the connector fits in, but we are in the middle of release land, so I don't have the time I really need to think about this. I'd be glad to join any hangout to discuss everything, though.
On Thu, Feb 1, 2018 at 11:15 AM Ryan Blue <rb...@netflix.com> wrote:

> We don't mind updating Iceberg when the API improves. We are fully aware
> that this is a very early implementation and will change. My hope is that
> the community is receptive to our suggestions.
>
> A good example of an area with friction is filter and projection
> push-down. The implementation for DSv2 isn't based on what the other read
> paths do; it is brand new and mostly untested. I don't really understand
> why DSv2 introduced a new code path, when reusing existing code for this
> ended up being smaller and works for more cases (see my comments on #20476
> <https://github.com/apache/spark/pull/20476>). I understand wanting to
> fix parts of push-down, just not why it is a good idea to mix that
> substantial change into an unrelated API update. This is one area where, I
> hope, our suggestion to get DSv2 working well and redesign push-down as a
> parallel effort is heard.
>
> I also see a few areas where the integration of DSv2 conflicts with what I
> understand to be design principles of the catalyst optimizer. The fact that
> it should use immutable nodes in plans is mostly settled, but there are
> other examples. The approach of the new push-down implementation fights
> against the principle of small rules that don't need to process the entire
> plan tree. I think this makes the component brittle, and I'd like to
> understand the rationale for going with this design. I'd love to see a
> design document that covers why this is a necessary choice (but again,
> separately).
>
> rb
>
> On Thu, Feb 1, 2018 at 9:10 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>
>> +1 hangout
>>
>> ------------------------------
>> *From:* Xiao Li <gatorsm...@gmail.com>
>> *Sent:* Wednesday, January 31, 2018 10:46:26 PM
>> *To:* Ryan Blue
>> *Cc:* Reynold Xin; dev; Wenchen Fan; Russell Spitzer
>> *Subject:* Re: data source v2 online meetup
>>
>> Hi, Ryan,
>>
>> Wow, your Iceberg already uses the data source V2 API! That is pretty
>> cool! I am just afraid these new APIs are not stable. We might deprecate
>> or change some data source v2 APIs in the next version (2.4). Sorry for
>> the inconvenience it might introduce.
>>
>> Thanks for your feedback, as always,
>>
>> Xiao
>>
>>
>> 2018-01-31 15:54 GMT-08:00 Ryan Blue <rb...@netflix.com.invalid>:
>>
>>> Thanks for suggesting this, I think it's a great idea. I'll definitely
>>> attend and can talk about the changes that we've made to DataSourceV2 to
>>> enable our new table format, Iceberg
>>> <https://github.com/Netflix/iceberg#about-iceberg>.
>>>
>>> On Wed, Jan 31, 2018 at 2:35 PM, Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> Data source v2 API is one of the larger main changes in Spark 2.3, and
>>>> whatever has already been committed is only the first version; we'd
>>>> need more work post-2.3 to improve and stabilize it.
>>>>
>>>> I think at this point we should stop making changes to it in
>>>> branch-2.3, and instead focus on using the existing API and getting
>>>> feedback for 2.4. Would people be interested in doing an online hangout
>>>> to discuss this, perhaps in the month of Feb?
>>>>
>>>> It'd be more productive if people attending the hangout have tried the
>>>> API by implementing some new sources or porting an existing source over.
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
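[Editor's note: the catalyst design principles Ryan Blue invokes above, immutable plan nodes and small rewrite rules that match one local pattern while a generic traversal walks the tree, can be illustrated with a toy example. This is a minimal self-contained sketch, not Spark's actual classes: every name here (Scan, Filter, Project, transform_up, push_filter_into_scan) is a hypothetical illustration, not a Spark API.]

```python
# Toy illustration of two catalyst-style design principles discussed in the
# thread: (1) plan nodes are immutable, so rewrites return new nodes instead
# of mutating the tree; (2) a rule is "small" -- it matches one local pattern
# and lets a generic traversal (transform_up) apply it everywhere.
# These classes are hypothetical stand-ins, not Spark's real plan nodes.
from dataclasses import dataclass, replace
from typing import Callable, Tuple

@dataclass(frozen=True)
class Scan:
    table: str
    pushed_filters: Tuple[str, ...] = ()  # filters the "source" will apply

@dataclass(frozen=True)
class Filter:
    condition: str
    child: object

@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: object

def transform_up(plan, rule: Callable):
    """Generic bottom-up traversal: rebuild children first, then apply the
    rule once at this node. Rules never walk the whole tree themselves."""
    if isinstance(plan, (Filter, Project)):
        plan = replace(plan, child=transform_up(plan.child, rule))
    return rule(plan)

def push_filter_into_scan(plan):
    """A 'small' rule: it only knows the Filter-over-Scan pattern and returns
    a new immutable Scan with the filter recorded as pushed down."""
    if isinstance(plan, Filter) and isinstance(plan.child, Scan):
        scan = plan.child
        return replace(scan,
                       pushed_filters=scan.pushed_filters + (plan.condition,))
    return plan

plan = Project(("id",), Filter("id > 5", Scan("events")))
optimized = transform_up(plan, push_filter_into_scan)
print(optimized)
# -> Project(columns=('id',), child=Scan(table='events', pushed_filters=('id > 5',)))
```

Because the nodes are frozen, the original `plan` is untouched after optimization; the rule only had to describe the one Filter-over-Scan pattern, which is the property Ryan argues a monolithic whole-tree push-down implementation gives up.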