I can try to do a quick scratch implementation to see how the connector fits in, but we are in the middle of release land, so I don't have the time I really need to think about this. I'd be glad to join any hangout to discuss everything, though.
On Thu, Feb 1, 2018 at 11:15 AM Ryan Blue <rb...@netflix.com> wrote:

> We don't mind updating Iceberg when the API improves. We are fully aware
> that this is a very early implementation and will change. My hope is that
> the community is receptive to our suggestions.
>
> A good example of an area with friction is filter and projection
> push-down. The implementation for DSv2 isn't based on what the other read
> paths do; it is brand new and mostly untested. I don't really understand
> why DSv2 introduced a new code path, when reusing existing code for this
> ended up being smaller and works for more cases (see my comments on #20476
> <https://github.com/apache/spark/pull/20476>). I understand wanting to
> fix parts of push-down, just not why it is a good idea to mix that
> substantial change into an unrelated API update. This is one area where, I
> hope, our suggestion to get DSv2 working well and redesign push-down as a
> parallel effort is heard.
>
> I also see a few areas where the integration of DSv2 conflicts with what I
> understand to be design principles of the catalyst optimizer. The fact that
> it should use immutable nodes in plans is mostly settled, but there are
> other examples. The approach of the new push-down implementation fights
> against the principle of small rules that don't need to process the entire
> plan tree. I think this makes the component brittle, and I'd like to
> understand the rationale for going with this design. I'd love to see a
> design document that covers why this is a necessary choice (but again,
> separately).
>
> rb
>
> On Thu, Feb 1, 2018 at 9:10 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>
>> +1 hangout
>>
>> ------------------------------
>> *From:* Xiao Li <gatorsm...@gmail.com>
>> *Sent:* Wednesday, January 31, 2018 10:46:26 PM
>> *To:* Ryan Blue
>> *Cc:* Reynold Xin; dev; Wenchen Fan; Russell Spitzer
>> *Subject:* Re: data source v2 online meetup
>>
>> Hi, Ryan,
>>
>> Wow, your Iceberg already uses the data source V2 API! That is pretty
>> cool! I am just afraid these new APIs are not stable. We might deprecate
>> or change some data source v2 APIs in the next version (2.4). Sorry for
>> the inconvenience it might introduce.
>>
>> Thanks for your feedback, as always,
>>
>> Xiao
>>
>>
>> 2018-01-31 15:54 GMT-08:00 Ryan Blue <rb...@netflix.com.invalid>:
>>
>>> Thanks for suggesting this, I think it's a great idea. I'll definitely
>>> attend and can talk about the changes that we've made to DataSourceV2 to
>>> enable our new table format, Iceberg
>>> <https://github.com/Netflix/iceberg#about-iceberg>.
>>>
>>> On Wed, Jan 31, 2018 at 2:35 PM, Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> Data source v2 API is one of the larger main changes in Spark 2.3, and
>>>> whatever has already been committed is only the first version; we'd
>>>> need more work post-2.3 to improve and stabilize it.
>>>>
>>>> I think at this point we should stop making changes to it in
>>>> branch-2.3, and instead focus on using the existing API and getting
>>>> feedback for 2.4. Would people be interested in doing an online hangout
>>>> to discuss this, perhaps in the month of Feb?
>>>>
>>>> It'd be more productive if people attending the hangout have tried the
>>>> API by implementing some new sources or porting an existing source over.
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
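[Editor's note: the catalyst design principles Ryan Blue invokes above, immutable plan nodes and small rewrite rules that match one local pattern while a generic traversal walks the tree, can be illustrated with a toy example. This is a minimal self-contained sketch, not Spark's actual classes: every name here (Scan, Filter, Project, transform_up, push_filter_into_scan) is a hypothetical illustration, not a Spark API.]

```python
# Toy illustration of two catalyst-style design principles discussed in the
# thread: (1) plan nodes are immutable, so rewrites return new nodes instead
# of mutating the tree; (2) a rule is "small" -- it matches one local pattern
# and lets a generic traversal (transform_up) apply it everywhere.
# These classes are hypothetical stand-ins, not Spark's real plan nodes.
from dataclasses import dataclass, replace
from typing import Callable, Tuple

@dataclass(frozen=True)
class Scan:
    table: str
    pushed_filters: Tuple[str, ...] = ()  # filters the "source" will apply

@dataclass(frozen=True)
class Filter:
    condition: str
    child: object

@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: object

def transform_up(plan, rule: Callable):
    """Generic bottom-up traversal: rebuild children first, then apply the
    rule once at this node. Rules never walk the whole tree themselves."""
    if isinstance(plan, (Filter, Project)):
        plan = replace(plan, child=transform_up(plan.child, rule))
    return rule(plan)

def push_filter_into_scan(plan):
    """A 'small' rule: it only knows the Filter-over-Scan pattern and returns
    a new immutable Scan with the filter recorded as pushed down."""
    if isinstance(plan, Filter) and isinstance(plan.child, Scan):
        scan = plan.child
        return replace(scan,
                       pushed_filters=scan.pushed_filters + (plan.condition,))
    return plan

plan = Project(("id",), Filter("id > 5", Scan("events")))
optimized = transform_up(plan, push_filter_into_scan)
print(optimized)
# -> Project(columns=('id',), child=Scan(table='events', pushed_filters=('id > 5',)))
```

Because the nodes are frozen, the original `plan` is untouched after optimization; the rule only had to describe the one Filter-over-Scan pattern, which is the property Ryan argues a monolithic whole-tree push-down implementation gives up.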