Still would be good to join. We can also do an additional one in March to give people more time.
On Thu, Feb 1, 2018 at 3:59 PM, Russell Spitzer <russell.spit...@gmail.com> wrote:

> I can try to do a quick scratch implementation to see how the connector
> fits in, but we are in the middle of release land so I don't have the
> amount of time I really need to think about this. I'd be glad to join any
> hangout to discuss everything though.
>
> On Thu, Feb 1, 2018 at 11:15 AM Ryan Blue <rb...@netflix.com> wrote:
>
>> We don't mind updating Iceberg when the API improves. We are fully aware
>> that this is a very early implementation and will change. My hope is that
>> the community is receptive to our suggestions.
>>
>> A good example of an area with friction is filter and projection
>> push-down. The implementation for DSv2 isn't based on what the other read
>> paths do; it is brand new and mostly untested. I don't really understand
>> why DSv2 introduced a new code path, when reusing existing code for this
>> ended up being smaller and works for more cases (see my comments on
>> #20476 <https://github.com/apache/spark/pull/20476>). I understand
>> wanting to fix parts of push-down, just not why it is a good idea to mix
>> that substantial change into an unrelated API update. This is one area
>> where, I hope, our suggestion to get DSv2 working well and redesign
>> push-down as a parallel effort is heard.
>>
>> I also see a few areas where the integration of DSv2 conflicts with what
>> I understand to be design principles of the catalyst optimizer. The fact
>> that it should use immutable nodes in plans is mostly settled, but there
>> are other examples. The approach of the new push-down implementation
>> fights against the principle of small rules that don't need to process
>> the entire plan tree. I think this makes the component brittle, and I'd
>> like to understand the rationale for going with this design. I'd love to
>> see a design document that covers why this is a necessary choice (but
>> again, separately).
>>
>> rb
>>
>> On Thu, Feb 1, 2018 at 9:10 AM, Felix Cheung <felixcheun...@hotmail.com>
>> wrote:
>>
>>> +1 hangout
>>>
>>> ------------------------------
>>> *From:* Xiao Li <gatorsm...@gmail.com>
>>> *Sent:* Wednesday, January 31, 2018 10:46:26 PM
>>> *To:* Ryan Blue
>>> *Cc:* Reynold Xin; dev; Wenchen Fan; Russell Spitzer
>>> *Subject:* Re: data source v2 online meetup
>>>
>>> Hi, Ryan,
>>>
>>> Wow, your Iceberg already uses the data source V2 API! That is pretty
>>> cool! I am just afraid these new APIs are not stable. We might deprecate
>>> or change some data source v2 APIs in the next version (2.4). Sorry for
>>> the inconvenience it might introduce.
>>>
>>> Thanks for your feedback, as always,
>>>
>>> Xiao
>>>
>>> 2018-01-31 15:54 GMT-08:00 Ryan Blue <rb...@netflix.com.invalid>:
>>>
>>>> Thanks for suggesting this; I think it's a great idea. I'll definitely
>>>> attend and can talk about the changes that we've made to DataSourceV2
>>>> to enable our new table format, Iceberg
>>>> <https://github.com/Netflix/iceberg#about-iceberg>.
>>>>
>>>> On Wed, Jan 31, 2018 at 2:35 PM, Reynold Xin <r...@databricks.com>
>>>> wrote:
>>>>
>>>>> The data source v2 API is one of the larger main changes in Spark 2.3,
>>>>> and whatever has already been committed is only the first version;
>>>>> we'd need more work post-2.3 to improve and stabilize it.
>>>>>
>>>>> I think at this point we should stop making changes to it in
>>>>> branch-2.3, and instead focus on using the existing API and getting
>>>>> feedback for 2.4. Would people be interested in doing an online
>>>>> hangout to discuss this, perhaps in the month of Feb?
>>>>>
>>>>> It'd be more productive if people attending the hangout have tried the
>>>>> API by implementing some new sources or porting an existing source
>>>>> over.
>>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
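[Editor's note: for readers following the push-down discussion above, the shape of the contract Ryan describes can be sketched as follows. This is a simplified, self-contained stand-in written for illustration; the trait and filter types below are hypothetical and do not reproduce the real `org.apache.spark.sql.sources.v2.reader` interfaces. The idea is that a reader is offered candidate filters and returns the residual ones it cannot evaluate, which Spark must then apply after the scan.]

```scala
// Simplified stand-in for Spark's Filter hierarchy (illustration only).
sealed trait Filter
case class EqualTo(attribute: String, value: Any) extends Filter
case class GreaterThan(attribute: String, value: Any) extends Filter

// Simplified stand-in for a DSv2-style push-down mix-in: pushFilters
// receives candidate predicates and returns the residual subset the
// source could NOT handle; pushedFilters reports what was accepted.
trait SupportsPushDownFilters {
  def pushFilters(filters: Array[Filter]): Array[Filter]
  def pushedFilters: Array[Filter]
}

// A toy reader that can only evaluate equality predicates at the source.
class SimpleReader extends SupportsPushDownFilters {
  private var pushed: Array[Filter] = Array.empty

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    val (supported, residual) = filters.partition {
      case _: EqualTo => true   // pretend the source can evaluate equality
      case _          => false  // everything else stays with the engine
    }
    pushed = supported
    residual // the engine must still apply these after the scan
  }

  override def pushedFilters: Array[Filter] = pushed
}

object PushDownDemo extends App {
  val reader = new SimpleReader
  val residual =
    reader.pushFilters(Array(EqualTo("id", 1), GreaterThan("ts", 100)))
  println(s"pushed:   ${reader.pushedFilters.mkString(", ")}")
  println(s"residual: ${residual.mkString(", ")}")
}
```

The split-and-report shape (accepted vs. residual filters) is the part of the design under debate in the thread: the question is not whether such a contract exists, but how the optimizer rules that drive it should be structured.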