I'm fine starting a branch later if we do run into those issues, but I don't think it is a good idea to do it now in anticipation. All of the work that we can do on master we should try to do on master. We can start a branch when we need one.
On Mon, Mar 30, 2020 at 7:44 PM OpenInx <open...@gmail.com> wrote:
> Hi Ryan
>
> The reason I suggest opening a new dev branch for row-delete development
> is this: we will split the whole feature into many small issues, and each
> issue will have a pull request of reasonable size, so that contributors
> and reviewers can discuss one point at a time and the feature can iterate
> faster. During implementation, we will ensure that v1 works after every
> separate PR, but master may not be ready for cutting a release. For
> example, when we release 0.8.0, I'm sure we won't want the release to
> contain part of the v2 spec (such as providing the sequence_number but no
> file_type). The Spark reader/writer and the data/delete manifests may also
> need some refactoring, which could be spread over several PRs. Splitting
> the work into multiple pull requests may block the release of a new
> version for a certain period of time, and that's not what we want to see.
>
> About maintenance of the new branch: in my experience we could rebase the
> new branch on master periodically (for example, every three days), so that
> each new pull request for row-delete is based on the newest changes. It
> should work for a master that does not have too many new changes. This is
> in line with our current situation.
>
> In this case, I weighed the maintenance cost of the new branch against the
> delay of row-delete. I think we should let row-delete go a little faster
> (almost all community users are looking forward to this feature), and I
> think the current maintenance cost is acceptable.
>
> Thanks
>
> On Tue, Mar 31, 2020 at 5:52 AM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> Sorry, I didn't address the suggestion to add a Flink branch as well. The
>> work needed for the Flink sink is to remove parts that are specific to
>> Netflix, so I'm not sure what the rationale for a branch would be.
>> Is there a reason why this can't be done in master, but requires a shared
>> branch? If multiple people want to contribute, why not contribute to the
>> same PR?
>>
>> A shared PR branch makes the most sense to me for this because it is
>> regularly tested against master.
>>
>> On Mon, Mar 30, 2020 at 2:48 PM Ryan Blue <rb...@netflix.com> wrote:
>>
>>> I think we may eventually want a branch, but I think it is too
>>> early to create one now.
>>>
>>> Branches are expensive. They require maintenance to stay in sync with
>>> master, usually copying changes from master into the branch with updates.
>>> Updating the changes to master for the branch is more difficult because it
>>> is usually not the original contributor or reviewer porting them. And it is
>>> better to catch problems between changes in master and the branch early.
>>>
>>> I'm not against branches, but I don't want to create them unless they
>>> are valuable. In this case, I don't see the value. We plan to add v2 in
>>> parallel so you can still write v1 tables for compatibility, and most of
>>> the work that needs to be done -- like creating readers and writers for
>>> diff formats -- can be done in master.
>>>
>>> rb
>>>
>>> On Mon, Mar 30, 2020 at 9:00 AM Gautam <gautamkows...@gmail.com> wrote:
>>>
>>>> Thanks for bringing this up, OpenInx. That's a great idea: to open a
>>>> separate branch for row-level deletes.
>>>>
>>>> I would like to help support/contribute/review this as well. If there
>>>> are sub-tasks you guys have identified that can be added to
>>>> https://github.com/apache/incubator-iceberg/milestone/4 we can start
>>>> taking those up too.
>>>>
>>>> thanks for the good work,
>>>> - Gautam.
>>>>
>>>> On Mon, Mar 30, 2020 at 8:39 AM Junjie Chen <chenjunjied...@gmail.com>
>>>> wrote:
>>>>
>>>>> +1 to create the branch. Some row-level delete subtasks must be based
>>>>> on the sequence number as well as end to end tests.
>>>>>
>>>>> On Fri, Mar 27, 2020 at 4:42 PM OpenInx <open...@gmail.com> wrote:
>>>>>
>>>>>> Dear Dev:
>>>>>>
>>>>>> On Tuesday we had a sync meeting and discussed the following topics:
>>>>>> 1. cut the 0.8.0 release;
>>>>>> 2. Flink connector;
>>>>>> 3. Iceberg row-level delete;
>>>>>> 4. Map-Reduce formats and Hive support.
>>>>>>
>>>>>> We'll release version 0.8.0 around April 15, and the following 0.9.0
>>>>>> will be released in the next few months. On the other hand, Ryan,
>>>>>> Junjie Chen and I have done three PoC versions of the row-level
>>>>>> deletes. We had a full discussion [4] and started on the relevant code
>>>>>> design. We're sure that the feature will introduce some incompatible
>>>>>> specification changes, such as the sequence_number spec [1] and the
>>>>>> file_type spec [2]; the sortedOrder feature also seems to be a
>>>>>> breaking change [3].
>>>>>>
>>>>>> To avoid affecting the release of version 0.8.0 and to push the
>>>>>> row-delete feature forward early, I suggest opening a new branch for
>>>>>> the row-delete feature, named branch-1. Once the row-delete feature is
>>>>>> stable, we could release 1.0.0. Or we can just open a row-delete
>>>>>> feature branch and, once the work is done, merge it back to the master
>>>>>> branch and continue on to release the 0.9.0 version.
>>>>>>
>>>>>> I guess the Flink connector developers are facing the same problem?
>>>>>>
>>>>>> What do you think about this?
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> [1]. https://github.com/apache/incubator-iceberg/pull/588
>>>>>> [2]. https://github.com/apache/incubator-iceberg/issues/824
>>>>>> [3]. https://github.com/apache/incubator-iceberg/issues/317
>>>>>> [4].
>>>>>> https://docs.google.com/document/d/1CPFun2uG-eXdJggqKcPsTdNa2wPMpAdw8loeP-0fm_M/edit?usp=sharing
>>>>>>
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
--
Ryan Blue
Software Engineer
Netflix