Hi Ryan,

The reason I suggest opening a new dev branch for row-delete development is that we will split the whole feature into many small issues, and each issue will get a pull request of a reasonable size, so contributors and reviewers can discuss one point at a time and iterate on the feature faster. During implementation we will ensure that v1 keeps working after every individual PR, but the code may not be ready for cutting a release. For example, when we release 0.8.0, I'm sure we won't want the released version to contain only part of the v2 spec (such as providing the sequence_number but no file_type). The Spark reader/writer and the data/delete manifests may also need some refactoring, which could be spread across several PRs. Splitting the work into multiple pull requests on master may therefore block the release of a new version for a certain period of time, which is not what we want to see.
About maintaining the new branch: in my experience we could rebase the new branch onto master periodically (for example, every three days), so that each new pull request for row-delete is based on the newest changes. This works well when master does not receive too many new changes, which is in line with our current situation. I weighed the maintenance cost of the new branch against the delay of the row-delete feature. I think we should let row-delete go a little faster (almost all community users are looking forward to this feature), and I think the current maintenance cost is acceptable.

Thanks

On Tue, Mar 31, 2020 at 5:52 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Sorry, I didn't address the suggestion to add a Flink branch as well. The
> work needed for the Flink sink is to remove parts that are specific to
> Netflix, so I'm not sure what the rationale for a branch would be. Is there
> a reason why this can't be done in master, but requires a shared branch? If
> multiple people want to contribute, why not contribute to the same PR?
>
> A shared PR branch makes the most sense to me for this because it is
> regularly tested against master.
>
> On Mon, Mar 30, 2020 at 2:48 PM Ryan Blue <rb...@netflix.com> wrote:
>
>> I think we will eventually may want a branch, but I think it is too early
>> to create one now.
>>
>> Branches are expensive. They require maintenance to stay in sync with
>> master, usually copying changes from master into the branch with updates.
>> Updating the changes to master for the branch is more difficult because it
>> is usually not the original contributor or reviewer porting them. And it is
>> better to catch problems between changes in master and the branch early.
>>
>> I'm not against branches, but I don't want to create them unless they are
>> valuable. In this case, I don't see the value.
>> We plan to add v2 in parallel so you can still write v1 tables for
>> compatibility, and most of the work that needs to be done -- like creating
>> readers and writers for diff formats -- can be done in master.
>>
>> rb
>>
>> On Mon, Mar 30, 2020 at 9:00 AM Gautam <gautamkows...@gmail.com> wrote:
>>
>>> Thanks for bringing this up OpenInx. That's a great idea: to open a
>>> separate branch for row-level deletes.
>>>
>>> I would like to help support/contribute/review this as well. If there
>>> are sub-tasks you guys have identified that can be added to
>>> https://github.com/apache/incubator-iceberg/milestone/4 we can start
>>> taking those up too.
>>>
>>> thanks for the good work,
>>> - Gautam.
>>>
>>> On Mon, Mar 30, 2020 at 8:39 AM Junjie Chen <chenjunjied...@gmail.com>
>>> wrote:
>>>
>>>> +1 to create the branch. Some row-level delete subtasks must be based
>>>> on the sequence number as well as end to end tests.
>>>>
>>>> On Fri, Mar 27, 2020 at 4:42 PM OpenInx <open...@gmail.com> wrote:
>>>>
>>>>> Dear Dev:
>>>>>
>>>>> Tuesday, we had a sync meeting. and discussed about the things:
>>>>> 1. cut the 0.8.0 release;
>>>>> 2. flink connector ;
>>>>> 3. iceberg row-level delete;
>>>>> 4. Map-Reduce Formats and Hive support.
>>>>>
>>>>> We'll release version 0.8.0 around April 15, the following 0.9.0 will be
>>>>> released in the next few month. On the other hand, Ryan, Junjie Chen
>>>>> and I have done three PoC versions for the row-level deletes. We had
>>>>> a full discussion[4] and started to do the relevant code design.
>>>>> we're sure that the feature will introduce some incompatible
>>>>> specification, such as the sequence_number spec[1], file_type spec[2],
>>>>> the sortedOrder feature seems also to be a breaking change [3].
>>>>>
>>>>> To avoid affecting the release of version 0.8.0 and push the row-delete
>>>>> feature early.
>>>>> I suggest to open a new branch for the row-delete feature, name it
>>>>> branch-1. Once the row-delete feature is stable, we could release the
>>>>> 1.0.0. Or we can just open a row-delete feature branch and once the work
>>>>> is done we will merge the row-delete feature branch back to master
>>>>> branch, and continue to release the 0.9.0 version.
>>>>>
>>>>> I guess the flink connector dev are facing the same problem ?
>>>>>
>>>>> What do you think about this ?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> [1]. https://github.com/apache/incubator-iceberg/pull/588
>>>>> [2]. https://github.com/apache/incubator-iceberg/issues/824
>>>>> [3]. https://github.com/apache/incubator-iceberg/issues/317
>>>>> [4]. https://docs.google.com/document/d/1CPFun2uG-eXdJggqKcPsTdNa2wPMpAdw8loeP-0fm_M/edit?usp=sharing
>>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>