I think we should abstract the API first, then implement MOR. COW is also a necessary implementation, but it is easy to implement and not so urgent.
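
To make "abstract the API first" concrete, here is a minimal sketch of one shape it could take: a single row-level delete interface with a COW and an MOR implementation behind it, selected by a table property. Everything in it is an assumption for illustration only. The property name `write.row-level.mode`, the class names, and the in-memory model (a dict of file name to rows) are hypothetical and are not existing Iceberg APIs.

```python
from abc import ABC, abstractmethod

# Hypothetical table property name and values, used only for this sketch.
MODE_PROPERTY = "write.row-level.mode"


class RowLevelDelete(ABC):
    """Common API: both modes accept the same call and return the same shape."""

    @abstractmethod
    def delete_where(self, data_files, predicate):
        """Return (data_files, position_deletes) after deleting matching rows."""


class CopyOnWriteDelete(RowLevelDelete):
    def delete_where(self, data_files, predicate):
        # COW: produce new file contents without the matching rows.
        # No separate delete records are needed.
        rewritten = {
            name: [row for row in rows if not predicate(row)]
            for name, rows in data_files.items()
        }
        return rewritten, set()


class MergeOnReadDelete(RowLevelDelete):
    def delete_where(self, data_files, predicate):
        # MOR: leave data files untouched and record (file, position) deletes
        # that readers must apply at scan time.
        deletes = {
            (name, pos)
            for name, rows in data_files.items()
            for pos, row in enumerate(rows)
            if predicate(row)
        }
        return data_files, deletes


def read_rows(data_files, position_deletes):
    """Reader side: skip any row whose (file, position) was marked deleted."""
    return [
        row
        for name, rows in data_files.items()
        for pos, row in enumerate(rows)
        if (name, pos) not in position_deletes
    ]


def delete_impl_for(table_properties):
    """Pick the implementation from the (hypothetical) table property."""
    impls = {"copy-on-write": CopyOnWriteDelete, "merge-on-read": MergeOnReadDelete}
    mode = table_properties.get(MODE_PROPERTY, "copy-on-write")
    if mode not in impls:
        raise ValueError(f"unknown row-level mode: {mode}")
    return impls[mode]()
```

With this shape, a caller writes `delete_impl_for(props).delete_where(files, pred)` and reads back through `read_rows` without knowing which mode the table uses, so a COW implementation can be shared now and an MOR one added later behind the same API.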

On Tue, Mar 3, 2020 at 3:45 PM Junjie Chen <chenjunjied...@gmail.com> wrote:

> Thanks, Ryan
>
> Maybe the discussion is very clear before. Actually, we have built an
> internal implementation for update and delete via copy on write mode. Some
> others may also have their internal implementation as well. What I propose
> is to provide a general framework or APIs set that support both copy on
> write and merge on read, then people could share their COW implementation
> to community and prepare some job for MOR as well. For example, we could
> define row level update, mergeinto APIs and a table property indicates the
> underlying mode, then one could share implementation under the cow branch
> according to table property.
>
> There should have other ways to build the general framework, just want to
> know that do we want both COW and MOR implementation or just keep the MOR?
>
>
> On Tue, Mar 3, 2020 at 8:53 AM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> It should be possible to build an implementation of MERGE INTO in Spark
>> now, using the validation that Anton added in #351
>> <https://github.com/apache/incubator-iceberg/pull/351>. I think he can
>> provide some more context.
>>
>> On Wed, Feb 26, 2020 at 7:42 AM Junjie Chen <chenjunjied...@gmail.com>
>> wrote:
>>
>>> Hi devs
>>>
>>> We are working on row level delete milestone for upsert feature in merge
>>> on read mode. In the meantime, I think it may be useful to have a copy on
>>> write implementation. For example, we can implement upsert with spark, so
>>> that we can finalize the common APIs that upsert may need and also we could
>>> discover some capabilities that spark should provide. What do you think?
>>>
>>> --
>>> Best Regards
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>
> --
> Best Regards
>