Owen, something similar has come up in a roadmap discussion of mine. I have a question about the solution you mentioned.

> The requirements would be that there is a 1:1 mapping between rows in the
> matching files and stripes.

Were you thinking that there would really be a 1:1 mapping and that the rows would just line up in the right order? That seems fragile to me. I would have thought that there would need to be a common key that the rows were identified by (which is more in line with HBase column families, which you referenced; so maybe this was what you meant but didn't illustrate explicitly).

With that in mind, I might have written:

file1.orc: struct<id:int,name:string,email:string>
file2.orc: struct<id:int,lastAccess:timestamp>

On Wed, Nov 28, 2018 at 1:14 PM Owen O'Malley <owen.omal...@gmail.com> wrote:

> I’m not sure what use case Erik is looking for, but I’ve had users that
> want to do the equivalent of HBase’s column families. They want some of the
> columns to be stored separately and then merged together on read. The
> requirements would be that there is a 1:1 mapping between rows in the
> matching files and stripes.
>
> It would look like:
>
> file1.orc: struct<name:string,email:string>
> file2.orc: struct<lastAccess:timestamp>
>
> It would let them leave the stable information and only re-write the
> second column family when the information in the mutable column family
> changes. It would also support use cases where you add data enrichment
> columns after the data has been ingested.
>
> From there it is easy to imagine having a replace operator where file2’s
> version of a column replaces file1’s version.
>
> .. Owen
>
>
> On Nov 28, 2018, at 9:44 AM, Ryan Blue <rb...@netflix.com.INVALID>
> wrote:
> >
> > What do you mean by merge on read?
> >
> > A few people I've talked to are interested in building delete and upsert
> > features. Those would create files that track the changes, which would be
> > merged at read time to apply them. Is that what you mean?
> >
> > rb
> >
> > On Tue, Nov 27, 2018 at 12:26 PM Erik Wright
> > <erik.wri...@shopify.com.invalid> wrote:
> >
> >> Has any consideration been given to the possibility of eventual
> >> merge-on-read support in the Iceberg table spec?
> >>
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
> >
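
To make the concern about row order concrete, here is a minimal Python sketch. It is not ORC or Iceberg code. It contrasts a positional (1:1) merge with a merge on a common key. The sample rows, the `id` key, and the names `merge_by_position`, `merge_by_key`, and `drop_id` are all hypothetical, and plain lists of dicts stand in for the two files.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_by_position(stable: List[Row], mutable: List[Row]) -> List[Row]:
    """Pair row i of one file with row i of the other (the 1:1 mapping)."""
    if len(stable) != len(mutable):
        raise ValueError("positional merge needs the same row count in both files")
    return [{**s, **m} for s, m in zip(stable, mutable)]


def merge_by_key(stable: List[Row], mutable: List[Row], key: str = "id") -> List[Row]:
    """Pair rows by a shared key, so row order in either file does not matter."""
    lookup = {row[key]: row for row in mutable}
    # Rows with no match in the mutable file keep only their stable columns.
    return [{**s, **lookup.get(s[key], {})} for s in stable]


def drop_id(rows: List[Row]) -> List[Row]:
    """Strip the key column to mimic the schemas without an id."""
    return [{k: v for k, v in row.items() if k != "id"} for row in rows]


# Hypothetical contents of file1 (stable columns).
FILE1 = [
    {"id": 1, "name": "ann", "email": "ann@example.com"},
    {"id": 2, "name": "bob", "email": "bob@example.com"},
]
# Hypothetical contents of file2 (mutable columns), rewritten in a different order.
FILE2 = [
    {"id": 2, "lastAccess": "2018-11-27T10:00:00"},
    {"id": 1, "lastAccess": "2018-11-28T09:00:00"},
]

positional = merge_by_position(drop_id(FILE1), drop_id(FILE2))
keyed = merge_by_key(FILE1, FILE2)

print(positional[0])  # ann is paired with bob's lastAccess
print(keyed[0])       # ann is paired with her own lastAccess
```

In this sketch the positional merge silently attaches the wrong lastAccess as soon as the second file is rewritten in a different order, while the keyed merge still lines up. The cost of the keyed variant is storing the key in both files and doing a lookup on read.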