Re: Approaching Vectorized Reading in Iceberg ..

2019-06-12 Gautam
Hey Ryan and Anton, I wanted to circle back on some findings I had after taking a first stab at this.

> There’s already a wrapper to adapt Arrow to ColumnarBatch, as well as an iterator to read a ColumnarBatch as a sequence of InternalRow. That’s what we want to take advantage of.

This
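The adapter pattern in the quoted text (columnar data exposed to a row-based consumer as a sequence of rows) can be sketched in a few lines. This is a minimal, hypothetical Python model, not Spark's API: `ColumnVector`, `ColumnarBatch`, `row_iterator`, and `from_arrays` are illustrative names, and plain lists stand in for Arrow vectors. In Spark, the pieces the quote alludes to are presumably `ArrowColumnVector` and `ColumnarBatch.rowIterator()`.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Sequence, Tuple


@dataclass
class ColumnVector:
    """Hypothetical stand-in for a column vector: one column's values."""
    values: Sequence[Any]

    def get(self, row_id: int) -> Any:
        return self.values[row_id]


@dataclass
class ColumnarBatch:
    """Hypothetical stand-in for a columnar batch: equal-length columns."""
    columns: List[ColumnVector]
    num_rows: int

    def row_iterator(self) -> Iterator[Tuple[Any, ...]]:
        # Walk row ids and read each column at that position, so a consumer
        # that expects rows can read columnar data without converting it first.
        for row_id in range(self.num_rows):
            yield tuple(col.get(row_id) for col in self.columns)


def from_arrays(arrays: List[Sequence[Any]]) -> ColumnarBatch:
    """Wrap per-column arrays (standing in for Arrow vectors) as a batch."""
    lengths = {len(a) for a in arrays}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    num_rows = lengths.pop() if lengths else 0
    return ColumnarBatch([ColumnVector(a) for a in arrays], num_rows)


batch = from_arrays([[1, 2, 3], ["a", "b", "c"]])
rows = list(batch.row_iterator())
print(rows)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Rows are built lazily, one per iteration, from values that stay in their columns; that is what lets a row-oriented reader sit on top of a vectorized one without an up-front conversion.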

Re: Updates/Deletes/Upserts in Iceberg

2019-06-12 Owen O'Malley
> On May 21, 2019, at 1:31 PM, Jacques Nadeau wrote:
>
> The main thing I'm talking about is how you target a deletion across time. If you have a file A, and you want to delete record X in A, you define delete A.X. At the same time, another process may be compacting A into A'. In so do
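The hazard in the quoted scenario can be made concrete with a toy model. This is a minimal sketch, assuming a delete is recorded as a (file, row position) pair; the table model, `compact`, and `apply_delete` are hypothetical and not Iceberg's API, and failing the commit is only one possible response to the conflict, not a proposal from this thread.

```python
from typing import Dict, List, Tuple

# Hypothetical table state: data file name -> rows stored in that file.
Table = Dict[str, List[str]]
# Hypothetical positional delete: (data file name, row position in that file).
Delete = Tuple[str, int]


def compact(table: Table, sources: List[str], target: str) -> Table:
    """Rewrite the source files into one new file, as a compaction would."""
    merged = [row for name in sources for row in table[name]]
    remaining = {k: v for k, v in table.items() if k not in sources}
    remaining[target] = merged
    return remaining


def apply_delete(table: Table, delete: Delete) -> Table:
    """Apply a positional delete; fail if its target file no longer exists."""
    file_name, pos = delete
    if file_name not in table:
        # The delete was written against a file that a concurrent compaction
        # replaced; committing it unchanged would silently drop the delete.
        raise RuntimeError(f"conflict: {file_name} was rewritten")
    rows = list(table[file_name])
    del rows[pos]
    return {**table, file_name: rows}


base: Table = {"A": ["w", "x", "y"], "B": ["z"]}
delete_x: Delete = ("A", 1)  # "delete A.X" against the snapshot containing A

print(apply_delete(base, delete_x)["A"])  # ['w', 'y']

compacted = compact(base, ["A", "B"], "A'")
try:
    apply_delete(compacted, delete_x)
except RuntimeError as err:
    print(err)  # conflict: A was rewritten
```

The delete is valid against the snapshot it was written for, but once A has been rewritten into A' there is no file A left to target, so a writer must either detect the conflict or translate the delete to A'.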

Re: Updates/Deletes/Upserts in Iceberg

2019-06-12 Owen O'Malley
> On May 15, 2019, at 12:54 PM, Ryan Blue wrote:
>
> 2. Iceberg diff files should use synthetic keys
>
> A lot of the discussion on the doc is about whether natural keys are practical or what assumptions we can make or trade about them. In my opinion, Iceberg tables will absolutely need

Re: Updates/Deletes/Upserts in Iceberg

2019-06-12 Erik Wright
Thanks for taking the time to read through this and give your feedback. I agree that we are closing in on something here.

On Tue, Jun 11, 2019 at 12:36 PM Ryan Blue wrote:

> Erik, thanks for working on this doc. It’s a good detailed write-up of the approach using natural keys and I found the s