Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Jacques Nadeau
I agree with Anton that we should probably spend some time on hangouts further discussing things. Definitely differing expectations here and we seem to be talking a bit past each other. -- Jacques Nadeau CTO and Co-Founder, Dremio On Tue, May 21, 2019 at 3:44 PM Cristian Opris wrote: > I love a

Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Cristian Opris
I love a good flame war :P > On 21 May 2019, at 22:57, Jacques Nadeau wrote: > > > That's my point, truly independent writers (two Spark jobs, or a Spark job > and Dremio job) means a distributed transaction. It would need yet another > external transaction coordinator on top of both Spark an

Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Anton Okolnychyi
I would propose to have a series of sessions over hangouts to clarify all pending points. We can start this week if there is a timeslot that works for everyone. Potential topics (feel free to suggest yours): - Use cases I believe it is critical that everyone is on the same page when it comes to

Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Jacques Nadeau
> That's my point, truly independent writers (two Spark jobs, or a Spark job > and Dremio job) means a distributed transaction. It would need yet another > external transaction coordinator on top of both Spark and Dremio, Iceberg > by itself > cannot solve this. > I'm not ready to accept this. Ice

Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Cristian Opris
Another point to add to my list of clarifications, which I hope will help explain what the document is proposing: The delete diff files can simply be a list of keys to delete from previous snapshots. Natural keys or synthetic keys, it really doesn't matter. This is simple and elegant, i
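
[Editor's sketch] The idea of a delete diff as a plain list of keys can be illustrated roughly as follows. This is a minimal in-memory sketch, not the proposal's design and not Iceberg API: the names `apply_delete_diffs` and `read_snapshot`, the `id` key column, and the use of Python sets for diff files are all assumptions made for illustration. It assumes the base rows come only from snapshots older than the diffs.

```python
from typing import Dict, Iterable, Iterator, List, Set

Row = Dict[str, object]


def apply_delete_diffs(
    base_rows: Iterable[Row],
    delete_keys: Set[object],
    key_column: str = "id",  # hypothetical key column name
) -> Iterator[Row]:
    """Yield the rows from earlier snapshots whose key is not marked deleted."""
    for row in base_rows:
        if row[key_column] not in delete_keys:
            yield row


def read_snapshot(
    base_rows: Iterable[Row],
    delete_diffs: Iterable[Set[object]],
    key_column: str = "id",
) -> List[Row]:
    """Merge-on-read: union all delete diffs, then filter the base rows."""
    merged: Set[object] = set()
    for diff in delete_diffs:
        merged.update(diff)
    return list(apply_delete_diffs(base_rows, merged, key_column))
```

Whether the keys are natural or synthetic does not change this filter; only the contents of the key column differ.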

Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Cristian Opris
Hi Jacques, > On 21 May 2019, at 22:11, Jacques Nadeau wrote: > > It’s not at all clear why unique keys would be needed at all. > > If we turn your questions around, you answer yourself. If you have > independent writers, you need unique keys. > > Also truly independent writers (like a job

Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Jacques Nadeau
> > It’s not at all clear why unique keys would be needed at all. If we turn your questions around, you answer yourself. If you have independent writers, you need unique keys. Also truly independent writers (like a job writing while a job compacts), > means effectively a distributed transaction,

Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Cristian Opris
Hi Jacques, It’s not at all clear why unique keys would be needed at all. Also, truly independent writers (like a job writing while a job compacts) effectively means a distributed transaction, and I believe it’s clearly out of scope for Iceberg to solve that? > On 21 May 2019, at 21:31, Jacq

Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Cristian Opris
Synthetic vs natural keys generated a lot of discussion internally when coming up with the proposal :) There is lots of detail and lots of subtlety but a few things that may help explain our thinking: Scale - Iceberg is for remote storage at scale, not local disk. This means mainly operating on

Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Jacques Nadeau
It would be useful to describe the types of concurrent operations that > would be supported (i.e., failed snapshotting could easily be recovered, > vs. the whole operation needing to be re-executed) vs. those that wouldn't. > Solving for unlimited concurrency cases may create way more complexity th

Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Erik Wright
On Tue, May 21, 2019 at 2:01 PM Jacques Nadeau wrote: > I think we just need to have further discussion about keys. Ryan said: > > 3. Synthetic keys should be based on filename and position > > > But I'm not clear there is consensus around that. I'm also not sure > whether he means lossless inclu

Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Jacques Nadeau
I think we just need to have further discussion about keys. Ryan said: 3. Synthetic keys should be based on filename and position But I'm not clear there is consensus around that. I'm also not sure whether he means lossless inclusion, simply derived-from or something else. My thinking before is
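
[Editor's sketch] The distinction between "lossless inclusion" and "simply derived-from" for a synthetic key based on filename and position might look roughly like this. The thread does not settle which is meant, so this only illustrates the two readings; `PositionKey`, `derived_key`, and the SHA-256 choice are hypothetical names and assumptions, not anything defined by Iceberg or by the proposal.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionKey:
    """Lossless reading: the key carries the file name and row position as-is."""
    file_path: str
    position: int


def derived_key(file_path: str, position: int) -> str:
    """Derived-from reading: the key is computed from file name and position,
    but neither can be recovered from the key alone (assumed hash: SHA-256)."""
    return hashlib.sha256(f"{file_path}:{position}".encode("utf-8")).hexdigest()
```

With the lossless form a reader can go straight from a key to a file and row offset; with the derived form it would need some lookup to locate the row.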

Re: Updates/Deletes/Upserts in Iceberg

2019-05-21 Thread Erik Wright
On Thu, May 16, 2019 at 4:13 PM Ryan Blue wrote: > Replies inline. > > On Thu, May 16, 2019 at 10:07 AM Erik Wright > wrote: > >> I would be happy to participate. Iceberg with merge-on-read capabilities >> is a technology choice that my team is actively considering. It appears >> that our scenar