>
> It’s not at all clear why unique keys would be needed at all.

If we turn your question around, you answer it yourself: if you have
independent writers, you need unique keys.

> Also truly independent writers (like a job writing while a job compacts),
> means effectively a distributed transaction, and I believe it’s clearly out
> of scope for Iceberg to solve that ?
>

Assuming a single writing process seems severely limiting in design and
scale. I'm also surprised that you would consider this outside of Iceberg's
scope. A table format that can only be modified by a single process
effectively locks that format into a single tool for a given deployment.

> Uniqueness - enforcing uniqueness at scale is not feasible (provably so).


Expecting uniqueness is different than enforcing it. If you're saying it is
impossible to enforce, I understand that. But that doesn't mean we can't
define a system where uniqueness is expected and there are defined
ramifications if it is not maintained.
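
To make that concrete, here is a minimal sketch (plain Java, not Iceberg
code; the Row type and sequence-number resolution are assumptions for
illustration) of what "expected but not enforced" could look like: if a
duplicate key ever appears, readers resolve it deterministically instead of
the table's contents becoming undefined.

    import java.util.*;

    // Illustration only: uniqueness on "key" is expected, not enforced.
    // If two independent writers ever emit the same key, readers resolve the
    // conflict deterministically (here: the highest sequence number wins).
    public class ExpectedUniqueness {
        record Row(String key, long sequence, String value) {}

        static Collection<Row> resolve(List<Row> rows) {
            Map<String, Row> latest = new HashMap<>();
            for (Row r : rows) {
                latest.merge(r.key(), r, (a, b) -> a.sequence() >= b.sequence() ? a : b);
            }
            return latest.values();
        }

        public static void main(String[] args) {
            List<Row> rows = List.of(
                    new Row("k1", 1, "from writer A"),
                    new Row("k1", 2, "from writer B"),  // duplicate key, later write
                    new Row("k2", 1, "from writer A"));
            System.out.println(resolve(rows));          // k1 resolves to writer B's row
        }
    }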

> Also, at scale, it’s really only feasible to do query and update/upsert on
> the partition/bucket/sort key, any other access is likely a full scan of
> terabytes of data, on remote storage.


I'm not sure why you would say that unless you assume a particular
implementation. Single-record deletion is definitely an important use case,
and there is no need to do a full table scan to accomplish it unless you're
assuming an eager approach to deletion.

I do continue to wonder how much of this back and forth comes from mixing up
restatement (eager) and delta (lazy) implementations. Maybe we should
separate them into two different conversations? A rough sketch of the
distinction follows below.
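
Here is that sketch (again plain Java; the names are hypothetical and not
Iceberg APIs). It shows why single-record deletion only implies a full
rewrite under the eager model: the lazy model just appends the key to a
delete delta and defers the cost to reads or a later compaction.

    import java.util.*;
    import java.util.stream.*;

    // Illustration only: eager (restatement) vs. lazy (delta) deletes.
    public class DeleteSketch {

        // Eager: rewrite every data file that may contain the key.
        // For an arbitrary key this degenerates into scanning the table.
        static List<List<String>> deleteEager(List<List<String>> dataFiles, String key) {
            return dataFiles.stream()
                    .map(file -> file.stream()
                            .filter(row -> !row.equals(key))
                            .collect(Collectors.toList()))
                    .collect(Collectors.toList());
        }

        // Lazy: append the key to a delete delta; nothing is rewritten now.
        static void deleteLazy(Set<String> deleteDelta, String key) {
            deleteDelta.add(key);
        }

        // Reads merge data files with the outstanding deletes.
        static List<String> read(List<List<String>> dataFiles, Set<String> deleteDelta) {
            return dataFiles.stream()
                    .flatMap(List::stream)
                    .filter(row -> !deleteDelta.contains(row))
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            List<List<String>> dataFiles = List.of(List.of("k1", "k2"), List.of("k3", "k4"));
            Set<String> deleteDelta = new HashSet<>();

            deleteLazy(deleteDelta, "k3");                    // cheap, no scan
            System.out.println(read(dataFiles, deleteDelta)); // [k1, k2, k4]
            System.out.println(deleteEager(dataFiles, "k3")); // rewrites both files
        }
    }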
