We referred to https://iceberg.incubator.apache.org/custom-catalog/ and
implemented the atomic commit using DynamoDB optimistic locking. The Iceberg
codebase has an excellent test case for validating a custom implementation:
https://github.com/apache/incubator-iceberg/blob/master/hive/src/test/java/org/apache/iceberg/hive/TestHiveTableConcurrency.java
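The optimistic-locking commit pattern we used can be sketched roughly as below. This is an illustrative stand-in only: an in-memory conditional replace plays the role of a DynamoDB conditional write (a `ConditionExpression` on the current metadata pointer), and all class and method names are hypothetical, not Iceberg or AWS SDK APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of optimistic locking for the table-metadata pointer. An in-memory
// map stands in for the DynamoDB item; names here are illustrative.
class OptimisticCatalog {
  // table name -> current metadata file location (the pointer a commit swaps)
  private final Map<String, String> store = new ConcurrentHashMap<>();

  void create(String table, String metadataLocation) {
    store.put(table, metadataLocation);
  }

  String currentMetadata(String table) {
    return store.get(table);
  }

  // The commit succeeds only if the pointer still equals the value the writer
  // read before producing its new metadata -- the same compare-and-swap
  // semantics a DynamoDB conditional update provides. A stale writer gets
  // `false`, refreshes the table state, and retries.
  boolean commit(String table, String expectedMetadata, String newMetadata) {
    return store.replace(table, expectedMetadata, newMetadata);
  }
}
```

For example, if two writers both read `v1.json`, only the first conditional swap to `v2.json` succeeds; the second writer observes the failure and retries on top of the new state instead of silently clobbering it.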


On Tue, Jan 28, 2020 at 1:35 PM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Hi Gautam,
>
> Hadoop tables are not intended to be used when the file system doesn't
> support atomic rename because of the problems you describe. Atomic rename
> is a requirement for correctness in Hadoop tables.
>
> That is why we also have metastore tables, where some other atomic swap is
> used. I strongly recommend using a metastore-based solution when your
> underlying file system doesn't support atomic rename, like the Hive
> catalog. We've also made it easy to plug in your own metastore using the
> `BaseMetastore` classes.
>
> That said, if you have an idea to make Hadoop tables better, I'm all for
> getting it in. But the version hint file aside, without atomic rename, two
> committers could still conflict and cause one of the commits to be dropped,
> because the second committer to create any particular version's metadata
> file may succeed. I don't see a way around this.
>
> If you don't want to use a metastore, then you could rely on a write lock
> provided by ZooKeeper or something similar.
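[The write-lock idea above can be sketched as follows. This is a minimal, hypothetical sketch, not Iceberg code: a `java.util.concurrent` lock stands in for a distributed lock such as Curator's `InterProcessMutex` over ZooKeeper, and all names are illustrative.]

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: serialize the metadata swap under a write lock. A ReentrantLock
// stands in for a ZooKeeper-backed mutex; class and method names are
// hypothetical, for illustration only.
class LockedTableOperations {
  private final Lock commitLock = new ReentrantLock(); // would be a ZK lock
  private String currentMetadata;

  LockedTableOperations(String initialMetadata) {
    this.currentMetadata = initialMetadata;
  }

  String currentMetadata() {
    return currentMetadata;
  }

  // Holding the lock across the read-check-write sequence makes the swap
  // effectively atomic even on a filesystem with no atomic rename.
  boolean commit(String expectedMetadata, String newMetadata) {
    commitLock.lock();
    try {
      if (!currentMetadata.equals(expectedMetadata)) {
        return false; // someone committed first; caller refreshes and retries
      }
      currentMetadata = newMetadata;
      return true;
    } finally {
      commitLock.unlock();
    }
  }
}
```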
>
> On Tue, Jan 28, 2020 at 12:22 PM Gautam <gautamkows...@gmail.com> wrote:
>
>> Hello Devs,
>>                      We are currently working on building out a high
>> write throughput pipeline with Iceberg where hundreds or thousands of
>> writers (and thousands of readers) could be accessing a table at any given
>> moment. We are facing the issue called out in [1]. According to Iceberg's
>> spec on write reliability [2], writers depend on an atomic swap and should
>> retry if it fails. While this may be true, there can be instances where
>> the current writer has the latest table state but still fails to perform
>> the swap, or, even worse, a reader sees an inconsistency while the write
>> is being made. To my understanding, this stems from the fact that the
>> current code [3] that does the swap assumes the underlying filesystem
>> provides an atomic rename API (like HDFS et al.) for the version hint
>> file, which keeps track of the current version. If the filesystem does not
>> provide this, the swap fails with a fatal error. I think Iceberg should
>> provide some resiliency here by committing the version once it knows that
>> the latest table state is still valid and, more importantly, by ensuring
>> readers never fail during a commit. If we agree, I can work on adding this
>> into Iceberg.
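[The lost-commit race described above can be made concrete with a small sketch. Everything here is hypothetical and for illustration only: a map stands in for a filesystem whose writes are plain overwrites with no atomic rename or compare-and-swap.]

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the race: on a store with plain overwrite (no atomic rename),
// two writers that both read the same version hint can each "succeed", and
// the first commit is silently dropped. All names are hypothetical.
class NonAtomicVersionHint {
  private final Map<String, String> files = new HashMap<>(); // fake filesystem
  private int versionHint = 1;

  int readVersionHint() {
    return versionHint;
  }

  // Plain overwrite: nothing checks that the hint is still what this
  // writer saw when it started its commit.
  void writeCommit(int baseVersion, String metadata) {
    int next = baseVersion + 1;
    files.put("v" + next + ".metadata.json", metadata); // may clobber a peer
    versionHint = next; // last writer wins
  }

  String currentMetadata() {
    return files.get("v" + versionHint + ".metadata.json");
  }
}
```

Here, if writers A and B both read hint 1, both produce `v2.metadata.json`; whichever writes second overwrites the other's commit with no error raised to either side, which is exactly why an atomic swap (or an external lock/conditional write) is required.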
>>
>> How are folks handling write/read consistency in cases where the
>> underlying fs doesn't provide atomic APIs for file overwrite/rename? We've
>> outlined the details in the attached issue #758 [1]. What do folks think?
>>
>> Cheers,
>> -Gautam.
>>
>> [1] - https://github.com/apache/incubator-iceberg/issues/758
>> [2] - https://iceberg.incubator.apache.org/reliability/
>> [3] -
>> https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java#L220
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
