Hello Devs,
We are currently working on building out a high-write-throughput
pipeline with Iceberg, where hundreds or thousands of writers
(and thousands of readers) could be accessing a table at any given moment.
We are facing the issue called out in [1]. According to Iceberg's spec on
write reliability [2], writers depend on an atomic swap and should retry if
it fails. While that is true, there are instances where the current write
has the latest table state but still fails to perform the swap, or, worse,
a reader sees an inconsistency while the write is being made. To my
understanding, this stems from the fact that the current code [3] that does
the swap assumes the underlying filesystem provides an atomic rename API
(like HDFS et al.) for the version hint file, which keeps track of the
current version. If the filesystem does not provide this, the commit fails
with a fatal error. I think Iceberg should provide some resiliency here by
committing the version once it knows that the latest table state is still
valid and, more importantly, by ensuring readers never fail during a
commit. If we agree, I can work on adding this to Iceberg.
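For context, here is a simplified sketch of the kind of swap I'm describing
(this is illustrative only, not the actual HadoopTableOperations code; the
class and helper names are my own):

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VersionHintCommitSketch {

  /**
   * Illustrative commit step: write the new version hint to a temp file,
   * then rename it over version-hint.text. The swap is only safe if
   * FileSystem.rename() is atomic on the underlying storage (HDFS is,
   * many object stores are not).
   */
  static void commitVersionHint(FileSystem fs, Path metadataDir, int newVersion)
      throws IOException {
    Path versionHint = new Path(metadataDir, "version-hint.text");
    Path temp = new Path(metadataDir, "version-hint.text.tmp");

    // Write the new version number to a temporary file first.
    try (FSDataOutputStream out = fs.create(temp, true /* overwrite */)) {
      out.write(String.valueOf(newVersion).getBytes(StandardCharsets.UTF_8));
    }

    // Swap the temp file into place. On a filesystem without atomic rename,
    // a reader could observe a missing or partially written hint here.
    fs.delete(versionHint, false /* recursive */);
    if (!fs.rename(temp, versionHint)) {
      throw new IOException("Failed to commit version hint for version " + newVersion);
    }
  }
}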

How are folks handling write/read consistency in cases where the underlying
filesystem doesn't provide atomic APIs for file overwrite/rename? We've
outlined the details in the attached issue #758 [1]. What do folks think?
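On the reader side, one possible mitigation we have been toying with is to
retry when the hint file is momentarily missing or unreadable, and fall back
to scanning the metadata directory for the highest version. This is purely a
sketch of the idea, not an existing Iceberg API; the class and method names
are assumptions:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ResilientVersionReader {

  /**
   * Hypothetical reader-side fallback: if version-hint.text is missing or
   * partially written (e.g. mid-overwrite on a non-atomic filesystem), retry
   * a few times, then fall back to listing v*.metadata.json files and taking
   * the highest version, instead of failing the read outright.
   */
  static int currentVersion(FileSystem fs, Path metadataDir) throws IOException {
    Path versionHint = new Path(metadataDir, "version-hint.text");
    for (int attempt = 0; attempt < 3; attempt++) {
      try (BufferedReader in = new BufferedReader(
          new InputStreamReader(fs.open(versionHint), StandardCharsets.UTF_8))) {
        return Integer.parseInt(in.readLine().trim());
      } catch (IOException | NumberFormatException e) {
        // Hint file absent or being rewritten; back off briefly and retry.
        try {
          Thread.sleep(100L * (attempt + 1));
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while reading version hint", ie);
        }
      }
    }
    // Fall back to scanning the metadata directory for the newest version file.
    int maxVersion = -1;
    for (FileStatus status : fs.listStatus(metadataDir)) {
      String name = status.getPath().getName();
      if (name.matches("v\\d+\\.metadata\\.json")) {
        int version = Integer.parseInt(name.substring(1, name.indexOf('.')));
        maxVersion = Math.max(maxVersion, version);
      }
    }
    if (maxVersion < 0) {
      throw new IOException("No table metadata found under " + metadataDir);
    }
    return maxVersion;
  }
}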

Cheers,
-Gautam.

[1] - https://github.com/apache/incubator-iceberg/issues/758
[2] - https://iceberg.incubator.apache.org/reliability/
[3] - https://github.com/apache/incubator-iceberg/blob/master/core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java#L220
