Hi Amogh,

The version hint file is used for Hadoop tables, which are named that way
because they are intended for HDFS. We also use them for local FS tests,
but they can't be safely used concurrently with S3. For S3, you'll need a
metastore to enforce atomicity when swapping table metadata locations. The
iceberg-hive module provides this by using the Hive metastore.
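
As a rough illustration of what "atomic swap" means here, the sketch below
models the metastore entry as an in-memory pointer updated with
compare-and-set. It is not Iceberg or Hive code: the class name
MetadataPointerSketch, its methods, and the metadata file names are all
hypothetical, and a real metastore would do the equivalent check under its
own locking.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a metastore entry that holds a table's
// current metadata location. Not part of Iceberg or Hive.
public class MetadataPointerSketch {
  private final AtomicReference<String> location;

  public MetadataPointerSketch(String initialLocation) {
    this.location = new AtomicReference<>(initialLocation);
  }

  // Returns the metadata location a writer should base its commit on.
  public String current() {
    return location.get();
  }

  // Succeeds only if no other writer has moved the pointer since
  // `expected` was read. compareAndSet compares references, so callers
  // must pass back the exact value returned by current().
  public boolean commit(String expected, String updated) {
    return location.compareAndSet(expected, updated);
  }

  public static void main(String[] args) {
    MetadataPointerSketch table = new MetadataPointerSketch("v1.metadata.json");

    // Two writers read the same base version.
    String base = table.current();

    // The first commit wins; the second sees a stale base and must retry.
    boolean first = table.commit(base, "v2-a.metadata.json");
    boolean second = table.commit(base, "v2-b.metadata.json");

    System.out.println(first + " " + second + " " + table.current());
    // prints "true false v2-a.metadata.json"
  }
}
```

An overwrite of a single hint file gives no such check: both writers would
succeed, and the last one to write would silently win.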

rb

On Fri, Mar 1, 2019 at 3:29 PM amogh margoor <amarg...@gmail.com> wrote:

> Hi,
> We were trying out Iceberg with Spark and happened to look into the code
> responsible for writing the version-hint file.
> In the following code snippet:
>
> private void writeVersionHint(int version) {
>   Path versionHintFile = versionHintFile();
>   FileSystem fs = getFS(versionHintFile, conf);
>
>   try (FSDataOutputStream out = fs.create(versionHintFile, true /* overwrite */)) {
>     out.write(String.valueOf(version).getBytes("UTF-8"));
>
>   } catch (IOException e) {
>     LOG.warn("Failed to update version hint", e);
>   }
> }
>
>
> We observe that the version-hint file is always overwritten under the same
> file name on S3. When `version-hint.text` is created for the first time,
> this avoids HEAD calls to the S3 object because the FS call passes
> `overwrite=true`, so we do not hit eventual consistency issues when reading
> the newly created file. But we are concerned that when the file is
> overwritten multiple times, a read can see older versions of the file due
> to eventual consistency. This is because S3 is eventually consistent for
> overwrite PUTs and DELETEs ("Amazon S3 offers eventual consistency for
> overwrite PUTS and DELETES in all regions." [1]).
>
> Let us know if this is a known issue or if we are missing something here.
> If it is a known issue, what might the repercussions be?
>
> [1] https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
>


-- 
Ryan Blue
Software Engineer
Netflix
