+Iceberg dev list (we've moved to Apache)

Manish,

HadoopTables should only be used in production with a file system that
supports atomic rename. That is because it uses atomic rename to ensure a
linear commit history. If you use it with S3 and two commits conflict, then
one will win, but both writers will think they succeeded and the losing
commit's changes are silently lost.
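
To make that concrete, here's roughly what the path-based API looks like.
This is just a sketch against the Apache packages; the schema and location
are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.PartitionSpec;
    import org.apache.iceberg.Schema;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.hadoop.HadoopTables;
    import org.apache.iceberg.types.Types;

    public class HadoopTablesExample {
      public static void main(String[] args) {
        // with HadoopTables, the table's location on the file system is its identifier
        String location = "hdfs://namenode:8020/warehouse/db/events";

        HadoopTables tables = new HadoopTables(new Configuration());
        Schema schema = new Schema(
            Types.NestedField.required(1, "id", Types.LongType.get()));
        Table table = tables.create(schema, PartitionSpec.unpartitioned(), location);

        // each commit writes a new metadata file and renames it into place;
        // that rename has to be atomic, which HDFS guarantees and S3 does not
        Table loaded = tables.load(location);
      }
    }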

Using HiveTables fixes the problem by making updates while holding a table
lock. That ensures a linear history: the process that gets the lock commits
and others retry.
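
In the Apache codebase that path is the Hive catalog
(org.apache.iceberg.hive.HiveCatalog). A rough sketch follows; the exact
class and constructor have shifted between the Netflix and Apache versions,
so treat it as illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.catalog.TableIdentifier;
    import org.apache.iceberg.hive.HiveCatalog;

    public class HiveCatalogExample {
      public static void main(String[] args) {
        // point hive.metastore.uris at the metastore service
        Configuration conf = new Configuration();
        conf.set("hive.metastore.uris", "thrift://metastore-host:9083");

        HiveCatalog catalog = new HiveCatalog(conf);
        Table table = catalog.loadTable(TableIdentifier.of("db", "events"));

        // on commit the catalog takes an HMS lock on db.events; the writer
        // holding the lock swaps in its new metadata and the rest retry,
        // e.g. table.newAppend().appendFile(dataFile).commit();
      }
    }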

If you don't want a dependency on HMS and you don't need concurrent
commits, then you can use HadoopTables. It will work, but you will be
vulnerable to inconsistency if concurrent commits ever do happen.

On Thu, Jan 31, 2019 at 9:20 PM Manish Malhotra <
[email protected]> wrote:

> hello,
>
> With Iceberg, if S3 is used with HadoopTables, will that be good enough
> for operations like adding data to an Iceberg table from many jobs or
> tasks concurrently?
> Or do we have to use HiveTables, which uses the Hive metastore (HMS)?
>
> It would be great if we didn't have a dependency on HMS, since HMS can
> also become a bottleneck.
>
> thanks !
>


-- 
Ryan Blue
Software Engineer
Netflix
