Re: Write reliability in Iceberg

2020-01-29 Thread Ashish Mehta
I commented on the issue. IIUC, the issue is due to the behavior of the rename API: it's about the availability of the `version-hint.text` file in the case of `create with overwrite flag`, which is a filesystem call made after the rename API is called to rename the metadata. The issue is that readers are hitting an NPE when some other

Re: Write reliability in Iceberg

2020-01-28 Thread Gautam
Thanks Ryan and Suds for the suggestions; we are looking into these options. We currently don't have any external catalog or locking service and depend purely on commit retries. Additionally, we don't have any of our metadata in Hive Metastore, and we want to leverage the underlying filesystem t

Re: Write reliability in Iceberg

2020-01-28 Thread Ryan Blue
Thanks for pointing out those references, suds! And thanks to Mouli (for writing the doc) and Anton (for writing the test)! On Tue, Jan 28, 2020 at 2:05 PM suds wrote: > We have referred https://iceberg.incubator.apache.org/custom-catalog/ and > implemented atomic operation using dynamo optimis

Re: Write reliability in Iceberg

2020-01-28 Thread suds
We referred to https://iceberg.incubator.apache.org/custom-catalog/ and implemented the atomic operation using dynamo optimistic locking. The Iceberg codebase has an excellent test case to validate a custom implementation. https://github.com/apache/incubator-iceberg/blob/master/hive/src/test/java/org/apac

Re: Write reliability in Iceberg

2020-01-28 Thread Ryan Blue
Hi Gautam, Hadoop tables are not intended to be used when the file system doesn't support atomic rename because of the problems you describe. Atomic rename is a requirement for correctness in Hadoop tables. That is why we also have metastore tables, where some other atomic swap is used. I strongl
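
The "atomic swap" and "optimistic locking" ideas discussed in this thread can be illustrated with a minimal sketch. This is not Iceberg's API or anyone's actual implementation: `InMemoryPointerStore`, `CommitFailed`, and `commit_with_retries` are hypothetical names, and an in-process lock stands in for whatever the external store (a metastore, a key-value service with conditional writes) would provide. The point is only that a commit succeeds when the table's metadata pointer still equals the value the writer read, and is retried otherwise.

```python
import threading


class CommitFailed(Exception):
    """Raised when the metadata pointer moved since the writer last read it."""


class InMemoryPointerStore:
    """Hypothetical stand-in for a store that offers an atomic compare-and-swap."""

    def __init__(self, initial=None):
        self._lock = threading.Lock()
        self._current = initial

    def read(self):
        with self._lock:
            return self._current

    def compare_and_swap(self, expected, new):
        # The check and the update happen under one lock, so no other
        # writer can slip in between them. A real store must give the
        # same guarantee (for example through a conditional write).
        with self._lock:
            if self._current != expected:
                raise CommitFailed(
                    f"expected {expected!r}, found {self._current!r}"
                )
            self._current = new


def commit_with_retries(store, make_new_location, max_attempts=4):
    """Optimistic commit: read the base, build on it, swap, retry on conflict."""
    for _ in range(max_attempts):
        base = store.read()
        candidate = make_new_location(base)
        try:
            store.compare_and_swap(base, candidate)
            return candidate
        except CommitFailed:
            continue  # another writer won; re-read and try again
    raise CommitFailed(f"gave up after {max_attempts} attempts")
```

A writer holding a stale base gets `CommitFailed` from `compare_and_swap` instead of silently overwriting a newer commit, which is the property a non-atomic rename cannot guarantee.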