Re: Consistency problems with Iceberg + EMRFS

2021-06-29 Thread Ryan Blue
" > *Date: *Monday, June 28, 2021 at 11:44 AM > *To: *"dev@iceberg.apache.org" > *Subject: *Re: Consistency problems with Iceberg + EMRFS > > > > This message contains hyperlinks, take precaution before opening these > links. > > Greg, I don't think that the

Re: Consistency problems with Iceberg + EMRFS

2021-06-29 Thread Greg Hill
each metadata file once, but I’m not 100% sure I’m reading the code correctly. Greg From: Ryan Blue Reply-To: "dev@iceberg.apache.org" Date: Monday, June 28, 2021 at 11:44 AM To: "dev@iceberg.apache.org" Subject: Re: Consistency problems with Iceberg + EMRFS This message

Re: Consistency problems with Iceberg + EMRFS

2021-06-28 Thread Ryan Blue
day, June 11, 2021 at 11:14 AM > *To: *"dev@iceberg.apache.org" > *Subject: *Re: Consistency problems with Iceberg + EMRFS > > > > This message contains hyperlinks, take precaution before opening these > links. > > That makes sense to me, then. Is it possible to coordin

Re: Consistency problems with Iceberg + EMRFS

2021-06-28 Thread Greg Hill
with such a migration? Greg From: Ryan Blue Reply-To: "dev@iceberg.apache.org" Date: Friday, June 11, 2021 at 11:14 AM To: "dev@iceberg.apache.org" Subject: Re: Consistency problems with Iceberg + EMRFS This message contains hyperlinks, take precaution before opening the

Re: Consistency problems with Iceberg + EMRFS

2021-06-11 Thread Ryan Blue
concurrent writes. > > > > *From: *Ryan Blue > *Reply-To: *"dev@iceberg.apache.org" > *Date: *Thursday, June 10, 2021 at 12:05 PM > *To: *"dev@iceberg.apache.org" > *Subject: *Re: Consistency problems with Iceberg + EMRFS > > > > This message conta

Re: Consistency problems with Iceberg + EMRFS

2021-06-11 Thread Scott Kruger
berg.apache.org" Subject: Re: Consistency problems with Iceberg + EMRFS This message contains hyperlinks, take precaution before opening these links. Yeah, Dan is right. If you want to use HDFS tables then you have to use a Hadoop FileSystem directly since the FileIO interface doesn't include

Re: Consistency problems with Iceberg + EMRFS

2021-06-10 Thread Ryan Blue
> *Date: *Thursday, June 10, 2021 at 10:36 AM > *To: *Iceberg Dev List > *Subject: *Re: Consistency problems with Iceberg + EMRFS > > > > This message contains hyperlinks, take precaution before opening these > links. > > Scott, I don't think you can use

Re: Consistency problems with Iceberg + EMRFS

2021-06-10 Thread Jack Ye
te: *Thursday, June 10, 2021 at 10:36 AM > *To: *Iceberg Dev List > *Subject: *Re: Consistency problems with Iceberg + EMRFS > > > > This message contains hyperlinks, take precaution before opening these > links. > > Scott, I don't think you can use S3FileIO w

Re: Consistency problems with Iceberg + EMRFS

2021-06-10 Thread Scott Kruger
" Date: Thursday, June 10, 2021 at 10:36 AM To: Iceberg Dev List Subject: Re: Consistency problems with Iceberg + EMRFS This message contains hyperlinks, take precaution before opening these links. Scott, I don't think you can use S3FileIO with HadoopTables because HadoopTables requi

Re: Consistency problems with Iceberg + EMRFS

2021-06-10 Thread Daniel Weeks
> > > > > Not using EMRFS for the metadata is an interesting possibility. We’re > using HadoopTables currently; is there a Tables implementation that uses > S3FileIO that we can use, or can I somehow tell HadoopTables to use > S3FileIO? > > > > *From: *Jack Ye > *Reply-

Re: Consistency problems with Iceberg + EMRFS

2021-06-10 Thread Scott Kruger
-To: "dev@iceberg.apache.org" Date: Wednesday, June 9, 2021 at 4:10 PM To: "dev@iceberg.apache.org" Subject: Re: Consistency problems with Iceberg + EMRFS This message is from an external sender. Thanks for the additional detail. If you're not writing concurrently, then

Re: Consistency problems with Iceberg + EMRFS

2021-06-09 Thread Ryan Blue
> > > Not using EMRFS for the metadata is an interesting possibility. We’re > using HadoopTables currently; is there a Tables implementation that uses > S3FileIO that we can use, or can I somehow tell HadoopTables to use > S3FileIO? > > > > *From: *Jack Ye > *Reply-

Re: Consistency problems with Iceberg + EMRFS

2021-06-09 Thread Scott Kruger
somehow tell HadoopTables to use S3FileIO? From: Jack Ye Reply-To: "dev@iceberg.apache.org" Date: Tuesday, June 8, 2021 at 7:49 PM To: "dev@iceberg.apache.org" Subject: Re: Consistency problems with Iceberg + EMRFS This message was identified as a phishing scam. There are 2 pot

Re: Consistency problems with Iceberg + EMRFS

2021-06-08 Thread Jack Ye
There are 2 potential root causes I see: 1. you might be using EMRFS with DynamoDB enabled to check consistency, which leads to DynamoDB and S3 going out of sync. The quick solution is to just delete the DynamoDB consistency table, and the next read/write will recreate and resync it. After all, EMRFS

Re: Consistency problems with Iceberg + EMRFS

2021-06-08 Thread Ryan Blue
Hi Scott, I'm not quite sure what's happening here, but I should at least note that we didn't intend for HDFS tables to be used with S3. HDFS tables use an atomic rename in the file system to ensure that only one committer "wins" to produce a given version of the table metadata. In S3, renames are
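
For illustration only, here is a minimal sketch of the "only one committer wins" idea from the message above. It runs on a local POSIX file system, not HDFS or S3. The helper names (try_commit, demo) and the v<N>.metadata.json file naming are assumptions made for this sketch, not Iceberg's actual code. Because Python's os.rename silently replaces an existing destination, os.link stands in as the atomic "publish only if the target does not exist yet" step.

```python
import os
import tempfile


def try_commit(table_dir, version, payload):
    """Try to publish `payload` as metadata version `version`.

    Returns True if this writer won, or False if another writer had
    already published that version. (Hypothetical helper.)
    """
    final_path = os.path.join(table_dir, f"v{version}.metadata.json")
    # Write the new metadata to a private temporary file first.
    fd, tmp_path = tempfile.mkstemp(dir=table_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(payload)
        try:
            # Atomic publish: link creation fails if final_path exists,
            # so at most one writer can claim a given version.
            os.link(tmp_path, final_path)
            return True
        except FileExistsError:
            return False
    finally:
        # The temporary name is no longer needed either way.
        os.unlink(tmp_path)


def demo():
    """Two writers race for version 2; only the first one should win."""
    with tempfile.TemporaryDirectory() as table_dir:
        first = try_commit(table_dir, 2, "writer A")
        second = try_commit(table_dir, 2, "writer B")
        with open(os.path.join(table_dir, "v2.metadata.json")) as f:
            winner = f.read()
    return first, second, winner
```

The losing writer sees a failure and can re-read the table state and retry against the next version. The scheme depends entirely on the publish step being atomic and failing when the target already exists, which is the property the message says the file system must provide.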

Consistency problems with Iceberg + EMRFS

2021-06-08 Thread Scott Kruger
We’re using the Iceberg API (0.11.1) over raw Parquet data in S3/EMRFS, basically just using the table API to issue overwrites/appends. Everything works great for the most part, but we’ve recently started to have problems with the Iceberg metadata directory going out of sync. See the following