Subject: [VOTE] Release Apache Iceberg 0.12.0 RC3

2021-08-09 Thread Carl Steinbach
Hi Everyone, I propose the following RC to be released as the official Apache Iceberg 0.12.0 release. The commit ID is 7ca1044655694dbbab660d02cef360ac1925f1c2 * This corresponds to the tag: apache-iceberg-0.12.0-rc3 * https://github.com/apache/iceberg/commits/apache-iceberg-0.12.0-rc3 * https://

Re: Writing iceberg table to S3

2021-08-09 Thread Russell Spitzer
The config you used specified a catalog named "hive_prod", so to reference it you need to either "use hive_prod" or refer to the table with the catalog identifier "CREATE TABLE hive_prod.default.mytable". On Mon, Aug 9, 2021 at 6:15 PM Lian Jiang wrote: > Thanks Ryan. > > Using this command (uri
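Russell's two options can be sketched as the SQL strings you would pass to `spark.sql(...)` in spark-shell. This is a minimal Python sketch, not Spark or Iceberg API: the helpers `qualified_name`, `create_table_sql` and `use_catalog_then_create` are hypothetical, and the catalog, namespace and column names are copied from the thread.

```python
from typing import List

# Hypothetical helpers (not part of Spark or Iceberg) that build the SQL
# Russell describes for a catalog registered as spark.sql.catalog.hive_prod.


def qualified_name(catalog: str, namespace: str, table: str) -> str:
    """Join catalog, namespace and table into a three-part identifier."""
    return f"{catalog}.{namespace}.{table}"


def create_table_sql(catalog: str, namespace: str, table: str, columns: str) -> str:
    """Option 2: name the Iceberg catalog explicitly in the table identifier."""
    ident = qualified_name(catalog, namespace, table)
    return f"CREATE TABLE {ident} ({columns}) USING iceberg"


def use_catalog_then_create(catalog: str, namespace: str, table: str, columns: str) -> List[str]:
    """Option 1: switch the current catalog, then a two-part name resolves in it."""
    return [
        f"USE {catalog}",
        f"CREATE TABLE {namespace}.{table} ({columns}) USING iceberg",
    ]


if __name__ == "__main__":
    print(create_table_sql("hive_prod", "default", "mytable", "uuid string"))
    for stmt in use_catalog_then_create("hive_prod", "default", "mytable", "uuid string"):
        print(stmt)
```

Without either step, Spark resolves `default.mytable` against its built-in session catalog (`spark_catalog`) rather than `hive_prod`.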

Re: Writing iceberg table to S3

2021-08-09 Thread Lian Jiang
Thanks Ryan. Using this command (uri is omitted because the uri is in hive-site.xml): spark-shell --conf spark.sql.catalog.hive_prod=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.hive_prod.type=hive This statement: spark.sql("CREATE TABLE default.mytable (uuid string) USING icebe
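The `--conf` flags in Lian's command follow a fixed pattern: `spark.sql.catalog.<name>` names the catalog class and `spark.sql.catalog.<name>.<property>` sets its options. A minimal Python sketch that regenerates them; `hive_catalog_conf_args` is a hypothetical helper, and the optional `uri` argument (the metastore address that the thread leaves to hive-site.xml) is included only as an assumption about how you might set it explicitly.

```python
from typing import Dict, List, Optional


def hive_catalog_conf_args(name: str, uri: Optional[str] = None) -> List[str]:
    """Build spark-shell --conf flags for an Iceberg SparkCatalog of type "hive".

    Hypothetical helper. When uri is None, the metastore address is expected
    to come from hive-site.xml, as in the thread.
    """
    prefix = f"spark.sql.catalog.{name}"
    props: Dict[str, str] = {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hive",
    }
    if uri is not None:
        props[f"{prefix}.uri"] = uri
    args: List[str] = []
    for key, value in props.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Reproduces the spark-shell command quoted in the thread.
    print("spark-shell " + " ".join(hive_catalog_conf_args("hive_prod")))
```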

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Carl Steinbach
I am withdrawing RC2 from voting because of issues that were found during testing. RC3 will follow shortly. Thanks, everyone, for your help testing RC2! - Carl On Mon, Aug 9, 2021 at 1:44 PM Szehon Ho wrote: > Got it, I somehow thought changes were manually cherry-picked, thanks for > clarific

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Szehon Ho
Got it, I somehow thought changes were manually cherry-picked, thanks for clarification. Thanks Szehon > On 9 Aug 2021, at 13:34, Ryan Blue wrote: > > Szehon, I think that should make it because the RC will come from master. > > On Mon, Aug 9, 2021 at 12:56 PM Szehon Ho wrote: > If it’s easy

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Wing Yew Poon
https://github.com/apache/iceberg/pull/2954 should be ready to merge. The CI passed. On Mon, Aug 9, 2021 at 9:08 AM Wing Yew Poon wrote: > Ryan, > Thanks for the review. Let me look into implementing your refactoring > suggestion. > - Wing Yew > > > On Mon, Aug 9, 2021 at 8:41 AM Ryan Blue wro

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Ryan Blue
Szehon, I think that should make it because the RC will come from master. On Mon, Aug 9, 2021 at 12:56 PM Szehon Ho wrote: > If it’s easy, would it make sense to include Russell’s fix as well for > Metadata tables query, as it affects Spark 3.1 (a regression from Spark > 3.0)? https://github.c

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Szehon Ho
If it’s easy, would it make sense to include Russell’s fix as well for Metadata tables query, as it affects Spark 3.1 (a regression from Spark 3.0)? https://github.com/apache/iceberg/pull/2877/files The issue: https://github.com/apache/iceberg/issues/2783 was at some point marked for 0.12 re

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Ryan Blue
Thanks for pointing that one out, Jack! That would be good to get in as well. On Mon, Aug 9, 2021 at 11:02 AM Jack Ye wrote: > If we are considering recutting the branch, please also include this PR > https://github.com/apache/iceberg/pull/2943 which fixes the validation > when creating a schema

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Jack Ye
If we are considering recutting the branch, please also include this PR https://github.com/apache/iceberg/pull/2943 which fixes the validation when creating a schema with identifier fields, thank you! -Jack Ye On Mon, Aug 9, 2021 at 9:08 AM Wing Yew Poon wrote: > Ryan, > Thanks for the review.

Re: Writing iceberg table to S3

2021-08-09 Thread Ryan Blue
Lian, I think we should improve the docs for catalogs since it isn’t clear. We have a few configuration pages that are helpful, but it looks like they assume you know what your options are already. Take a look at the Spark docs for catalogs, which is the closest we have right now: https://iceberg.

Re: Writing iceberg table to S3

2021-08-09 Thread Lian Jiang
Thanks Eduard and Ryan. I use Spark on a K8s cluster to write Parquet on S3 and then add an external table in the Hive metastore for this Parquet data. In the future, when using Iceberg, I prefer the Hive metastore since it is my centralized metastore for batch and streaming datasets. I don't see that hiv

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Wing Yew Poon
Ryan, Thanks for the review. Let me look into implementing your refactoring suggestion. - Wing Yew On Mon, Aug 9, 2021 at 8:41 AM Ryan Blue wrote: > Yeah, I agree. We should fix this for the 0.12.0 release. That said, I > plan to continue testing this RC because it won't change that much since

Re: Writing iceberg table to S3

2021-08-09 Thread Ryan Blue
Lian, Iceberg tables work great in S3. When creating the table, just pass the `LOCATION` clause with an S3 path, or set your catalog's warehouse location to S3 so tables are automatically created there. The only restriction for S3 is that you need a metastore to track the table metadata location
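Ryan's two options can be sketched as follows. This is a minimal Python sketch, not Iceberg API: `create_table_with_location` and `warehouse_conf` are hypothetical helpers, the bucket and paths are placeholders, and it assumes a metastore-backed catalog named `hive_prod` as in the thread, in line with Ryan's note that S3 tables need a metastore to track the metadata location.

```python
from typing import Dict


def create_table_with_location(identifier: str, columns: str, s3_path: str) -> str:
    """Option 1: put a single table in S3 with an explicit LOCATION clause."""
    return f"CREATE TABLE {identifier} ({columns}) USING iceberg LOCATION '{s3_path}'"


def warehouse_conf(catalog: str, s3_warehouse: str) -> Dict[str, str]:
    """Option 2: point the catalog's warehouse at S3 so new tables default there."""
    return {f"spark.sql.catalog.{catalog}.warehouse": s3_warehouse}


if __name__ == "__main__":
    # Placeholder bucket and paths; substitute your own S3 locations.
    print(create_table_with_location(
        "hive_prod.default.mytable", "uuid string", "s3://my-bucket/tables/mytable"))
    print(warehouse_conf("hive_prod", "s3://my-bucket/warehouse"))
```

For a Hive-type catalog, the default location of a new table may instead follow the namespace location recorded in the metastore, so it is worth checking where a test table actually lands before relying on option 2.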

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Ryan Blue
Yeah, I agree. We should fix this for the 0.12.0 release. That said, I plan to continue testing this RC because it won't change that much since this affects the Spark extensions in 3.1. Other engines and Spark 3.0 or older should be fine. I left a comment on the PR. I think it looks good, but we s

Re: Writing iceberg table to S3

2021-08-09 Thread Eduard Tudenhoefner
Lian, you can have a look at https://iceberg.apache.org/aws/. It should contain all the info that you need. The codebase contains an *S3FileIO* class, which is a FileIO implementation backed by S3. On Mon, Aug 9, 2021 at 7:37 AM Lian Jiang wrote: > I am reading https://iceberg.apache.org/spark-w
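Following the AWS integration page Eduard links, a catalog can be switched to `S3FileIO` through its `io-impl` property. A minimal Python sketch that assembles the catalog properties as `--conf` flags; `s3_hive_catalog_conf` and `to_conf_flags` are hypothetical helpers, the bucket is a placeholder, and it assumes the `iceberg-aws` module and the AWS SDK jars it depends on are already on Spark's classpath.

```python
from typing import Dict, List

S3_FILE_IO = "org.apache.iceberg.aws.s3.S3FileIO"


def s3_hive_catalog_conf(name: str, warehouse: str) -> Dict[str, str]:
    """Properties for a Hive-backed Iceberg catalog that reads and writes via S3FileIO."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hive",
        f"{prefix}.warehouse": warehouse,
        # Replace the default Hadoop-based FileIO with the S3-native implementation.
        f"{prefix}.io-impl": S3_FILE_IO,
    }


def to_conf_flags(props: Dict[str, str]) -> List[str]:
    """Render properties as spark-shell / spark-submit --conf flags."""
    flags: List[str] = []
    for key, value in props.items():
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    conf = s3_hive_catalog_conf("hive_prod", "s3://my-bucket/warehouse")
    print("spark-shell " + " ".join(to_conf_flags(conf)))
```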