Re: Writing iceberg table to S3

2021-08-11 Thread Lian Jiang
`SET iceberg.mr.catalog=hive` works!!! Thanks Ryan, you rock!!! You may consider adding the notes below to the Iceberg documentation to help other newcomers. Add `SET iceberg.mr.catalog=hive` to https://iceberg.apache.org/hive/. Add `.tableProperty("location", filePath)` to https://iceberg.apache.org/spark-writes/.
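
For reference, a minimal sketch combining the two fixes in one flow (the catalog, database, and table names follow this thread's setup, but the S3 path is a hypothetical placeholder; `spark` is the spark-shell session configured with the `hive_test` catalog as shown later in the thread):

    // Write an Iceberg table to an explicit S3 location:
    // tableProperty (not option) is what carries "location" into the table.
    val df = spark.sql("SELECT 1 AS value")
    df.writeTo("hive_test.mydb.mytable3")
      .using("iceberg")
      .tableProperty("location", "s3://my-bucket/warehouse/mydb/mytable3")
      .create()

On the Hive side, running `SET iceberg.mr.catalog=hive;` in the Hive shell then tells the Iceberg 0.11.1 Hive integration to load the table from the metastore rather than by location.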

Re: Writing iceberg table to S3

2021-08-11 Thread Ryan Blue
Looks like the table is set up correctly. I think the problem might be how Hive is configured; by default in 0.11.1 it will try to load tables by location. You need to tell it to load tables as metastore tables, not HDFS tables, by running `SET iceberg.mr.catalog=hive`. On Wed, Aug 11, 2021 …

Re: Writing iceberg table to S3

2021-08-11 Thread Lian Jiang
    hive> describe formatted mytable3;
    OK
    # col_name        data_type    comment
    value             int
    # Detailed Table Information
    Database:         mydb
    OwnerType:        USER
    Owner:            root
    CreateTime:       Wed Aug 11 20:02:14 UTC 2021
    LastAcc…

Re: Writing iceberg table to S3

2021-08-11 Thread Ryan Blue
Can you run `DESCRIBE FORMATTED` for the table? Then we can see if there is a storage handler set up for it. On Wed, Aug 11, 2021 at 1:46 PM Lian Jiang wrote: > Thanks guys. tableProperty("location", ...) works. > > I have trouble making Hive query an Iceberg table by following > https://iceberg.apache.org/hive/ …

Re: Writing iceberg table to S3

2021-08-11 Thread Lian Jiang
Thanks guys. tableProperty("location", ...) works. I have trouble making Hive query an Iceberg table by following https://iceberg.apache.org/hive/. I have done:
* in the Hive shell, run `add jar /path/to/iceberg-hive-runtime.jar;`
* in hive-site.xml, add hive.vectorized.execution.enabled=false and ic…

Re: Writing iceberg table to S3

2021-08-11 Thread Ryan Blue
The problem for #3 is how Spark handles the options. The `option` method sets write options, not table properties. The write options aren’t passed when creating the table. Instead, you should use `tableProperty("location", ...)`. Ryan On Wed, Aug 11, 2021 at 9:17 AM Russell Spitzer wrote: > 2) Hive …
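
A sketch of the difference Ryan describes, runnable in the spark-shell session configured elsewhere in this thread (table names and S3 paths are hypothetical placeholders):

    // `spark` is predefined in spark-shell; a tiny DataFrame to write.
    val df = spark.range(5).toDF("value")

    // Does NOT set the table location: "location" here is only a write
    // option, and write options are not passed through at table creation.
    df.writeTo("hive_test.mydb.t1")
      .using("iceberg")
      .option("location", "s3://my-bucket/warehouse/t1")
      .create()

    // DOES set the location: tableProperty becomes a property of the
    // created table, which Iceberg uses as the table's root location.
    df.writeTo("hive_test.mydb.t2")
      .using("iceberg")
      .tableProperty("location", "s3://my-bucket/warehouse/t2")
      .create()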

Re: Writing iceberg table to S3

2021-08-11 Thread Russell Spitzer
2) Hive cannot read Iceberg tables without configuring Iceberg's MR/Hive integration, so you shouldn't see the table in Hive unless you have configured that; see https://iceberg.apache.org/hive/. 3) https://github.com/apache/iceberg/blob/master/spark3/src/main/java/org/apache/iceberg/spark/Sp…

Re: Writing iceberg table to S3

2021-08-11 Thread Lian Jiang
Any help is highly appreciated! On Tue, Aug 10, 2021 at 11:06 AM Lian Jiang wrote: > Thanks Russell. > > I tried: > > /spark/bin/spark-shell --packages > org.apache.iceberg:iceberg-hive-runtime:0.11.1,org.apache.iceberg:iceberg-spark3-runtime:0.11.1 > --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog …

Re: Writing iceberg table to S3

2021-08-10 Thread Lian Jiang
Thanks Russell. I tried:

    /spark/bin/spark-shell --packages org.apache.iceberg:iceberg-hive-runtime:0.11.1,org.apache.iceberg:iceberg-spark3-runtime:0.11.1 \
      --conf spark.sql.catalog.hive_test=org.apache.iceberg.spark.SparkCatalog \
      --conf spark.sql.catalog.hive_test.type=hive

    import org.apache.spark…
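
For completeness, the same catalog wiring can be expressed in code rather than --conf flags; a sketch (the runtime jars still need to be on the classpath, e.g. via --packages, and the metastore URI is taken from hive-site.xml as in the thread):

    import org.apache.spark.sql.SparkSession

    // Equivalent of the two --conf flags above.
    val spark = SparkSession.builder()
      .appName("iceberg-s3-demo")
      .config("spark.sql.catalog.hive_test", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.hive_test.type", "hive")
      .getOrCreate()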

Re: Writing iceberg table to S3

2021-08-10 Thread Russell Spitzer
Specify a property of "location" when creating the table. Just add `.option("location", "path")`. > On Aug 10, 2021, at 11:15 AM, Lian Jiang wrote: > > Thanks Russell. This helps a lot. > > I want to specify a HDFS location when creating an iceberg dataset using > dataframe api. All examples …

Re: Writing iceberg table to S3

2021-08-10 Thread Lian Jiang
Thanks Russell. This helps a lot. I want to specify an HDFS location when creating an Iceberg dataset using the DataFrame API. All the examples that use a warehouse location are SQL. Do you have an example for the DataFrame API? For example, how would I supply an HDFS/S3 location in the query below? The reason I ask is th…

Re: Writing iceberg table to S3

2021-08-09 Thread Russell Spitzer
The config you used specified a catalog named "hive_prod", so to reference it you need to either "use hive_prod" or refer to the table with the catalog identifier: "CREATE TABLE hive_prod.default.mytable". On Mon, Aug 9, 2021 at 6:15 PM Lian Jiang wrote: > Thanks Ryan. > > Using this command (uri …
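
A sketch of both forms Russell describes, run in a spark-shell configured with the `hive_prod` catalog shown in the message below:

    // Fully qualified: catalog.namespace.table
    spark.sql("CREATE TABLE hive_prod.default.mytable (uuid string) USING iceberg")

    // Or switch the current catalog first, then use the short name.
    spark.sql("USE hive_prod")
    spark.sql("CREATE TABLE default.mytable2 (uuid string) USING iceberg")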

Re: Writing iceberg table to S3

2021-08-09 Thread Lian Jiang
Thanks Ryan. Using this command (uri is omitted because the uri is in hive-site.xml):

    spark-shell --conf spark.sql.catalog.hive_prod=org.apache.iceberg.spark.SparkCatalog \
      --conf spark.sql.catalog.hive_prod.type=hive

This statement:

    spark.sql("CREATE TABLE default.mytable (uuid string) USING iceberg")

…

Re: Writing iceberg table to S3

2021-08-09 Thread Ryan Blue
Lian, I think we should improve the docs for catalogs since it isn’t clear. We have a few configuration pages that are helpful, but it looks like they assume you already know what your options are. Take a look at the Spark docs for catalogs, which is the closest we have right now: https://iceberg.…

Re: Writing iceberg table to S3

2021-08-09 Thread Lian Jiang
Thanks Eduard and Ryan. I use Spark on a K8S cluster to write Parquet to S3 and then add an external table in the Hive metastore for this Parquet data. In the future, when using Iceberg, I would prefer the Hive metastore since it is my centralized metastore for batch and streaming datasets. I don't see that hiv…

Re: Writing iceberg table to S3

2021-08-09 Thread Ryan Blue
Lian, Iceberg tables work great in S3. When creating the table, just pass the `LOCATION` clause with an S3 path, or set your catalog's warehouse location to S3 so tables are automatically created there. The only restriction for S3 is that you need a metastore to track the table metadata location …
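
A sketch of both approaches (the bucket, catalog, and table names are hypothetical placeholders):

    // 1) Explicit LOCATION on the table:
    spark.sql("""
      CREATE TABLE hive_prod.db.sample (id bigint, data string)
      USING iceberg
      LOCATION 's3://my-bucket/warehouse/db/sample'
    """)

    // 2) Or point the catalog's warehouse at S3 once, so new tables land
    // there by default (for a Hive catalog this is typically the metastore's
    // hive.metastore.warehouse.dir; Hadoop catalogs take a warehouse conf):
    //   --conf spark.sql.catalog.hive_prod.warehouse=s3://my-bucket/warehouse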

Re: Writing iceberg table to S3

2021-08-09 Thread Eduard Tudenhoefner
Lian, you can have a look at https://iceberg.apache.org/aws/. It should contain all the info that you need. The codebase contains an `S3FileIO` class, which is an implementation backed by S3. On Mon, Aug 9, 2021 at 7:37 AM Lian Jiang wrote: > I am reading https://iceberg.apache.org/spark-writes/#spark-writes …
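
A hedged sketch of wiring `S3FileIO` into a Spark catalog via the `io-impl` catalog property described on that page (the catalog name and bucket are placeholders; the iceberg-aws module and an AWS SDK bundle must also be on the classpath):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.my_catalog.type", "hive")
      // Use Iceberg's native S3 client instead of Hadoop's s3a filesystem.
      .config("spark.sql.catalog.my_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
      .getOrCreate()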

Writing iceberg table to S3

2021-08-08 Thread Lian Jiang
I am reading https://iceberg.apache.org/spark-writes/#spark-writes and wondering if it is possible to create an Iceberg table on S3. This guide seems to cover only writing to a Hive table (backed by HDFS, if I understand correctly). Hudi and Delta can write to S3 with a specified S3 path. How can I d…