Re: Spark configuration on hive catalog

Huadong Liu Thu, 22 Apr 2021 11:45:42 -0700

Thank you Szehon, Russell. Yeah, glad to see you here, Szehon!

My bad! I was confused by the .db suffix when a catalog namespace is
created (createDatabase internally). Things work as expected when I use
<catalog_name>.<db_name_without_db_suffix>.<table_name>.
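
A minimal sketch of that working form, assuming a catalog registered under the hypothetical name hive_cat (the name is an illustration, not taken from this thread). The script only assembles and prints the spark-sql command so that the identifier can be checked without Spark installed; remove the echo wrapper to actually run it.

```shell
#!/bin/sh
# Hypothetical catalog name, chosen for illustration. Using a name that is
# not also a database name avoids the ambiguity discussed in this thread.
CATALOG=hive_cat
DB=my_db          # namespace name as passed to TableIdentifier.of, no ".db" suffix
TABLE=my_table

# Fully qualified identifier: <catalog>.<db>.<table>
IDENT="${CATALOG}.${DB}.${TABLE}"

# Print the invocation instead of executing it.
echo "spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.1" \
     "--conf spark.sql.catalog.${CATALOG}=org.apache.iceberg.spark.SparkCatalog" \
     "--conf spark.sql.catalog.${CATALOG}.type=hive" \
     "-e 'SELECT * FROM ${IDENT}'"
```

The point of the sketch is that the first part of the identifier names the configured catalog, and the database comes second.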


--
Huadong

On Thu, Apr 22, 2021 at 10:31 AM Szehon Ho <[email protected]>
wrote:

> Hi Huadong, nice to see you again :). The syntax in spark-sql is 'insert
> into <catalog>.<db>.<table> ...'. Here you defined your db as a catalog?
>
> You just need to define one catalog and use it when referring to your
> table.
>
>
>
> On 22 Apr 2021, at 07:34, Huadong Liu <[email protected]> wrote:
>
> Hello Iceberg Dev,
>
> I am not sure I follow the discussion on Spark configurations on hive
> catalogs <https://iceberg.apache.org/spark-configuration/#catalogs>. I
> created an iceberg table with the hive catalog.
>
> Configuration conf = new Configuration();
> conf.set("hive.metastore.uris", args[0]);
> conf.set("hive.metastore.warehouse.dir", args[1]);
>
> HiveCatalog catalog = new HiveCatalog(conf);
> ImmutableMap meta = ImmutableMap.of(...);
> Schema schema = new Schema(...);
> PartitionSpec spec = PartitionSpec.builderFor(schema)...build();
>
> TableIdentifier name = TableIdentifier.of("my_db", "my_table");
> Table table = catalog.createTable(name, schema, spec);
>
> On a box with hive.metastore.uris set correctly in hive-site.xml, spark-sql
> runs fine with
>
> spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.1
> --conf
> spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
> --conf spark.sql.catalog.spark_catalog.type=hive
> spark-sql> INSERT INTO my_db.my_table VALUES ("111", timestamp 'today',
> 1), ("333", timestamp 'today', 3);
> spark-sql> SELECT * FROM my_db.my_table ;
>
> However, if I follow the Spark hive configuration above to add a table
> catalog,
>
> spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.1
> --conf
> spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
> --conf spark.sql.catalog.spark_catalog.type=hive
> --conf spark.sql.catalog.my_db=org.apache.iceberg.spark.SparkCatalog
> --conf spark.sql.catalog.my_db.type=hive
> spark-sql> INSERT INTO my_db.my_table VALUES ("111", timestamp 'today',
> 1), ("333", timestamp 'today', 3);
> Error in query: Table not found: my_db.my_table;
>
> https://iceberg.apache.org/spark/#reading-an-iceberg-table states
> that "To use Iceberg in Spark, first configure Spark catalogs." Did I
> misunderstand anything? Do I have to configure catalog/namespace? Thanks
> for your time on this.
>
> --
> Huadong
>
>
>
