One thing to double-check is that you have set up your Spark client to use a
Hive catalog for the session catalog. It is possible you are using a
Derby-based session catalog, which the Iceberg catalog is wrapping. See

https://github.com/apache/iceberg/issues/2488 

Make sure that 

spark.sql.catalogImplementation = hive
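
For example (a sketch combining this with the SparkSessionCatalog settings
from the mail below; adjust --packages to your Spark/Iceberg versions):

spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.1 \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hive

With catalogImplementation=hive, the wrapped session catalog talks to the
Hive metastore rather than a local Derby-backed one.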



> On Apr 22, 2021, at 12:30 PM, Szehon Ho <szehon...@apple.com.INVALID> wrote:
> 
> Hi Huadong, nice to see you again :).  The syntax in spark-sql is "INSERT 
> INTO <catalog>.<db>.<table> …"; here, have you defined your db as a catalog?  
> 
> You just need to define one catalog and use it when referring to your table.
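> 
> For example (the catalog name "hive_catalog" here is illustrative), a single 
> catalog definition plus a fully qualified table name would look like:
> 
> spark-sql --conf spark.sql.catalog.hive_catalog=org.apache.iceberg.spark.SparkCatalog \
>   --conf spark.sql.catalog.hive_catalog.type=hive
> spark-sql> INSERT INTO hive_catalog.my_db.my_table VALUES ('111', timestamp 'today', 1);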
> 
> 
> 
>> On 22 Apr 2021, at 07:34, Huadong Liu <huadong...@gmail.com> wrote:
>> 
>> Hello Iceberg Dev,
>> 
>> I am not sure I follow the discussion of Spark configuration for Hive 
>> catalogs <https://iceberg.apache.org/spark-configuration/#catalogs>. I 
>> created an Iceberg table with the Hive catalog:
>> Configuration conf = new Configuration();
>> conf.set("hive.metastore.uris", args[0]);
>> conf.set("hive.metastore.warehouse.dir", args[1]);
>> 
>> HiveCatalog catalog = new HiveCatalog(conf);
>> ImmutableMap meta = ImmutableMap.of(...);
>> Schema schema = new Schema(...);
>> PartitionSpec spec = PartitionSpec.builderFor(schema)...build();
>> 
>> TableIdentifier name = TableIdentifier.of("my_db", "my_table");
>> Table table = catalog.createTable(name, schema, spec);
>> 
>> On a box with hive.metastore.uris set correctly in hive-site.xml, spark-sql 
>> runs fine with:
>> 
>> spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.1 \
>>   --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
>>   --conf spark.sql.catalog.spark_catalog.type=hive
>> spark-sql> INSERT INTO my_db.my_table VALUES ("111", timestamp 'today', 1), 
>> ("333", timestamp 'today', 3);
>> spark-sql> SELECT * FROM my_db.my_table ;
>> 
>> However, if I follow the Spark hive configuration above to add a table 
>> catalog,
>> 
>> spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.1 \
>>   --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
>>   --conf spark.sql.catalog.spark_catalog.type=hive \
>>   --conf spark.sql.catalog.my_db=org.apache.iceberg.spark.SparkCatalog \
>>   --conf spark.sql.catalog.my_db.type=hive
>> spark-sql> INSERT INTO my_db.my_table VALUES ("111", timestamp 'today', 1), 
>> ("333", timestamp 'today', 3);
>> Error in query: Table not found: my_db.my_table;
>> 
>> https://iceberg.apache.org/spark/#reading-an-iceberg-table states that "To 
>> use Iceberg in Spark, first configure Spark catalogs." Did I misunderstand 
>> anything? Do I have to configure catalog/namespace? Thanks for your time on 
>> this.
>> 
>> --
>> Huadong
> 
