Hi

I am encountering issues while working with a REST-based catalog. My Spark
session is configured with a default catalog that uses the REST-based
implementation.

The SparkSession.catalog API does not function correctly with the
REST-based catalog: the two-argument tableExists check returns false for a
table that exists. This has been tested and observed in Spark 3.4.

----------------------------------------------------------------------------------

${SPARK_HOME}/bin/spark-shell --master local[*] \
  --driver-memory 2g \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
  --conf spark.sql.catalog.iceberg.uri=https://xx.xxx.xxxx.domain.com \
  --conf spark.sql.warehouse.dir=$SQL_WAREHOUSE_DIR \
  --conf spark.sql.defaultCatalog=iceberg \
  --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.iceberg.catalog-impl=org.apache.iceberg.rest.RESTCatalog

scala> spark.catalog.currentCatalog
res1: String = iceberg

scala> spark.sql("select * from restDb.restTable").show
+---+----------+
| id|      data|
+---+----------+
|  1|some_value|
+---+----------+

scala> spark.catalog.tableExists("restDb.restTable")
res3: Boolean = true

scala> spark.catalog.tableExists("restDb", "restTable")
res4: Boolean = false
----------------------------------------------------------------------------------

The API spark.catalog.tableExists(String databaseName, String tableName)
is only meant to work with the HMS-based session catalog (
https://github.com/apache/spark/blob/5a91172c019c119e686f8221bbdb31f59d3d7776/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L224
)

The API spark.catalog.tableExists(String tableName), which takes a single
multipart identifier, is meant to work with both HMS and non-HMS based
catalogs.
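As a user-side workaround until this is resolved, the two-argument check can
be routed through the single-argument API, which (as the transcript above
shows) resolves against the current non-HMS catalog. Below is a minimal
sketch; QualifiedName is a hypothetical helper, not part of Spark, and the
backtick quoting follows Spark SQL's convention of doubling embedded
backticks inside a quoted identifier part.

```scala
// Hypothetical helper (not part of Spark): build a backtick-quoted multipart
// identifier so an existence check can use the single-argument
// spark.catalog.tableExists(String tableName), which resolves against the
// current catalog, instead of the HMS-only two-argument overload.
object QualifiedName {
  // Quote one identifier part, escaping embedded backticks by doubling them.
  private def quote(part: String): String =
    "`" + part.replace("`", "``") + "`"

  // Join database and table (or any namespace parts) into a dotted,
  // quoted multipart identifier, e.g. `restDb`.`restTable`.
  def of(parts: String*): String = parts.map(quote).mkString(".")
}
```

Usage against a SparkSession named spark would then look like
spark.catalog.tableExists(QualifiedName.of("restDb", "restTable")).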


Suggested resolutions
1. The API spark.catalog.tableExists(String databaseName, String tableName)
should throw a runtime exception if the session catalog is a non-HMS based
catalog.
2. Deprecate the HMS-specific API in a newer Spark release, as Spark already
has an API that works with both HMS and non-HMS based catalogs.

Thanks
Sunny
