Sunny malik created SPARK-50759:
-----------------------------------
Summary: Spark catalog api bug when working with non-hms based
catalog
Key: SPARK-50759
URL: https://issues.apache.org/jira/browse/SPARK-50759
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.4.0, 3.3.0
Reporter: Sunny malik
Hi
I am encountering issues while working with a REST-based catalog. My Spark
session is configured with a default catalog that uses the REST-based
implementation.
The {{SparkSession.catalog}} API does not behave consistently with the
REST-based catalog: the two overloads of {{tableExists}} return different
results for the same table. I observed this on Spark 3.4.
----------------------------------------------------------------------------------
${SPARK_HOME}/bin/spark-shell --master local[*] \
--driver-memory 2g \
--conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
--conf spark.sql.catalog.iceberg.uri=https://xx.xxx.xxxx.domain.com \
--conf spark.sql.warehouse.dir=$SQL_WAREHOUSE_DIR \
--conf spark.sql.defaultCatalog=iceberg \
--conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.iceberg.catalog-impl=org.apache.iceberg.rest.RESTCatalog
scala> spark.catalog.currentCatalog
res1: String = iceberg
scala> spark.sql("select * from restDb.restTable").show
+---+----------+
| id| data|
+---+----------+
| 1|some_value|
+---+----------+
scala> spark.catalog.tableExists("restDb.restTable")
res3: Boolean = true
scala> spark.catalog.tableExists("restDb", "restTable")
res4: Boolean = false
----------------------------------------------------------------------------------
The API spark.catalog.tableExists(String databaseName, String tableName)
is only meant to work with an HMS-based catalog
([https://github.com/apache/spark/blob/5a91172c019c119e686f8221bbdb31f59d3d7776/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L224]).
The API spark.catalog.tableExists(String tableName)
is meant to work with both HMS-based and non-HMS-based catalogs.
Suggested resolutions:
1. Make spark.catalog.tableExists(String databaseName, String tableName)
throw a runtime exception if the session catalog is a non-HMS-based catalog.
2. Deprecate the HMS-specific API in a newer Spark release, as Spark already
has an API that works with both HMS-based and non-HMS-based catalogs.
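Until one of these is adopted, a possible workaround is to pass two-part names through the single-argument overload, which returned true in the session above. The following is a minimal sketch, not a tested fix: {{TableExistsWorkaround}}, {{qualifiedName}} and {{tableExistsVia}} are hypothetical names, and it assumes neither name part contains a dot or needs quoting. The existence check is passed in as a function so the snippet compiles without Spark on the classpath.

```scala
// Hypothetical helper, not part of Spark.
object TableExistsWorkaround {

  // Join database and table into the dotted form that the single-argument
  // spark.catalog.tableExists(tableName) accepted in the session above.
  // Assumption: neither part contains a dot or needs backtick quoting.
  def qualifiedName(db: String, table: String): String = s"$db.$table"

  // `exists` stands in for the single-argument existence check.
  def tableExistsVia(exists: String => Boolean)(db: String, table: String): Boolean =
    exists(qualifiedName(db, table))
}
```

In spark-shell this could be called as TableExistsWorkaround.tableExistsVia(name => spark.catalog.tableExists(name))("restDb", "restTable"), which issues the same single-argument call that returned true above.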
Thanks
Sunny