Hi all,

I'm trying to figure out how to persist a table definition in a catalog so that
it can be used from different sessions. Something along the lines of:

-------------------
CREATE TABLE spark_catalog.default.test_table (
    name string
)
USING jdbc
OPTIONS (
    driver 'com.mysql.cj.jdbc.Driver',
    url 'jdbc:mysql://example.com:3306/db',
    user 'user',
    password 'pass',
    query 'SELECT name FROM test_table WHERE type_id = 2'
)
--------------------

and then from another session, directly calling

--------------------
SparkSession.builder.getOrCreate().sql(
    'SELECT * FROM spark_catalog.default.test_table'
).show()
--------------------

However, I'm unable to figure out how to accomplish this. I initially tried to
define the table in a Nessie catalog, but it won't accept "USING jdbc". I have
also tried setting spark.sql.warehouse.dir to a location in S3 and calling
enableHiveSupport() on the session; however, under these circumstances creating
the table only produces an empty directory, and the table doesn't show up in
the next session. Would I need to set up a Hive Metastore to accomplish this,
or what other options do I have?
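For reference, the session setup I described above looks roughly like this
(the bucket and path are placeholders, not my real values):

```python
from pyspark.sql import SparkSession

# Configuration sketch only: "s3a://my-bucket/warehouse" is a placeholder.
spark = (
    SparkSession.builder
    # Warehouse location pointed at S3
    .config("spark.sql.warehouse.dir", "s3a://my-bucket/warehouse")
    # Enable Hive support on the session
    .enableHiveSupport()
    .getOrCreate()
)
```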

Thanks,
Aaron

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
