Dear All,

I am facing a strange issue with Spark 2.3: I would like to create a
MANAGED table from the contents of a DataFrame while overriding the
storage path.

Apparently, when one tries to create a Hive table via
DataFrameWriter.saveAsTable, supplying a "path" option causes Spark to
automatically create an external table.

This demonstrates the behaviour:

scala> val numbersDF = sc.parallelize((1 to 100).toList).toDF("numbers")
numbersDF: org.apache.spark.sql.DataFrame = [numbers: int]

scala> numbersDF.write.format("orc").saveAsTable("numbers_table1")

scala> spark.sql("describe formatted numbers_table1").filter(_.get(0).toString == "Type").show
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|    Type|  MANAGED|       |
+--------+---------+-------+


scala> numbersDF.write.format("orc").option("path", "/user/foobar/numbers_table_data").saveAsTable("numbers_table2")

scala> spark.sql("describe formatted numbers_table2").filter(_.get(0).toString == "Type").show
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|    Type| EXTERNAL|       |
+--------+---------+-------+



I am wondering if there is any way to force creation of a managed table
with a custom path (which, as far as I know, should be possible via
standard Hive commands).
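For reference, something like the following is what I had in mind: creating
the table via plain DDL (CREATE TABLE without the EXTERNAL keyword, but with
a LOCATION clause, as Hive allows) and then inserting the DataFrame's
contents into it. The table name and path below are just placeholders, and I
have not verified whether Spark's own SQL parser also marks such a table
EXTERNAL:

```scala
// Untested sketch: create the table with explicit Hive DDL -- no EXTERNAL
// keyword, but with a LOCATION clause -- then write the DataFrame into it.
spark.sql("""
  CREATE TABLE numbers_table3 (numbers INT)
  STORED AS ORC
  LOCATION '/user/foobar/numbers_table_data'
""")

// Append the DataFrame's rows into the pre-created table.
numbersDF.write.insertInto("numbers_table3")
```

If Spark still reports this table as EXTERNAL, I would be interested to know
whether that behaviour is configurable.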

I often seem to have the problem that I cannot find the appropriate
documentation for the options accepted by Spark APIs. Could someone
please point me in the right direction and tell me where these things
are documented?

Thanks,
Peter
