Dear All,

I am facing a strange issue with Spark 2.3: I would like to create a MANAGED table from the contents of a DataFrame, with the storage path overridden.
Apparently, when one tries to create a Hive table via DataFrameWriter.saveAsTable, supplying a "path" option causes Spark to automatically create an external table. This demonstrates the behaviour:

scala> val numbersDF = sc.parallelize((1 to 100).toList).toDF("numbers")
numbersDF: org.apache.spark.sql.DataFrame = [numbers: int]

scala> numbersDF.write.format("orc").saveAsTable("numbers_table1")

scala> spark.sql("describe formatted numbers_table1").filter(_.get(0).toString == "Type").show
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|    Type|  MANAGED|       |
+--------+---------+-------+

scala> numbersDF.write.format("orc").option("path", "/user/foobar/numbers_table_data").saveAsTable("numbers_table2")

scala> spark.sql("describe formatted numbers_table2").filter(_.get(0).toString == "Type").show
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|    Type| EXTERNAL|       |
+--------+---------+-------+

I am wondering if there is any way to force the creation of a managed table with a custom path (which, as far as I know, should be possible via standard Hive commands).

I also often find that I cannot locate the documentation for the options accepted by the Spark APIs. Could someone please point me in the right direction and tell me where these things are documented?

Thanks,
Peter