Thanks Peter. I'm not sure if that is possible yet. The closest I can think of to achieving what you want is to try something like:

df.registerTempTable("mytable")
sql("create table mymanagedtable as select * from mytable")
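A minimal sketch of the same idea, untested, and assuming a Spark 2.x session (`df` is your DataFrame; "mytable" and "mymanagedtable" are placeholder names). Note that registerTempTable is deprecated in Spark 2.x in favour of createOrReplaceTempView:

```scala
// Untested sketch: register the DataFrame as a temp view, then use
// CTAS so Spark infers the schema and (reportedly) creates a MANAGED table.
df.createOrReplaceTempView("mytable")
spark.sql("CREATE TABLE mymanagedtable AS SELECT * FROM mytable")

// Check the resulting table type via the catalog API.
spark.catalog.listTables().filter($"name" === "mymanagedtable").show()
```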
I haven't used CTAS in Spark SQL before, but I have heard it works. It would infer the schema for you, and from what I have heard CTAS creates managed tables. Let me know if this works for you.

Kind regards,
Chris

On Thu, 14 Feb 2019 at 03:08 Horváth Péter Gergely <horvath.peter.gerg...@gmail.com> wrote:

> Hi Chris,
>
> Thank you for the input, I know I can always write the table DDL manually.
> But here I would like to rely on Spark generating the schema. What I don't
> understand is the change in the behaviour of Spark: having the storage path
> specified does not necessarily mean it should be an external table.
>
> Is there any way to control/override this?
>
> Thanks,
> Peter
>
> On Wed, Feb 13, 2019, 13:09 Chris Teoh <chris.t...@gmail.com> wrote:
>
>> Hey there,
>>
>> Could you not just create a managed table using the DDL in Spark SQL and
>> then write the data frame to the underlying folder, or use Spark SQL to
>> do an insert?
>>
>> Alternatively, try CREATE TABLE AS SELECT. IIRC, Hive creates managed
>> tables this way.
>>
>> I've not confirmed this works, but I think it might be worth trying.
>>
>> I hope that helps.
>>
>> Kind regards,
>> Chris
>>
>> On Wed., 13 Feb. 2019, 10:44 pm Horváth Péter Gergely <
>> horvath.peter.gerg...@gmail.com> wrote:
>>
>>> Dear All,
>>>
>>> I am facing a strange issue with Spark 2.3, where I would like to create
>>> a MANAGED table out of the content of a DataFrame with the storage path
>>> overridden.
>>>
>>> Apparently, when one tries to create a Hive table via
>>> DataFrameWriter.saveAsTable, supplying a "path" option causes Spark to
>>> automatically create an external table.
>>>
>>> This demonstrates the behaviour:
>>>
>>> scala> val numbersDF = sc.parallelize((1 to 100).toList).toDF("numbers")
>>> numbersDF: org.apache.spark.sql.DataFrame = [numbers: int]
>>>
>>> scala> numbersDF.write.format("orc").saveAsTable("numbers_table1")
>>>
>>> scala> spark.sql("describe formatted numbers_table1").filter(_.get(0).toString == "Type").show
>>> +--------+---------+-------+
>>> |col_name|data_type|comment|
>>> +--------+---------+-------+
>>> |    Type|  MANAGED|       |
>>> +--------+---------+-------+
>>>
>>> scala> numbersDF.write.format("orc").option("path", "/user/foobar/numbers_table_data").saveAsTable("numbers_table2")
>>>
>>> scala> spark.sql("describe formatted numbers_table2").filter(_.get(0).toString == "Type").show
>>> +--------+---------+-------+
>>> |col_name|data_type|comment|
>>> +--------+---------+-------+
>>> |    Type| EXTERNAL|       |
>>> +--------+---------+-------+
>>>
>>> I am wondering if there is any way to force creation of a managed table
>>> with a custom path (which, as far as I know, should be possible via
>>> standard Hive commands).
>>>
>>> I often seem to have the problem that I cannot find the appropriate
>>> documentation for the option configuration of Spark APIs. Could someone
>>> please point me in the right direction and tell me where these things
>>> are documented?
>>>
>>> Thanks,
>>> Peter