In 1.2.1 I was persisting a set of parquet files as a table for use by the
spark-sql CLI later on. There was a post here
<http://apache-spark-user-list.1001560.n3.nabble.com/persist-table-schema-in-spark-sql-tt16297.html#a16311>
by Michael Armbrust that provides a nice little helper method for dealing
with this:
/**
 * Sugar for creating a Hive external table from a parquet path.
 * Assumes it is defined where a HiveContext's members are in scope,
 * so that parquetFile, sql and setConf resolve.
 */
def createParquetTable(name: String, file: String): Unit = {
  import org.apache.spark.sql.hive.HiveMetastoreTypes

  val rdd = parquetFile(file)
  val schema = rdd.schema.fields
    .map(f => s"${f.name} ${HiveMetastoreTypes.toMetastoreType(f.dataType)}")
    .mkString(",\n")
  val ddl = s"""
    |CREATE EXTERNAL TABLE $name (
    |  $schema
    |)
    |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    |LOCATION '$file'""".stripMargin
  sql(ddl)
  setConf("spark.sql.hive.convertMetastoreParquet", "true")
}
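For what it's worth, I was calling it from the spark-shell roughly like
this (the table name and path here are just illustrative):

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext._

// paste createParquetTable from above here, then:
createParquetTable("events", "/data/events.parquet")
sql("SELECT count(*) FROM events").collect()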
In migrating to 1.3.x I see that HiveMetastoreTypes, which this relies on,
is no longer public, so the above no longer compiles.
I can define a helper method that wraps HiveMetastoreTypes, something
like:
package org.apache.spark.sql.hive

import org.apache.spark.sql.types.DataType

/**
 * Helper to expose HiveMetastoreTypes, which Spark 1.3 hides. It lives
 * in this package so that it can reach the now-private object.
 */
object HiveTypeHelper {
  def toDataType(metastoreType: String): DataType =
    HiveMetastoreTypes.toDataType(metastoreType)

  def toMetastoreType(dataType: DataType): String =
    HiveMetastoreTypes.toMetastoreType(dataType)
}
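With that shim in place, Michael's helper ports over with only small
changes; here is a rough sketch of what I mean (the HiveContext is now
passed in explicitly rather than assumed in scope):

import org.apache.spark.sql.hive.{HiveContext, HiveTypeHelper}

def createParquetTable(hc: HiveContext, name: String, file: String): Unit = {
  // parquetFile returns a DataFrame in 1.3.x
  val df = hc.parquetFile(file)
  val schema = df.schema.fields
    .map(f => s"${f.name} ${HiveTypeHelper.toMetastoreType(f.dataType)}")
    .mkString(",\n")
  hc.sql(s"""
    |CREATE EXTERNAL TABLE $name (
    |  $schema
    |)
    |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    |LOCATION '$file'""".stripMargin)
  // the conversion flag can still be set by its plain string key
  hc.setConf("spark.sql.hive.convertMetastoreParquet", "true")
}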
While this will work, is there a better way to achieve this under 1.3.x?
TIA for the assistance.
-Todd