In 1.2.1 I was persisting a set of parquet files as a table for use by the
spark-sql CLI later on. There was a post here
<http://apache-spark-user-list.1001560.n3.nabble.com/persist-table-schema-in-spark-sql-tt16297.html#a16311>
by Michael Armbrust that provided a nice little helper method for dealing
with this:

/**
 * Sugar for creating a Hive external table from a parquet path.
 */
def createParquetTable(name: String, file: String): Unit = {
  import org.apache.spark.sql.hive.HiveMetastoreTypes

  val rdd = parquetFile(file)
  val schema = rdd.schema.fields
    .map(f => s"${f.name} ${HiveMetastoreTypes.toMetastoreType(f.dataType)}")
    .mkString(",\n")
  val ddl = s"""
    |CREATE EXTERNAL TABLE $name (
    |  $schema
    |)
    |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    |LOCATION '$file'""".stripMargin
  sql(ddl)
  setConf("spark.sql.hive.convertMetastoreParquet", "true")
}
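
For what it's worth, I was invoking this from the spark-shell roughly like
the sketch below (the table name and path are just illustrative, and it
assumes the method above is defined in the same session with the
HiveContext's members imported):

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext._

// Register an external table over an existing parquet directory
// (hypothetical path).
createParquetTable("events", "hdfs:///data/events.parquet")

// The table lives in the Hive metastore, so it is queryable here and
// later from the spark-sql CLI.
sql("SELECT COUNT(*) FROM events").collect().foreach(println)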

In migrating to 1.3.x I see that the spark.sql.hive.convertMetastoreParquet
setting is no longer public, so the above no longer works.

I can define a helper object that wraps HiveMetastoreTypes, something like:

package org.apache.spark.sql.hive

import org.apache.spark.sql.types.DataType

/**
 * Helper to expose HiveMetastoreTypes, which Spark hides. It is placed
 * in this namespace to make it accessible.
 */
object HiveTypeHelper {
  def toDataType(metastoreType: String): DataType =
    HiveMetastoreTypes.toDataType(metastoreType)

  def toMetastoreType(dataType: DataType): String =
    HiveMetastoreTypes.toMetastoreType(dataType)
}
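
With that shim in place, the original helper can be ported to 1.3.x in
roughly the same shape. This is just a sketch, assuming as before that a
HiveContext's members are in scope; note that parquetFile now returns a
DataFrame, and I've dropped the setConf call since that setting is the
thing that's no longer public:

def createParquetTable(name: String, file: String): Unit = {
  import org.apache.spark.sql.hive.HiveTypeHelper

  // In 1.3.x parquetFile returns a DataFrame rather than a SchemaRDD.
  val df = parquetFile(file)
  val schema = df.schema.fields
    .map(f => s"${f.name} ${HiveTypeHelper.toMetastoreType(f.dataType)}")
    .mkString(",\n")
  val ddl = s"""
    |CREATE EXTERNAL TABLE $name (
    |  $schema
    |)
    |ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
    |STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
    |OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
    |LOCATION '$file'""".stripMargin
  sql(ddl)
}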

While this will work, is there a better way to achieve this under 1.3.x?

TIA for the assistance.

-Todd
