Here's the solution I got after talking with Liancheng:
1) Use backquotes (`...`) to wrap column names that contain illegal characters:
val rdd = parquetFile(file)
val schema = rdd.schema.fields.map(f =>
  s"`${f.name}` ${HiveMetastoreTypes.toMetastoreType(f.dataType)}").mkString(",\n")
val ddl_13 = s"""
|CREATE EXTERNAL TABLE $name (
| $schema
|)
|STORED AS PARQUET
|LOCATION '$file'
""".stripMargin
sql(ddl_13)
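
To see what DDL text the snippet above ends up sending, here is a minimal standalone sketch with no Spark dependency. The Column case class, the buildDdl helper, the '/data/pmt' path and the type of cre_ts are all hypothetical, made up for illustration; only the backquoting idea comes from the code above.

```scala
object DdlSketch {
  // Hypothetical stand-in for a schema field: a column name and its Hive type name.
  final case class Column(name: String, hiveType: String)

  // Builds a CREATE EXTERNAL TABLE statement, wrapping every column name in
  // backquotes so that names such as sorted::id survive Hive's parser.
  def buildDdl(table: String, location: String, columns: Seq[Column]): String = {
    val schema = columns.map(c => s"`${c.name}` ${c.hiveType}").mkString(",\n  ")
    s"""
       |CREATE EXTERNAL TABLE $table (
       |  $schema
       |)
       |STORED AS PARQUET
       |LOCATION '$location'
       |""".stripMargin.trim
  }

  def main(args: Array[String]): Unit = {
    // Example columns; the type of cre_ts is an assumption.
    val cols = Seq(Column("sorted::id", "bigint"), Column("sorted::cre_ts", "string"))
    println(buildDdl("pmt", "/data/pmt", cols))
  }
}
```

Printing the statement before passing it to sql(...) makes it easy to check that every column name really is quoted.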
2) Create a new schema and call applySchema to generate a new SchemaRDD. I had
to drop the table and register it again:
val t = table(name)
val newSchema = StructType(t.schema.fields.map(s =>
  s.copy(name = s.name.replaceAll(".*?::", ""))))
sql(s"drop table $name")
applySchema(t, newSchema).registerTempTable(name)
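
One thing worth checking with the rename in step 2: if two Pig relations contribute a field with the same base name, stripping the prefix would produce duplicate column names. A small sketch of the regex behaviour and a collision check follows; the RenameSketch object, its helper names and the sample column names other than sorted::id and sorted::cre_ts are hypothetical.

```scala
object RenameSketch {
  // Same regex as in step 2: lazily matches everything up to and including
  // each "::", so nested prefixes such as a::b::id are removed as well.
  def stripPrefix(name: String): String = name.replaceAll(".*?::", "")

  // Groups the original names by their stripped form and keeps only the
  // stripped names that more than one original column maps to.
  def collisions(names: Seq[String]): Map[String, Seq[String]] =
    names.groupBy(stripPrefix).filter { case (_, originals) => originals.size > 1 }

  def main(args: Array[String]): Unit = {
    val names = Seq("sorted::id", "sorted::cre_ts", "other::id")
    names.foreach(n => println(s"$n -> ${stripPrefix(n)}"))
    println(s"collisions: ${collisions(names)}")
  }
}
```

If collisions comes back empty for t.schema.fields.map(_.name), the rename should be safe to apply as written.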
I'm testing it now.
Thanks for the help!
Jianshi
On Sat, Dec 6, 2014 at 8:41 AM, Jianshi Huang wrote:
> Hi,
>
> I had to use Pig for some preprocessing and to generate Parquet files for
> Spark to consume.
>
> However, due to Pig's limitation, the generated schema contains Pig's
> identifier prefix,
>
> e.g.
> sorted::id, sorted::cre_ts, ...
>
> I tried to put the schema inside CREATE EXTERNAL TABLE, e.g.
>
> create external table pmt (
> sorted::id bigint
> )
> stored as parquet
> location '...'
>
> Obviously it didn't work. I also tried removing the identifier sorted::,
> but the resulting rows contain only nulls.
>
> Any idea how to create a table in HiveContext from these Parquet files?
>
> Thanks,
> Jianshi
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>