Hi,
I had to use Pig for some preprocessing and to generate Parquet files for
Spark to consume.
However, due to a limitation in Pig, the column names in the generated
schema carry Pig's identifier prefix, e.g.
sorted::id, sorted::cre_ts, ...
I tried to put the schema inside CREATE EXTERNAL TABLE, e.g.

  create external table pmt (
    sorted::id bigint
  )
  stored as parquet
  location '...'

Obviously it didn't work. I also tried removing the sorted:: prefix from
the column names, but the resulting rows contain only nulls.
Any idea how to create a table in HiveContext from these Parquet files?
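
To be clear about what I mean by removing the prefix: it is just dropping
everything up to the last "::" in each column name. A minimal sketch in
plain Python, only to illustrate the renaming (strip_pig_prefix is a
hypothetical helper name, not a Pig, Hive, or Spark API):

```python
def strip_pig_prefix(column_name: str) -> str:
    """Drop Pig's relation prefix, e.g. 'sorted::id' -> 'id'.

    Names without '::' are returned unchanged.
    """
    # Split once from the right and keep the part after the last '::'.
    return column_name.rsplit("::", 1)[-1]


# Column names as they appear in the Pig-generated schema.
pig_columns = ["sorted::id", "sorted::cre_ts"]
clean_columns = [strip_pig_prefix(c) for c in pig_columns]
print(clean_columns)  # prints ['id', 'cre_ts']
```

These cleaned names are what I used in the table definition that returned
only nulls.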
Thanks,
Jianshi
--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/