Ah, found it. My issue is that Hive 0.13 doesn't handle this correctly.
It could be a bug.

I tried Hive 0.14, and it works.

BTW, the UNION[int, null] type translates to Parquet as a field "optional int32
myfieldName". I found this by calling ParquetFileReader.readFooter().
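
As a rough illustration of that mapping, here is a small Python sketch. It is
not the parquet-avro converter code, only a simplified model of the rule that a
nullable Avro union becomes an "optional" Parquet field while a plain primitive
becomes "required". The names AVRO_TO_PARQUET, parquet_field and describe are
made up for this example, and only a few primitive types are handled.

```python
import json

# Simplified, illustrative mapping of a few Avro primitive types to Parquet
# primitive types. This table is an assumption made for the sketch and does
# not cover complex types (records, arrays, maps, enums, ...).
AVRO_TO_PARQUET = {
    "int": "int32",
    "long": "int64",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "string": "binary",
}


def parquet_field(field):
    """Return a Parquet-style declaration for one Avro primitive field."""
    avro_type = field["type"]
    repetition = "required"
    if isinstance(avro_type, list):
        # A union such as ["int", "null"]: only "one type plus null" is handled.
        non_null = [t for t in avro_type if t != "null"]
        if "null" not in avro_type or len(non_null) != 1:
            raise ValueError("unsupported union: %r" % (avro_type,))
        repetition = "optional"
        avro_type = non_null[0]
    primitive = AVRO_TO_PARQUET[avro_type]
    # Avro strings are stored as UTF8-annotated binary in Parquet.
    annotation = " (UTF8)" if avro_type == "string" else ""
    return "%s %s %s%s;" % (repetition, primitive, field["name"], annotation)


def describe(schema_json):
    """Return one declaration per field of an Avro record schema (JSON text)."""
    schema = json.loads(schema_json)
    return [parquet_field(f) for f in schema["fields"]]


# The Avro schema from the original message, on a single line.
SCHEMA = (
    '{"type":"record","name":"mytest","namespace":"yy.com","doc":"",'
    '"fields":[{"name":"__PRIMARY_KEY__","type":"string","doc":""},'
    '{"name":"nullableInt","type":["int","null"],"doc":""}],'
    '"version":"1424373511441"}'
)

if __name__ == "__main__":
    for line in describe(SCHEMA):
        print(line)
```

To see the real schema of a file, reading the footer as described above is
still the way to go; the sketch only shows what to expect for the nullable
field.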


On Thu, Feb 19, 2015 at 11:32 AM, Yang <teddyyyy...@gmail.com> wrote:

> I created a Parquet file and exposed it to Hive using an external table, but
> selects from that table always return NULL.
>
>
> To show the symptom, I created the following data set. Each record has
> only 2 fields, __PRIMARY_KEY__ and nullableInt. The schema, represented in
> Avro, is the following (I converted the data into Parquet through the
> avro-parquet converter):
>
> {"type":"record","name":"mytest","namespace":"yy.com","doc":"","fields":[{"name":"__PRIMARY_KEY__","type":"string","doc":""},{"name":"nullableInt","type":["int","null"],"doc":""}],"version":"1424373511441"}
>
>
>
> The following is the Parquet Hive table definition. I also attached the sample
> Parquet file.
>
> Thanks!
> yang
>
>
> drop table mytest;
> CREATE EXTERNAL TABLE IF NOT EXISTS mytest
> (
> PRIMARY_KEY String,
> nullableInt     int
> )
>   STORED AS  PARQUET
> LOCATION '/user/myusername/camus/topics/mytest/hourly/2015/02/19/11/'
> ;
>
> select * from mytest limit 10;
>
>
>
>
