Ah, found it. My issue is that Hive 0.13 doesn't handle this correctly; it could be a bug.
With 0.14 it works. By the way, the UNION[int, null] translates to Parquet as a field "optional int32 myfieldName"; I found this by calling ParquetFileReader.readFooter().

On Thu, Feb 19, 2015 at 11:32 AM, Yang <teddyyyy...@gmail.com> wrote:

> I created a parquet file and exposed it to hive using an external table, but
> selects from such tables always return NULL.
>
> To show the symptom, I created the following data set; each record has
> only 2 fields, __PRIMARY_KEY__ and nullableInt. The schema represented in
> avro is the following (I converted the data into parquet through the
> avro-parquet converter):
>
> {"type":"record","name":"mytest","namespace":"yy.com","doc":"","fields":[{"name":"__PRIMARY_KEY__","type":"string","doc":""},{"name":"nullableInt","type":["int","null"],"doc":""}],"version":"1424373511441"}
>
> The following is the parquet hive table def. I also attached the sample
> parquet file.
>
> Thanks!
> yang
>
> drop table mytest;
> CREATE EXTERNAL TABLE IF NOT EXISTS mytest
> (
> PRIMARY_KEY String,
> nullableInt int
> )
> STORED AS PARQUET
> LOCATION '/user/myusername/camus/topics/mytest/hourly/2015/02/19/11/'
> ;
>
> select * from mytest limit 10;
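
For anyone hitting the same thing, here is a rough Python sketch of the mapping I described: an Avro union of one type with "null" becomes an "optional" Parquet field, and a plain type becomes "required". This is only an illustration and not the actual avro-parquet converter code. The names parquet_field and AVRO_TO_PARQUET are made up for this example, only a few primitive types are covered, and the schema is a trimmed copy of the one from my original message.

```python
import json

# Simplified Avro primitive -> Parquet physical type table (assumption:
# only the common primitives are listed; the real converter handles more).
AVRO_TO_PARQUET = {
    "int": "int32",
    "long": "int64",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "string": "binary",
    "bytes": "binary",
}


def parquet_field(avro_field):
    """Return a Parquet-style declaration for one Avro record field."""
    avro_type = avro_field["type"]
    repetition = "required"
    if isinstance(avro_type, list):
        # A union such as ["int", "null"]: nullable, so the field is optional.
        non_null = [t for t in avro_type if t != "null"]
        if "null" not in avro_type or len(non_null) != 1:
            raise ValueError("this sketch only handles unions of one type with null")
        repetition = "optional"
        avro_type = non_null[0]
    # Avro strings are stored as binary annotated as UTF8 text.
    suffix = " (UTF8)" if avro_type == "string" else ""
    return f"{repetition} {AVRO_TO_PARQUET[avro_type]} {avro_field['name']}{suffix}"


# Trimmed copy of the Avro schema from the original message.
schema = json.loads(
    '{"type":"record","name":"mytest","fields":['
    '{"name":"__PRIMARY_KEY__","type":"string"},'
    '{"name":"nullableInt","type":["int","null"]}]}'
)

for field in schema["fields"]:
    print(parquet_field(field))
```

The nullableInt line it prints has the same shape as the "optional int32 myfieldName" entry I saw in the footer.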