I created a Parquet file and exposed it to Hive using an external table, but selecting from the table always returns NULL.
To show the symptom, I created the following data set. Each record has only two fields, __PRIMARY_KEY__ and nullableInt. The schema, represented in Avro, is the following (I converted the data into Parquet through the avro-parquet converter):

    {"type":"record","name":"mytest","namespace":"yy.com ","doc":"","fields":[{"name":"__PRIMARY_KEY__","type":"string","doc":""},{"name":"nullableInt","type":["int","null"],"doc":""}],"version":"1424373511441"}

The following is the Parquet Hive table definition. I also attached the sample Parquet file.

Thanks!
yang

    drop table mytest;

    CREATE EXTERNAL TABLE IF NOT EXISTS mytest (
      PRIMARY_KEY String,
      nullableInt int
    )
    STORED AS PARQUET
    LOCATION '/user/myusername/camus/topics/mytest/hourly/2015/02/19/11/';

    select * from mytest limit 10;
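One check that can be done from the text of this post alone is whether the column names in the table definition line up with the field names in the Avro schema. The sketch below is only that, a sketch: `compare_names` is a hypothetical helper, the column list is typed in by hand from the DDL as it appears here, and it assumes that Hive stores column names in lowercase and that its Parquet reader resolves columns by name. Whether a name mismatch is what actually produces the NULLs is an assumption, not something this post confirms.

```python
import json

# Avro schema copied from the post.
AVRO_SCHEMA = (
    '{"type":"record","name":"mytest","namespace":"yy.com ","doc":"",'
    '"fields":[{"name":"__PRIMARY_KEY__","type":"string","doc":""},'
    '{"name":"nullableInt","type":["int","null"],"doc":""}],'
    '"version":"1424373511441"}'
)

# Column names exactly as written in the CREATE EXTERNAL TABLE statement.
HIVE_COLUMNS = ["PRIMARY_KEY", "nullableInt"]


def compare_names(avro_schema_json, hive_columns):
    """Hypothetical helper: classify each Hive column against the file's fields.

    Assumption: Hive lowercases column names, so the comparison uses the
    lowercased Hive name against the field names stored in the file.
    """
    file_fields = [f["name"] for f in json.loads(avro_schema_json)["fields"]]
    file_fields_lower = {name.lower() for name in file_fields}

    report = {}
    for column in hive_columns:
        hive_name = column.lower()
        if hive_name in file_fields:
            report[hive_name] = "exact match"
        elif hive_name in file_fields_lower:
            report[hive_name] = "differs only in case"
        else:
            report[hive_name] = "no matching field"
    return report


if __name__ == "__main__":
    for name, status in compare_names(AVRO_SCHEMA, HIVE_COLUMNS).items():
        print(f"{name}: {status}")
```

This only compares names as text. It does not open the Parquet file, so the schema actually written by the converter would still need to be confirmed separately.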
mytest.1.0.4.8.1424372400000.parquet