Hive users, I am having problems running "complex" queries on Avro+Snappy data. If I run "SELECT * FROM Blah LIMIT 50", the data comes back as it should. But if I run any more complex query, such as "SELECT count(*) FROM Blah", I receive several rows of NULL values. The workflow I used to create the table is described below, along with some of the setup.
- I am running CDH4.2 with Avro 1.7.3

hive> select * From mthomas_testavro limit 1;
OK
Field1 Field2
03-19-2013 a
03-19-2013 b
03-19-2013 c
03-19-2013 c
Time taken: 0.103 seconds

hive> select count(*) From mthomas_testavro;
…
Total MapReduce CPU Time Spent: 6 seconds 420 msec
OK
NULL
NULL
NULL
NULL
Time taken: 17.634 seconds
…

CREATE EXTERNAL TABLE mthomas_testavro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/tmp/testavro/'
TBLPROPERTIES (
  'avro.schema.literal'='{
    "namespace": "hello.world",
    "name": "some_schema",
    "type": "record",
    "fields": [
      { "name":"field1","type":"string"},
      { "name":"field2","type":"string"}
    ]
  }')
;

SET avro.output.codec=snappy;
SET mapred.output.compression.type=BLOCK;
SET hive.exec.compress.output=true;
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

INSERT OVERWRITE TABLE mthomas_testavro SELECT * FROM identical_table_inGzip_format;

If I cat the output file in the external table, I see "Objavro.codec^Lsnappyavro.schema?{"type"…" at the beginning, followed by the rest of the schema and binary data, so I am assuming the Snappy compression worked. I also tried querying this table via Impala, and both queries worked just fine.

Maybe it is related to https://issues.apache.org/jira/browse/HIVE-3308?

Any ideas?

Thanks.
Matt
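P.S. For anyone who would rather confirm the codec than eyeball the cat output: below is a minimal Python sketch that reads only the header of an Avro container file and reports its "avro.codec" entry. It assumes the header layout from the Avro specification (the magic bytes "Obj" plus 0x01, then a map of metadata with zigzag varint lengths) and a local copy of the output file. The function names are my own, and the file name in the closing comment is hypothetical. It only verifies what the writer recorded in the header; it does not explain the NULL rows.

```python
import io

MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag-encoded)."""
    shift = 0
    raw = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("truncated Avro header")
        byte = chunk[0]
        raw |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: last byte of this varint
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)  # undo zigzag encoding


def read_bytes(stream):
    """Read an Avro bytes/string value: a long length, then that many bytes."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated Avro header")
    return data


def read_header_metadata(stream):
    """Return the file-level metadata map (str -> bytes) from the header."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a block byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


def avro_codec(stream):
    """Return the codec named in the header; the spec default is 'null'."""
    meta = read_header_metadata(stream)
    return meta.get("avro.codec", b"null").decode("utf-8")


# Usage against a local copy of the table's data file (hypothetical name):
#   with open("000000_0", "rb") as f:
#       print(avro_codec(f))
```

The "^L" in the cat output fits this layout: it is byte 0x0C, which is the zigzag-encoded length 6 of the string "snappy".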