Hive users,

I am having problems performing "complex" queries on Avro+Snappy data. If I do
a "SELECT * FROM Blah LIMIT 50", the data comes back as it should.
But if I perform any more complex query, such as "SELECT count(*) FROM
Blah", I receive several rows of NULL values. The workflow I used to create
the table is described below, along with some of the setup.

- I am running CDH4.2 with Avro 1.7.3

hive> select * From mthomas_testavro limit 1;
OK
Field1 Field2
03-19-2013 a
03-19-2013 b
03-19-2013 c
03-19-2013 c
Time taken: 0.103 seconds

hive> select count(*) From mthomas_testavro;
…
Total MapReduce CPU Time Spent: 6 seconds 420 msec
OK
NULL
NULL
NULL
NULL
Time taken: 17.634 seconds
…


CREATE EXTERNAL TABLE mthomas_testavro
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/tmp/testavro/'
TBLPROPERTIES (
'avro.schema.literal'='{
"namespace": "hello.world",
"name": "some_schema",
"type": "record",
"fields": [
{ "name":"field1","type":"string"},
{ "name":"field2","type":"string"}
]
}')
;

SET avro.output.codec=snappy;
SET mapred.output.compression.type=BLOCK;
SET hive.exec.compress.output=true;
SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

INSERT OVERWRITE TABLE mthomas_testavro SELECT * FROM 
identical_table_inGzip_format;

If I cat the output file in the external table's location, I see
"Objavro.codec^Lsnappyavro.schema?{"type"…" at the beginning, followed by the
rest of the schema and the binary data, so I am assuming the Snappy compression
worked. I also tried querying this table via Impala, and both
queries worked just fine.
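As a further sanity check that does not go through Hive at all, the container header can be decoded directly. Below is a minimal Python sketch, assuming the standard Avro object container layout from the Avro specification: the magic "Obj" plus version byte 0x01, a metadata map of zig-zag varint length-prefixed strings, a 16-byte sync marker, then data blocks that each start with a record count and a byte size. The function names (read_long, read_header, count_records) are hypothetical helpers, not part of Avro or Hive. In that layout the ^L seen above is byte 0x0C, the encoded length (6) of the string "snappy". Because every block header carries its own record count, the sketch can also total the records without decompressing anything, which gives an independent number to compare against what count(*) should return.

```python
import io
import sys
from typing import Dict, Optional, Tuple

AVRO_MAGIC = b"Obj\x01"  # "Obj" followed by format version 1
SYNC_SIZE = 16           # length of the per-file sync marker


def read_long(stream: io.BufferedIOBase, allow_eof: bool = False) -> Optional[int]:
    """Decode one Avro long (zig-zag encoded, base-128 varint).

    Returns None only if allow_eof is set and the stream is already exhausted.
    """
    shift = 0
    result = 0
    first = True
    while True:
        byte = stream.read(1)
        if not byte:
            if first and allow_eof:
                return None
            raise EOFError("unexpected end of stream while reading a long")
        first = False
        b = byte[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)  # undo zig-zag encoding


def read_bytes(stream: io.BufferedIOBase) -> bytes:
    """Read a length-prefixed Avro bytes/string value."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated bytes value")
    return data


def read_header(stream: io.BufferedIOBase) -> Tuple[Dict[str, bytes], bytes]:
    """Return (metadata map, sync marker) from an Avro container header."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    meta: Dict[str, bytes] = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a block byte size follows; skip it
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    sync = stream.read(SYNC_SIZE)
    return meta, sync


def count_records(stream: io.BufferedIOBase) -> int:
    """Sum the per-block record counts without decompressing block data."""
    _, sync = read_header(stream)
    total = 0
    while True:
        n = read_long(stream, allow_eof=True)
        if n is None:  # clean end of file
            return total
        size = read_long(stream)
        if len(stream.read(size)) != size:  # skip (possibly compressed) data
            raise EOFError("truncated data block")
        if stream.read(SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch")
        total += n


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        meta, _ = read_header(f)
        print("codec:", meta.get("avro.codec", b"null").decode("utf-8"))
    with open(sys.argv[1], "rb") as f:
        print("records:", count_records(f))
```

Run it against one of the files under /tmp/testavro/ (the file name is whatever Hive wrote there). A codec of "snappy" plus a record total matching the source table would suggest the file itself is fine and the NULLs come from the Hive query path.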

Could this be related to HIVE-3308 (https://issues.apache.org/jira/browse/HIVE-3308)?

Any ideas?

Thanks.

Matt