Tom Snee created HIVE-9312:
------------------------------

             Summary: Literal string "\n" confuses Avro SerDe
                 Key: HIVE-9312
                 URL: https://issues.apache.org/jira/browse/HIVE-9312
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 0.13.0
         Environment: Hortonworks Data Platform 2.1.2.1 on Centos 6.5
            Reporter: Tom Snee

Avro files with string fields that contain a backslash followed by 'n' confuse the Avro SerDe.

Steps to recreate:

1. Put attached schema nested.avsc into HDFS under /user/someone.
2. Convert attached JSON file example.json into Avro with avro-tools, like so:

   java -jar avro-tools-1.7.7.jar fromjson --schema-file nested.avsc example.json > example.avro

3. Put example.avro into HDFS under /user/someone/avro-files.
4. Create a Hive table with this statement:

   CREATE EXTERNAL TABLE avro_table
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
   STORED AS
     INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
     OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
   LOCATION '/user/someone/avro-files/'
   TBLPROPERTIES (
     'avro.schema.url'='hdfs:///user/someone/nested.avsc'
   );

5. Observe that "select * from avro_table;" returns one row, as expected.
6. Observe that "select * from avro_table where mastersubjectnumber='A12B3CDE-FGH4-5I67-89J0-KLMN1OPQ23R4';" returns 13 garbled rows.
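
For anyone trying to build their own input without the attachments, the sketch below may help pin down what "a backslash followed by 'n'" means in a JSON source file, as opposed to a real line feed. It is only an illustration: the field name `note` and the sample values are hypothetical and are not taken from nested.avsc or example.json, and it says nothing about where in the SerDe the problem occurs.

```python
import json

# Hypothetical sample values; the real data is in the attached example.json.
literal = "first\\nsecond"  # backslash followed by 'n' (two characters)
newline = "first\nsecond"   # a single line-feed character


def to_json_record(value):
    """Serialize one record with a hypothetical string field called 'note'."""
    return json.dumps({"note": value})


# In JSON text, the literal backslash is itself escaped (\\n),
# while a real line feed is written as the escape \n.
print(to_json_record(literal))
print(to_json_record(newline))
```

A record of the first kind is the case described in this report; the second kind is an ordinary embedded newline and is shown only for contrast.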