[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123690#comment-15123690 ]
Ilya Kats commented on HIVE-6147:
---------------------------------

I'm trying to create a table in Hive 0.14 that points to an HBase table with one column family ("c") and one column ("b") that contains a schema-less Avro-serialized object:

{code:sql}
CREATE EXTERNAL TABLE customers
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,c:b",
  "c.b.serialization.type"="avro",
  "c.b.avro.schema.url"="hdfs:/....../Customer.avsc")
TBLPROPERTIES (
  "hbase.table.name" = "customers",
  "hbase.struct.autogenerate"="true",
  "hive.serialization.extend.nesting.levels"="true");
{code}

The DDL above creates the table successfully, but queries fail with the following error:

{code}
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
16/01/29 15:36:55 [main]: ERROR CliDriver: Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:82)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:571)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:563)
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
	... 12 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorException: An error occurred retrieving schema from bytes
	at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.retrieveSchemaFromBytes(AvroLazyObjectInspector.java:331)
	at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.deserializeStruct(AvroLazyObjectInspector.java:287)
	at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.getStructFieldData(AvroLazyObjectInspector.java:142)
	at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:109)
	at org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldData(DelegatedStructObjectInspector.java:88)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
	... 17 more
Caused by: java.io.IOException: Not a data file.
	at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
	at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
	at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.retrieveSchemaFromBytes(AvroLazyObjectInspector.java:328)
	... 25 more
{code}

The problem seems to be in the following code in AvroLazyObjectInspector:

{code:java}
...
private Object deserializeStruct(Object struct, String fieldName) {
  ...
  if (readerSchema == null) {
    ...
  } else {
    // a reader schema was provided
    if (schemaRetriever != null) {
      // a schema retriever has been provided as well. Attempt to read the write schema from the
      // retriever
      ws = schemaRetriever.retrieveWriterSchema(data);

      if (ws == null) {
        throw new IllegalStateException(
            "Null writer schema retrieved from schemaRetriever for field [" + fieldName + "]");
      }
    } else {
      // attempt retrieving the schema from the data
      ws = retrieveSchemaFromBytes(data);
    }

    rs = readerSchema;

    try {
      avroWritable.readFields(data, ws, rs);
    } catch (IOException ioe) {
      throw new AvroObjectInspectorException("Error deserializing avro payload", ioe);
    }
  }
  ...
}
...
{code}

It tries to retrieve the writer schema from the data ({{ws = retrieveSchemaFromBytes(data)}}) even when the schema URL (reader schema) has been provided. As the stack trace shows, {{retrieveSchemaFromBytes}} opens the bytes with Avro's {{DataFileStream}}, which expects an object container file carrying the writer schema in its header; schema-less data has no such header, hence "Not a data file." Is there a way to make it work for schema-less Avro data?

> Support avro data stored in HBase columns
> -----------------------------------------
>
>                 Key: HIVE-6147
>                 URL: https://issues.apache.org/jira/browse/HIVE-6147
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Swarnim Kulkarni
>            Assignee: Swarnim Kulkarni
>              Labels: TODOC14
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt,
> HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt,
> HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt
>
>
> Presently, the HBase Hive integration supports querying only primitive data
> types in columns.
> It would be nice to be able to store and query Avro objects
> in HBase columns by making them visible as structs to Hive. This will allow
> Hive to perform ad hoc analysis of HBase data which can be deeply structured.
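One possible direction for the schema-less case in the comment, sketched only as an illustration and not as Hive's actual code: in the reader-schema branch of {{deserializeStruct}}, check whether the cell bytes start with the Avro object container file magic ({{Obj}} followed by the byte 1) before calling {{retrieveSchemaFromBytes}}, and otherwise fall back to the configured reader schema. The minimal, dependency-free Java sketch below models just that decision. {{SchemaSourceSketch}}, {{WriterSchemaSource}}, {{isAvroContainerFile}} and {{writerSchemaSource}} are hypothetical names, and the fallback assumes the HBase cell was written with the same schema as the one at {{c.b.avro.schema.url}}.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Hive code: decides where the writer schema for an Avro
 * payload stored in an HBase cell should come from when a reader schema is configured.
 */
final class SchemaSourceSketch {

    /** Possible sources of the writer schema. */
    enum WriterSchemaSource { SCHEMA_RETRIEVER, EMBEDDED_IN_DATA, READER_SCHEMA }

    // Avro object container files begin with the 4-byte magic 'O', 'b', 'j', 1.
    // DataFileStream rejects input without it with "IOException: Not a data file."
    private static final byte[] CONTAINER_MAGIC = {'O', 'b', 'j', 1};

    private SchemaSourceSketch() {}

    /** Returns true if the payload starts with the Avro object container file magic. */
    static boolean isAvroContainerFile(byte[] data) {
        return data != null
                && data.length >= CONTAINER_MAGIC.length
                && Arrays.equals(Arrays.copyOf(data, CONTAINER_MAGIC.length), CONTAINER_MAGIC);
    }

    /**
     * Chooses the writer-schema source, assuming a reader schema has been provided.
     * A retriever wins if present; a container file carries its own schema in its header;
     * otherwise the payload is treated as schema-less and the reader schema is reused as
     * the writer schema (only valid if the data was written with that same schema).
     */
    static WriterSchemaSource writerSchemaSource(boolean hasSchemaRetriever, byte[] data) {
        if (hasSchemaRetriever) {
            return WriterSchemaSource.SCHEMA_RETRIEVER;
        }
        if (isAvroContainerFile(data)) {
            return WriterSchemaSource.EMBEDDED_IN_DATA;
        }
        return WriterSchemaSource.READER_SCHEMA;
    }

    public static void main(String[] args) {
        byte[] containerPrefix = {'O', 'b', 'j', 1, 0x04}; // first bytes of a container file
        byte[] schemaLessDatum = {0x02, 0x61};             // made-up raw binary-encoded datum
        System.out.println(writerSchemaSource(false, containerPrefix)); // prints EMBEDDED_IN_DATA
        System.out.println(writerSchemaSource(false, schemaLessDatum)); // prints READER_SCHEMA
        System.out.println(writerSchemaSource(true, schemaLessDatum));  // prints SCHEMA_RETRIEVER
    }
}
```

The magic-byte check is only a heuristic: a schema-less payload whose first four bytes happen to be {{Obj}} plus 1 would be misclassified, so an explicit per-column setting (hypothetical, not an existing Hive property) saying "this column has no embedded schema" would be the less ambiguous design.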