Ratandeep Ratti created HIVE-18410:
--------------------------------------

             Summary: [Performance][Avro] Reading flat Avro tables is very expensive in Hive
                 Key: HIVE-18410
                 URL: https://issues.apache.org/jira/browse/HIVE-18410
             Project: Hive
          Issue Type: Improvement
            Reporter: Ratandeep Ratti
            Assignee: Ratandeep Ratti

There is a performance penalty when reading flat Avro tables (tables with no nested fields) in Hive. Reading the same flat dataset in Pig takes half the time.

Profiling shows that a lot of time is spent in {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of that time goes to {{GenericData.get().resolveUnion()}}, which calls {{GenericData.getSchemaName(Object datum)}}, which in turn performs many {{instanceof}} checks.

This path could be simplified, with performance benefits. An approach is described in this patch, which almost halves the runtime.
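
To illustrate the kind of simplification meant here, the following is a minimal, self-contained sketch. It is not the patch itself and does not use the Avro API. The {{Type}} enum and the three methods are hypothetical stand-ins. The assumption is that a union with exactly one null branch and one non-null branch can be resolved with a single null check, because a non-null datum can only belong to the non-null branch. The sketch trusts the datum to conform to that branch and does no validation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types; this sketch does not depend on Avro.
final class NullableUnionFastPath {

    // Simplified model of the schema types a union branch may have.
    enum Type { NULL, INT, LONG, STRING, DOUBLE, BOOLEAN }

    // General resolution: classify the datum with a chain of instanceof
    // checks, then look up the matching branch in the union.
    static int resolveGeneric(List<Type> union, Object datum) {
        Type t;
        if (datum == null) t = Type.NULL;
        else if (datum instanceof Integer) t = Type.INT;
        else if (datum instanceof Long) t = Type.LONG;
        else if (datum instanceof CharSequence) t = Type.STRING;
        else if (datum instanceof Double) t = Type.DOUBLE;
        else if (datum instanceof Boolean) t = Type.BOOLEAN;
        else throw new IllegalArgumentException("Unknown datum: " + datum.getClass());
        int index = union.indexOf(t);
        if (index < 0) throw new IllegalArgumentException("Not in union: " + t);
        return index;
    }

    // True for a two-branch union in which exactly one branch is NULL.
    static boolean isSingleItemNullable(List<Type> union) {
        return union.size() == 2
            && (union.get(0) == Type.NULL) != (union.get(1) == Type.NULL);
    }

    // Fast path: one null check decides the branch. The caller must have
    // verified isSingleItemNullable(union); the datum is not validated.
    static int resolveSingleItemNullable(List<Type> union, Object datum) {
        int nullIndex = union.get(0) == Type.NULL ? 0 : 1;
        return datum == null ? nullIndex : 1 - nullIndex;
    }

    public static void main(String[] args) {
        List<Type> union = Arrays.asList(Type.NULL, Type.STRING);
        for (Object datum : new Object[] { null, "row value" }) {
            int slow = resolveGeneric(union, datum);
            int fast = isSingleItemNullable(union)
                ? resolveSingleItemNullable(union, datum)
                : slow;
            System.out.println(datum + " -> generic=" + slow + ", fast=" + fast);
        }
    }
}
```

In a flat table where most columns are nullable primitives, the check in {{isSingleItemNullable}} could be done once per column rather than once per value, so the per-value cost would reduce to the null comparison.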