[ https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163631#comment-16163631 ]
Hive QA commented on HIVE-17394:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886650/HIVE-17394.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11036 tests executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2] (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6788/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6788/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6788/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12886650 - PreCommit-HIVE-Build

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-17394
>                 URL: https://issues.apache.org/jira/browse/HIVE-17394
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 1.1.0, 3.0.0
>            Reporter: Ratandeep Ratti
>            Assignee: Anthony Hsu
>         Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}}
> objects for every nullable field in a row:
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema,
>     Schema recordSchema) throws AvroSerdeException {
>   // elided
>   // line 312:
>   return worker(datum, fileSchema, newRecordSchema,
>       SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ...
> private Object deserializeSingleItemNullableUnion(Object datum, Schema fileSchema,
>     Schema recordSchema) throws AvroSerdeException {
>   // elided
>   // line 357:
>   return worker(datum, currentFileSchema, schema,
>       SchemaToTypeInfo.generateTypeInfo(schema, null));
> }
> {code}
> This is really bad for performance. I'm not sure why we didn't use the
> TypeInfo we already have instead of generating it again for each nullable
> field. If you look at the {{worker}} method, which calls
> {{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable
> field's column is already determined.
> Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the
> case of nullable Avro records, because checking whether an Avro record schema
> object already exists in the cache requires traversing all the fields in the
> record schema.
> I've attached a profiling snapshot which shows that most of the time is
> being spent in the cache.
> IMO, one way of fixing this might be to make use of the column TypeInfo that
> is already passed to the {{worker}} method.
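The reuse idea from the last paragraph of the description can be sketched in plain, self-contained Java. This is not the code in HIVE-17394.1.patch and not Hive's API: `TypeInfo`, `PrimitiveTypeInfo` and `UnionTypeInfo` below are hypothetical, stripped-down stand-ins for Hive's classes, and `NullableUnionTypes.branchTypeInfo` is a hypothetical helper. The sketch assumes that a `["null", T]` column is typed as `T` itself, and that a union with several non-null branches is typed as a union whose members are those branches in schema order with `null` removed; both assumptions would need to be checked against {{SchemaToTypeInfo}}.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for Hive's TypeInfo hierarchy (the real classes live in
// org.apache.hadoop.hive.serde2.typeinfo and carry much more state).
interface TypeInfo {}

final class PrimitiveTypeInfo implements TypeInfo {
  private final String typeName;

  PrimitiveTypeInfo(String typeName) { this.typeName = typeName; }

  @Override public String toString() { return typeName; }
}

final class UnionTypeInfo implements TypeInfo {
  private final List<TypeInfo> members;

  UnionTypeInfo(List<TypeInfo> members) {
    this.members = Collections.unmodifiableList(members);
  }

  List<TypeInfo> getMembers() { return members; }
}

final class NullableUnionTypes {
  private NullableUnionTypes() {}

  /**
   * Hypothetical helper: returns the TypeInfo for the non-null branch a datum
   * resolved to, derived from the column TypeInfo that worker() already holds,
   * so nothing is regenerated or looked up in a schema-keyed cache per row.
   *
   * Assumption: ["null", T] columns are typed as T; multi-branch unions are
   * typed as a UnionTypeInfo listing the non-null branches in schema order.
   *
   * @param columnType    TypeInfo already resolved for the column
   * @param nonNullBranch index of the selected branch, counting only non-null branches
   */
  static TypeInfo branchTypeInfo(TypeInfo columnType, int nonNullBranch) {
    if (columnType instanceof UnionTypeInfo) {
      // Several non-null branches: pick the member matching the resolved branch.
      return ((UnionTypeInfo) columnType).getMembers().get(nonNullBranch);
    }
    if (nonNullBranch != 0) {
      throw new IllegalArgumentException("column type " + columnType
          + " has a single non-null branch, got index " + nonNullBranch);
    }
    // Single non-null branch: the column type already describes it, so reuse it.
    return columnType;
  }
}
```

Because the helper only reads a TypeInfo that was resolved once per column, the per-row cost is an `instanceof` check and at most a list lookup, with no schema traversal. The trade-off is that it depends on the branch-ordering assumption above staying in sync with how the column TypeInfo is generated.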
-- This message was sent by Atlassian JIRA (v6.4.14#64029)