[ 
https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163631#comment-16163631
 ] 

Hive QA commented on HIVE-17394:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12886650/HIVE-17394.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11036 tests executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[drop_table_failure2] (batchId=89)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6788/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6788/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6788/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12886650 - PreCommit-HIVE-Build

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-17394
>                 URL: https://issues.apache.org/jira/browse/HIVE-17394
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 1.1.0, 3.0.0
>            Reporter: Ratandeep Ratti
>            Assignee: Anthony Hsu
>         Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png, 
> HIVE-17394.1.patch
>
>
> The following methods in {{AvroDeserializer}} keep regenerating {{TypeInfo}}
> objects for every nullable field in a row:
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, 
> Schema recordSchema) throws AvroSerdeException {
> // elided
> line 312:  return worker(datum, fileSchema, newRecordSchema,
>             SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema fileSchema,
> Schema recordSchema)
> // elided
> line 357: return worker(datum, currentFileSchema, schema,
>       SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad for performance. I'm not sure why we don't use the
> TypeInfo we already have instead of generating it again for each nullable
> field. If you look at the {{worker}} method, which calls
> {{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable
> field's column is already determined.
> Moreover, the cache in the {{SchemaToTypeInfo}} class does not help in the
> nullable Avro record case, because checking whether an Avro record schema
> object already exists in the cache requires traversing all the fields in the
> record schema.
> I've attached a profiling snapshot, which shows that most of the time is
> being spent in the cache.
> One way of fixing this, IMO, might be to make use of the column TypeInfo that
> is already passed to the {{worker}} method.
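
The cost difference described in the issue can be illustrated with a small, self-contained sketch. It does not use the real Avro or Hive classes: `Schema`, `TypeInfo`, `generateTypeInfo`, `perRowRegenerate`, and `perRowReuse` below are hypothetical stand-ins, and the tree walk is only a simplified model of the per-row work (in the real code, per the description, the time goes into the cache lookup on the record schema). It is not the attached patch, just a way to see why reusing the column TypeInfo removes per-row work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Avro's Schema and Hive's TypeInfo; NOT the real classes.
final class TypeInfoReuseSketch {

    /** Simplified schema node: a name plus child schemas (record fields or union branches). */
    static final class Schema {
        final String name;
        final List<Schema> children = new ArrayList<>();

        Schema(String name, Schema... kids) {
            this.name = name;
            for (Schema k : kids) {
                children.add(k);
            }
        }
    }

    /** Simplified type descriptor derived from a schema. */
    static final class TypeInfo {
        final String typeName;

        TypeInfo(String typeName) {
            this.typeName = typeName;
        }
    }

    /** Counts schema nodes visited, to make the amount of work visible. */
    static long nodesVisited = 0;

    /** Stand-in for a generateTypeInfo-like call: walks the whole schema tree. */
    static TypeInfo generateTypeInfo(Schema schema) {
        nodesVisited++;
        StringBuilder sb = new StringBuilder(schema.name);
        if (!schema.children.isEmpty()) {
            sb.append('<');
            for (int i = 0; i < schema.children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(generateTypeInfo(schema.children.get(i)).typeName);
            }
            sb.append('>');
        }
        return new TypeInfo(sb.toString());
    }

    /** Per-row path that derives the TypeInfo from the schema again. */
    static TypeInfo perRowRegenerate(Schema nonNullBranch) {
        return generateTypeInfo(nonNullBranch);
    }

    /** Per-row path that reuses the column TypeInfo the caller already holds. */
    static TypeInfo perRowReuse(TypeInfo columnType) {
        return columnType;
    }

    public static void main(String[] args) {
        Schema record = new Schema("struct",
                new Schema("int"), new Schema("string"), new Schema("double"));

        // Computed once, when the column types are set up.
        TypeInfo columnType = generateTypeInfo(record);
        long setupCost = nodesVisited;

        int rows = 1000;
        for (int i = 0; i < rows; i++) {
            perRowRegenerate(record);
        }
        long regenerateCost = nodesVisited - setupCost;

        long before = nodesVisited;
        for (int i = 0; i < rows; i++) {
            perRowReuse(columnType);
        }
        long reuseCost = nodesVisited - before;

        System.out.println("nodes visited, regenerate per row: " + regenerateCost);
        System.out.println("nodes visited, reuse column type:  " + reuseCost);
    }
}
```

In this model the regenerating path does work proportional to rows times schema size, while the reusing path does none after setup, which is the shape of the improvement the description proposes.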



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
