[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292936#comment-14292936 ]
Hive QA commented on HIVE-9333:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694678/HIVE-9333.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7394 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2527/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2527/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2527/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12694678 - PreCommit-HIVE-TRUNK-Build

> Move parquet serialize implementation to DataWritableWriter to improve write
> speeds
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-9333
>                 URL: https://issues.apache.org/jira/browse/HIVE-9333
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-9333.2.patch
>
>
> The serialize process in ParquetHiveSerDe converts a Hive object
> to a Writable object by looping through all the Hive object's children
> and creating a new Writable object per child. These final Writable
> objects are passed to the Parquet writing function and parsed again
> in the DataWritableWriter class by looping through the ArrayWritable
> object.
> These two loops (ParquetHiveSerDe.serialize() and
> DataWritableWriter.write()) may be reduced to a single loop inside the
> DataWritableWriter.write() method in order to speed up the Hive Parquet
> writing process.
> To achieve this, we can wrap the Hive object and its object inspector
> in the ParquetHiveSerDe.serialize() method inside an object that
> implements the Writable interface, thus avoiding the loop that
> serialize() performs and leaving the parsing loop to the
> DataWritableWriter.write() method. We can see how ORC does this with
> the OrcSerde.OrcSerdeRow class.
> Writable objects are organized differently in each storage format, so
> I don't think it is necessary to create and keep the Writable objects
> in the serialize() method, as they won't be used until the writing
> process starts (DataWritableWriter.write()).
> This performance issue was found using microbenchmark tests from HIVE-8121.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
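The wrapper approach described in the quoted description can be sketched as follows. This is a minimal, self-contained illustration, not the actual HIVE-9333 patch: the class name ParquetRecordWrapper is hypothetical, and simplified stand-ins for Hadoop's Writable and Hive's ObjectInspector are declared inline so the example compiles on its own. The idea mirrors OrcSerde.OrcSerdeRow: instead of eagerly looping over the row to build an ArrayWritable in serialize(), the raw row and its inspector are carried through to the record writer, so only one traversal happens, in DataWritableWriter.write().

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Simplified stand-in for org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Simplified stand-in for Hive's ObjectInspector.
interface ObjectInspector {
    String getTypeName();
}

// Hypothetical wrapper, analogous to OrcSerde.OrcSerdeRow: serialize()
// would return this instead of an ArrayWritable, deferring the single
// field-by-field traversal to the record writer.
class ParquetRecordWrapper implements Writable {
    private final Object row;
    private final ObjectInspector inspector;

    ParquetRecordWrapper(Object row, ObjectInspector inspector) {
        this.row = row;
        this.inspector = inspector;
    }

    // The record writer unwraps the raw row and inspector and walks the
    // object tree exactly once.
    Object getRow() { return row; }
    ObjectInspector getInspector() { return inspector; }

    // Serialization is deferred to the record writer, so the Writable
    // methods are never invoked on the write path.
    @Override
    public void write(DataOutput out) {
        throw new UnsupportedOperationException("handled by the record writer");
    }

    @Override
    public void readFields(DataInput in) {
        throw new UnsupportedOperationException("write-only wrapper");
    }
}
```

The point of the design is that the wrapper is just a carrier: no child Writables are allocated in serialize(), which removes one full loop (and its per-field object creation) from the write path.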