[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292502#comment-14292502 ]
Hive QA commented on HIVE-9333:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694605/HIVE-9333.1.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 7373 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_decimal1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_types
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_parquet
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_parquet
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2522/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2522/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2522/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12694605 - PreCommit-HIVE-TRUNK-Build

> Move parquet serialize implementation to DataWritableWriter to improve write
> speeds
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-9333
>                 URL: https://issues.apache.org/jira/browse/HIVE-9333
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-9333.1.patch
>
>
> The serialize process in ParquetHiveSerDe parses a Hive object into a
> Writable object by looping through all the Hive object's children and
> creating new Writable objects per child. These final Writable objects are
> passed to the Parquet writing function and parsed again in the
> DataWritableWriter class by looping through the ArrayWritable object. These
> two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write()) may
> be reduced to a single loop in DataWritableWriter.write() in order to speed
> up the Hive Parquet writing process.
> In order to achieve this, we can wrap the Hive object and its object
> inspector in ParquetHiveSerDe.serialize() into an object that implements the
> Writable interface, thus avoiding the loop that serialize() currently does,
> and leave the parsing loop to DataWritableWriter.write(). We can see how ORC
> does this with the OrcSerde.OrcSerdeRow class.
> Writable objects are organized differently in each storage format, so I
> don't think it is necessary to create and keep the Writable objects in the
> serialize() method, as they won't be used until the writing process starts
> (DataWritableWriter.write()).
> We might save around 200% of extra time by making this change.
> This performance issue was found using microbenchmark tests from HIVE-8121.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
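The wrapping idea described above can be sketched in plain Java. This is a minimal, self-contained illustration of the technique, not the actual patch: `HiveRecordWrapper`, `serializeByWrapping`, and `writeRecord` are hypothetical names, and Hive's ObjectInspector and Hadoop's Writable/ArrayWritable are mimicked with simplified stand-in types.

```java
import java.util.Arrays;
import java.util.List;

public class WrapVsCopySketch {

    /** Stand-in for Hadoop's Writable marker (simplified: no write/readFields). */
    interface Writable {}

    /**
     * The proposed wrapper, analogous in spirit to OrcSerde.OrcSerdeRow:
     * it holds the original Hive object plus its inspector by reference,
     * so serialize() does no per-child copying at all.
     */
    static class HiveRecordWrapper implements Writable {
        final List<Object> row;       // the untouched Hive object
        final String inspectorInfo;   // stand-in for the ObjectInspector
        HiveRecordWrapper(List<Object> row, String inspectorInfo) {
            this.row = row;
            this.inspectorInfo = inspectorInfo;
        }
    }

    /**
     * serialize() in the proposed scheme: O(1) wrapping instead of the old
     * loop that built a new Writable per child (the first of the two loops).
     */
    static Writable serializeByWrapping(List<Object> hiveRow) {
        return new HiveRecordWrapper(hiveRow, "struct<...>");
    }

    /**
     * The single remaining traversal lives in the writer, analogous to
     * DataWritableWriter.write(): walk the children once and emit them.
     */
    static int writeRecord(HiveRecordWrapper record) {
        int fieldsWritten = 0;
        for (Object child : record.row) {   // the one and only loop
            fieldsWritten++;                // real code would emit Parquet values here
        }
        return fieldsWritten;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.asList("alice", 42, 3.14);
        HiveRecordWrapper wrapped = (HiveRecordWrapper) serializeByWrapping(row);
        // The wrapper holds the original row by reference: nothing was copied.
        System.out.println("same row object: " + (wrapped.row == row));
        System.out.println("fields written: " + writeRecord(wrapped));
    }
}
```

Under these assumptions, the cost of serialize() drops from one pass over every child to a constant-time wrap, and the only remaining pass over the record happens inside the writer.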