[ https://issues.apache.org/jira/browse/HIVE-20330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16693256#comment-16693256 ]
Hive QA commented on HIVE-20330:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12948838/HIVE-20330.0.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15551 tests executed

*Failed tests:*
{noformat}
TestMiniDruidCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=197)
  [druidmini_masking.q,druidmini_joins.q,druid_timestamptz.q]
org.apache.hive.jdbc.TestActivePassiveHA.testActivePassiveHA (batchId=259)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15009/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15009/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15009/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12948838 - PreCommit-HIVE-Build

> HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-20330
>                 URL: https://issues.apache.org/jira/browse/HIVE-20330
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>            Priority: Major
>         Attachments: HIVE-20330.0.patch
>
>
> While running performance tests on Pig (0.12 and 0.17) we observed a huge performance drop in a workload that has multiple inputs from HCatLoader.
> The reason is that, for an MR job with multiple Hive tables as input, Pig calls {{setLocation}} on each {{LoaderFunc (HCatLoader)}} instance, but only one table's information ({{InputJobInfo}} instance) gets tracked in the JobConf, under the config key {{HCatConstants.HCAT_KEY_JOB_INFO}}.
> Each such call overwrites the preexisting value, so only the last table's information is considered when Pig calls {{getStatistics}} to estimate the required reducer count.
> In a case with 2 input tables, 256GB and 1MB in size respectively, Pig will query the size information from HCat for both of them, but it will see either 1MB+1MB=2MB or 256GB+256GB=0.5TB, depending on the input order in the execution plan's DAG.
> It should of course see 256.00097GB in total and accordingly use 257 reducers by default.
> In unlucky cases it will see 2MB, and 1 reducer will have to struggle with the actual 256.00097GB...

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
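The overwrite described in the quoted report can be shown with a minimal, self-contained Java sketch. This is not HCatLoader's actual code: the key string, the {{serialize}} helper and the table names below are stand-ins, while the real code stores a serialized {{InputJobInfo}} under {{HCatConstants.HCAT_KEY_JOB_INFO}}.

{code:java}
import org.apache.hadoop.conf.Configuration;

/**
 * Illustration only: a single JobConf key can hold just one table's
 * InputJobInfo, so each setLocation() call wipes the previous one.
 * Key name, serialize() helper and table names are hypothetical.
 */
public class SingleJobInfoKeyDemo {

  // Stand-in for HCatConstants.HCAT_KEY_JOB_INFO; the exact key name does not matter here.
  private static final String HCAT_KEY_JOB_INFO = "hcat.job.info";

  // Stand-in for serializing an InputJobInfo into a configuration value.
  private static String serialize(String tableAndSize) {
    return tableAndSize;
  }

  public static void main(String[] args) {
    Configuration jobConf = new Configuration(false);

    // setLocation() for the first input table stores its job info...
    jobConf.set(HCAT_KEY_JOB_INFO, serialize("db.big_table, 256GB"));

    // ...and setLocation() for the second input table overwrites it.
    jobConf.set(HCAT_KEY_JOB_INFO, serialize("db.small_table, 1MB"));

    // When getStatistics() later reads the key, every loader sees the same
    // (last-written) table, so the total input size estimate is wrong.
    System.out.println(jobConf.get(HCAT_KEY_JOB_INFO)); // prints: db.small_table, 1MB
  }
}
{code}

The sketch only demonstrates why one shared key cannot describe a multi-input job; presumably the attached patch has to keep a separate InputJobInfo per input, as the issue title suggests.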