[ https://issues.apache.org/jira/browse/HIVE-20220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552464#comment-16552464 ]
Hive QA commented on HIVE-20220:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12932648/HIVE-20220.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 14681 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_7] (batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby10] (batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby11] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby1] (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby1_map_skew] (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby3] (batchId=6)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby4] (batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby5] (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby6] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby6_map_skew] (batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby7_map_skew] (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby8] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_cube1] (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_rollup1] (batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_skew_1_23] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_distinct] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nullgroup2] (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[nullgroup] (batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union33] (batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby4] (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_groupby6] (batchId=92)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=165)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[groupby1] (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[groupby2] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[groupby3] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[groupby_resolution] (batchId=166)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby4] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby6] (batchId=179)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_cube1] (batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_id2] (batchId=173)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_rollup1] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=187)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby1] (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby1_map_skew] (batchId=137)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby4] (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby5] (batchId=126)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby6] (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby6_map_skew] (batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby7_map_skew] (batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_cube1] (batchId=109)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_resolution] (batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_rollup1] (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_sort_skew_1_23] (batchId=112)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_distinct] (batchId=134)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[nullgroup2] (batchId=131)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[nullgroup] (batchId=141)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[union33] (batchId=120)
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 (batchId=241)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/12788/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/12788/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-12788/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 48 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12932648 - PreCommit-HIVE-Build

> Incorrect result when hive.groupby.skewindata is enabled
> --------------------------------------------------------
>
>                 Key: HIVE-20220
>                 URL: https://issues.apache.org/jira/browse/HIVE-20220
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>   Affects Versions: 3.0.0
>            Reporter: Ganesha Shreedhara
>            Assignee: Ganesha Shreedhara
>            Priority: Major
>         Attachments: HIVE-20220.patch
>
>
> hive.groupby.skewindata makes use of rand UDF to randomly distribute grouped
> by keys to the reducers and hence avoids overloading a single reducer when
> there is a skew in data.
> This random distribution of keys is buggy when the reducer fails to fetch the
> mapper output due to a faulty datanode or any other reason. When a reducer
> finds that it can't fetch the mapper output, it sends a signal to the
> Application Master to reattempt the corresponding map task. The reattempted
> map task will now get a different random value from the rand function, and
> hence the keys that get distributed to the reducer will not be the same as
> in the previous run.
>
> *Steps to reproduce:*
> create table test(id int);
> insert into test values
> (1),(2),(2),(3),(3),(3),(4),(4),(4),(4),(5),(5),(5),(5),(5),(6),(6),(6),(6),(6),(6),(7),(7),(7),(7),(7),(7),(7),(7),(8),(8),(8),(8),(8),(8),(8),(8),(9),(9),(9),(9),(9),(9),(9),(9),(9);
> SET hive.groupby.skewindata=true;
> SET mapreduce.reduce.reduces=2;
> //Add a debug port for the reducer
> select count(1) from test group by id;
> //Remove the mapper's intermediate output file once the map stage has
> completed and one of the 2 reduce tasks has completed, then continue the run.
> This causes the 2nd reducer to send an event to the Application Master to
> rerun the map task.
> The following is the expected result.
> 1
> 2
> 3
> 4
> 5
> 6
> 8
> 8
> 9
>
> But you may get a different result, because the rand function returns
> different values in the second run and so distributes the keys differently.
> This needs to be fixed such that the mapper distributes the same keys even if
> it is reattempted multiple times.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
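
The failure mode described in the quoted issue can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Hive code and not the attached patch: it models a single map task that picks a reducer per row with a random number generator, and a retry whose output is read only by the second reducer. The names map_attempt, counts_after_retry and STABLE_SEED are hypothetical, and seeding the generator is just one conceivable way to make the distribution repeatable.

```python
import random
from collections import Counter

NUM_REDUCERS = 2
# Same rows as the `test` table in the report: key -> number of occurrences.
KEY_COUNTS = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 8, 8: 8, 9: 9}
ROWS = [key for key, n in KEY_COUNTS.items() for _ in range(n)]


def map_attempt(rows, seed=None):
    """Pick a reducer for every row in one attempt of the map task.

    seed=None models an unseeded rand(): each attempt draws fresh values.
    A fixed seed models a repeatable distribution (hypothetical remedy).
    """
    rng = random.Random(seed)
    return [rng.randrange(NUM_REDUCERS) for _ in rows]


def counts_after_retry(rows, first_attempt, second_attempt):
    """Reducer 0 read the first attempt's output; reducer 1 reads the retry's."""
    seen = [row for row, r in zip(rows, first_attempt) if r == 0]
    seen += [row for row, r in zip(rows, second_attempt) if r == 1]
    return dict(sorted(Counter(seen).items()))


# Seeded: both attempts route every row identically, so each row is
# counted exactly once even though the reducers read different attempts.
STABLE_SEED = 20220  # hypothetical; would have to be stable across attempts
seeded = counts_after_retry(
    ROWS, map_attempt(ROWS, STABLE_SEED), map_attempt(ROWS, STABLE_SEED)
)
print(seeded == KEY_COUNTS)

# Unseeded: the retry routes rows differently, so some rows can be
# dropped or double-counted; the output varies from run to run.
unseeded = counts_after_retry(ROWS, map_attempt(ROWS), map_attempt(ROWS))
print(unseeded)
```

In the seeded case the per-key counts match the table contents, because a row sent to reducer 0 in the first attempt is sent to reducer 0 again in the retry. In the unseeded case a row may be routed to reducer 0 in the first attempt and reducer 1 in the retry (counted twice), or the reverse (never counted), which is the kind of wrong result the report describes.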