[ https://issues.apache.org/jira/browse/HIVE-24577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
guojh updated HIVE-24577:
-------------------------
    Description: 
When Hive executes jobs in parallel (controlled by the "hive.exec.parallel" parameter), tasks are submitted to YARN in parallel. If several jobs complete simultaneously, their child task may be submitted more than once.

In our production cluster, we have a query with the following stage dependencies:
{code:java}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1, Stage-10, Stage-14
  Stage-7 depends on stages: Stage-2 , consists of Stage-4, Stage-3, Stage-5
  Stage-4
  Stage-0 depends on stages: Stage-4, Stage-3, Stage-6
  Stage-3
  Stage-5
  Stage-6 depends on stages: Stage-5
  Stage-18 is a root stage
  Stage-9 depends on stages: Stage-18
  Stage-10 depends on stages: Stage-9
  Stage-19 is a root stage
  Stage-13 depends on stages: Stage-19
  Stage-14 depends on stages: Stage-13
{code}
There is a certain probability that Stage-10 and Stage-14 complete at the same time, in which case their child Stage-2 is submitted twice, as the log below shows:
{code:java}
2021-01-03T13:35:32,079 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] ql.Driver: Launching Job 6 out of 6
2021-01-03T13:35:32,080 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] ql.Driver: Starting task [Stage-2:MAPRED] in parallel
2021-01-03T13:35:32,082 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] ql.Driver: Launching Job 7 out of 6
2021-01-03T13:35:32,083 INFO [d207a1c7-287d-4f03-83c8-f2c42ed878a9 main] ql.Driver: Starting task [Stage-2:MAPRED] in parallel
{code}

  was: When

> Task resubmission bug
> ---------------------
>
>                 Key: HIVE-24577
>                 URL: https://issues.apache.org/jira/browse/HIVE-24577
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.3.4
>         Environment: hive-2.3.4
>            Reporter: guojh
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
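
For illustration only, the race described in this issue (two parents of Stage-2 finishing at the same moment, each trying to launch it) can be guarded against with an atomic "submit once" check. The sketch below is not Hive's Driver code and not the fix for HIVE-24577; the class `SubmitOnceSketch`, the nested `Scheduler`, and all method names are hypothetical. It assumes a child may be launched only after all of its parents have finished, and it relies on `Set.add` returning true for exactly one caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names exist in Hive.
final class SubmitOnceSketch {

    static final class Scheduler {
        private final Map<String, Set<String>> parentsOf;  // child -> its parents
        private final Set<String> finished = ConcurrentHashMap.newKeySet();
        private final Set<String> submitted = ConcurrentHashMap.newKeySet();
        private final Map<String, AtomicInteger> counts = new ConcurrentHashMap<>();

        Scheduler(Map<String, Set<String>> parentsOf) {
            this.parentsOf = parentsOf;
        }

        // Called when a parent stage completes; may run on several threads at once.
        void onParentFinished(String parent, String child) {
            finished.add(parent);
            // submitted.add(child) is atomic: only the first caller gets true,
            // so the child is launched at most once even if both parents race.
            if (finished.containsAll(parentsOf.get(child)) && submitted.add(child)) {
                counts.computeIfAbsent(child, k -> new AtomicInteger()).incrementAndGet();
            }
        }

        int submitCount(String stage) {
            AtomicInteger c = counts.get(stage);
            return c == null ? 0 : c.get();
        }
    }

    // Simulates Stage-10 and Stage-14 finishing simultaneously.
    static int submissionsOfStage2() {
        Map<String, Set<String>> parentsOf = new HashMap<>();
        parentsOf.put("Stage-2",
                new HashSet<>(Arrays.asList("Stage-1", "Stage-10", "Stage-14")));
        Scheduler scheduler = new Scheduler(parentsOf);
        scheduler.onParentFinished("Stage-1", "Stage-2");  // finished earlier

        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch start = new CountDownLatch(1);
        List<Future<?>> futures = new ArrayList<>();
        for (String parent : Arrays.asList("Stage-10", "Stage-14")) {
            futures.add(pool.submit(() -> {
                start.await();  // release both threads together
                scheduler.onParentFinished(parent, "Stage-2");
                return null;
            }));
        }
        start.countDown();
        try {
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return scheduler.submitCount("Stage-2");
    }

    public static void main(String[] args) {
        System.out.println("Stage-2 submissions: " + submissionsOfStage2());
    }
}
```

Without the `submitted.add(child)` guard, both threads could observe that all parents are finished and each would launch Stage-2, which would match the duplicate "Starting task [Stage-2:MAPRED]" lines in the log above.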