[ https://issues.apache.org/jira/browse/HIVE-26947?focusedWorklogId=840580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-840580 ]
ASF GitHub Bot logged work on HIVE-26947:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Jan/23 10:26
            Start Date: 20/Jan/23 10:26
    Worklog Time Spent: 10m
      Work Description: akshat0395 commented on code in PR #3955:
URL: https://github.com/apache/hive/pull/3955#discussion_r1082334533


##########
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java:
##########
@@ -118,19 +119,23 @@ public void run() {
             singleRun.cancel(true);
             executor.shutdownNow();
             executor = getTimeoutHandlingExecutor();
+            err = true;
           } catch (ExecutionException e) {
             LOG.info("Exception during executing compaction", e);
+            err = true;
           } catch (InterruptedException ie) {
             // do not ignore interruption requests
             return;
+          } catch (Throwable t) {
+            err = true;
           }

           doPostLoopActions(System.currentTimeMillis() - startedAt);

           // If we didn't try to launch a job it either means there was no work to do or we got
-          // here as the result of a communication failure with the DB. Either way we want to wait
+          // here as the result of an error like communication failure with the DB, schema failures etc. Either way we want to wait
           // a bit before, otherwise we can start over the loop immediately.
-          if (!launchedJob && !stop.get()) {
+          if ((!launchedJob || err) && !stop.get()) {
             Thread.sleep(SLEEP_TIME);

Review Comment:
   Added backoff. As @veghlaci05 mentioned, reserved the err flag.

Issue Time Tracking
-------------------

    Worklog Id:     (was: 840580)
    Time Spent: 3h  (was: 2h 50m)

> Hive compactor.Worker can respawn connections to HMS at extremely high frequency
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-26947
>                 URL: https://issues.apache.org/jira/browse/HIVE-26947
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Akshat Mathur
>            Assignee: Akshat Mathur
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> After catching the exception generated by the findNextCompactionAndExecute()
> task, HS2 appears to immediately rerun the task with no delay or backoff. As
> a result there are ~3500 connection attempts from HS2 to HMS over just a
> 5-second period in the HS2 log.
> The compactor.Worker should wait between failed attempts and maybe do an
> exponential backoff.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
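
The issue description suggests that the Worker "maybe do an exponential backoff", while the quoted diff sleeps a fixed SLEEP_TIME after a failed iteration. As a rough illustration of the exponential variant only, here is a minimal sketch. The class WorkerBackoffSketch, its nested Backoff helper, and the 1000 ms / 5000 ms values are all hypothetical; none of them come from Hive or from PR #3955.

```java
/**
 * Hypothetical sketch, not Hive code: a capped exponential backoff that a
 * polling loop could consult after each failed iteration.
 */
final class WorkerBackoffSketch {

  /** Yields the delay to sleep before the next attempt, doubling up to a cap. */
  static final class Backoff {
    private final long baseMillis; // delay after the first failure (assumed value)
    private final long maxMillis;  // upper bound on any single delay (assumed value)
    private long nextDelay;

    Backoff(long baseMillis, long maxMillis) {
      if (baseMillis <= 0 || maxMillis < baseMillis) {
        throw new IllegalArgumentException("need 0 < baseMillis <= maxMillis");
      }
      this.baseMillis = baseMillis;
      this.maxMillis = maxMillis;
      this.nextDelay = baseMillis;
    }

    /** Call after a failed iteration; returns the delay and doubles the next one. */
    long nextDelayMillis() {
      long delay = nextDelay;
      // Compare against maxMillis / 2 so that doubling can never overflow.
      nextDelay = (nextDelay > maxMillis / 2) ? maxMillis : nextDelay * 2;
      return delay;
    }

    /** Call after a successful iteration so the next failure starts from the base delay. */
    void reset() {
      nextDelay = baseMillis;
    }
  }

  public static void main(String[] args) {
    Backoff backoff = new Backoff(1000, 5000);
    // Simulate five consecutive failures; a real loop would pass each value to Thread.sleep.
    for (int failure = 1; failure <= 5; failure++) {
      System.out.println("failure " + failure + " -> wait " + backoff.nextDelayMillis() + " ms");
    }
  }
}
```

In a loop shaped like the one in the diff, the idea would be to call reset() when an iteration completes without error and to sleep for nextDelayMillis() when the err flag is set. The cap keeps a long outage from pushing the delay between attempts up indefinitely.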