[ https://issues.apache.org/jira/browse/HIVE-13458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246381#comment-15246381 ]
Eugene Koifman commented on HIVE-13458:
---------------------------------------

[~wzheng]

1. DbTxnManager.stopHeartbeater():
- it would be useful for LOG.warn("Heartbeat task cannot be cancelled for unknown reason"); to include query id (QueryPlan.getQueryId())
- e.printStackTrace(); - where will this print to?
- Below will add to the logs but I'm not sure where the value add is.
{noformat}
if (heartbeatTask.isCancelled() || heartbeatTask.isDone()) {
  LOG.info("Stopped " + Heartbeater.class.getName());
}
{noformat}
Perhaps lower to DEBUG level or alternatively include QueryID and the "now - startTime". I'd vote for DEBUG.

2. TezJobMonitor
- why wrap the LockException message in IOException?
- could
{noformat}
if (ctx != null && ctx.getHeartbeater() != null &&
    ctx.getHeartbeater().getLockException() != null) {
  throw new IOException("Need to abort execution due to LockException: " +
      ctx.getHeartbeater().getLockException().getMessage());
}
{noformat}
be wrapped in a single method "Context.checkAborted() throws Exception" for example? Same for HadoopJobExecHelper.

> Heartbeater doesn't fail query when heartbeat fails
> ---------------------------------------------------
>
>                 Key: HIVE-13458
>                 URL: https://issues.apache.org/jira/browse/HIVE-13458
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 2.1.0
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>         Attachments: HIVE-13458.1.patch, HIVE-13458.2.patch,
> HIVE-13458.3.patch
>
>
> When a heartbeat fails to locate a lock, it should fail the current query.
> That doesn't happen, which is a bug.
> Another thing is, we need to make sure stopHeartbeat really stops the
> heartbeat, i.e. no additional heartbeat will be sent, since that will break
> the assumption and cause the query to fail.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
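
For reference, the single-method idea from point 2 of the comment could look roughly like the sketch below. This is only an illustration under stated assumptions: the LockException, Heartbeater and Context classes here are simplified, hypothetical stand-ins and not the actual Hive classes, and throwing LockException (with the original as cause) rather than IOException is one possible choice, not what the patch does.

```java
// Hypothetical stand-ins for the classes named in the review comment.
// The real Hive Context, Heartbeater and LockException look different.
class LockException extends Exception {
    LockException(String message) { super(message); }
    LockException(String message, Throwable cause) { super(message, cause); }
}

class Heartbeater {
    // Written by the heartbeat thread, read by the job monitor thread,
    // hence volatile.
    private volatile LockException lockException;

    void setLockException(LockException e) { this.lockException = e; }
    LockException getLockException() { return lockException; }
}

class Context {
    // Assumed to be null when no heartbeater was started for the query.
    private final Heartbeater heartbeater;

    Context(Heartbeater heartbeater) { this.heartbeater = heartbeater; }

    Heartbeater getHeartbeater() { return heartbeater; }

    // Throws if the heartbeater has recorded a failure; returns normally
    // otherwise. Keeps the original exception as the cause instead of
    // copying only its message.
    void checkAborted() throws LockException {
        if (heartbeater == null) {
            return;
        }
        LockException cause = heartbeater.getLockException();
        if (cause != null) {
            throw new LockException(
                "Need to abort execution due to LockException: " + cause.getMessage(),
                cause);
        }
    }
}
```

With something like this, the monitor loops in TezJobMonitor and HadoopJobExecHelper would each reduce to a null check on ctx followed by ctx.checkAborted(), and the abort message would be defined in one place.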