[ https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=795674&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795674 ]
ASF GitHub Bot logged work on HIVE-26414:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jul/22 14:23
            Start Date: 27/Jul/22 14:23
    Worklog Time Spent: 10m
      Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r931126498


##########
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##########
@@ -485,6 +493,27 @@ private void clearLocksAndHB() {
     stopHeartbeat();
   }

+  private void cleanupOutputDir(Context ctx) throws MetaException {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      Table destinationTable = ctx.getDestinationTable();
+      if (destinationTable != null) {
+        try {
+          CompactionRequest rqst = new CompactionRequest(
+              destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+              destinationTable.getTTable(), conf));
+
+          rqst.putToProperties(META_TABLE_LOCATION, destinationTable.getSd().getLocation());
+          rqst.putToProperties(IF_PURGE, Boolean.toString(true));
+          TxnStore txnHandler = TxnUtils.getTxnStore(conf);

Review Comment:
   > btw, would it be hard to create a completionHook similar to Iceberg one?

   We could create one, but it would cover only failures within query execution. Anything done after query execution (post-execution activities) would not be within its scope, which is why I disregarded the hook approach. The hooks are invoked as part of the finally block here -
   https://github.com/apache/hive/blob/b197ed86029f07696e326acb5878f86c286e9e1a/ql/src/java/org/apache/hadoop/hive/ql/Executor.java#L118
   Cleanup would then also depend on a HiveConf setting - `hive.query.lifetime.hooks`.

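   To illustrate the scoping concern above, here is a minimal, self-contained sketch. It does not use Hive's actual classes: `CompletionHook`, `execute` and `runQuery` are hypothetical names chosen only for this example, and the event strings are made up. It only shows that a hook invoked from a `finally` block around execution does not observe a failure raised afterwards.

```java
import java.util.ArrayList;
import java.util.List;

public class HookScopeSketch {

    /** Hypothetical stand-in for a completion hook; not Hive's real interface. */
    interface CompletionHook {
        void afterExecution(boolean failed);
    }

    /** Simulated execution phase: the hook always runs from the finally block. */
    static void execute(List<String> events, CompletionHook hook, boolean fail) {
        boolean failed = true;
        try {
            if (fail) {
                throw new IllegalStateException("execution failure");
            }
            events.add("execution:ok");
            failed = false;
        } finally {
            // The hook's view ends here; later failures are invisible to it.
            hook.afterExecution(failed);
        }
    }

    /** Simulated query: execution first, then post-execution work outside the hook's scope. */
    static List<String> runQuery(boolean failDuringExecution, boolean failAfterExecution) {
        List<String> events = new ArrayList<>();
        CompletionHook hook = failed -> events.add(failed ? "hook:cleanup" : "hook:no-op");
        try {
            execute(events, hook, failDuringExecution);
            if (failAfterExecution) {
                // Assumed to model a failure in post-execution activities.
                throw new IllegalStateException("post-execution failure");
            }
            events.add("post-execution:ok");
        } catch (RuntimeException e) {
            events.add("caller saw: " + e.getMessage());
        }
        return events;
    }

    public static void main(String[] args) {
        // Failure during execution: the hook gets a chance to clean up.
        System.out.println(runQuery(true, false));
        // Failure after execution: the hook already ran as a no-op, so nothing is cleaned up.
        System.out.println(runQuery(false, true));
    }
}
```

   In the second case the hook has already completed by the time the failure occurs, which is the gap that motivates doing the cleanup from the transaction manager instead.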
Issue Time Tracking
-------------------

    Worklog Id:     (was: 795674)
    Time Spent: 6.5h  (was: 6h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-26414
>                 URL: https://issues.apache.org/jira/browse/HIVE-26414
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sourabh Badhya
>            Assignee: Sourabh Badhya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before the table is
> created, the data remains in the directory and is not currently cleaned up
> by the cleaner or any other mechanism. This is because the cleaner requires
> a table corresponding to what it is cleaning. To handle such a situation, we
> can pass the relevant information directly to the cleaner so that such
> uncommitted data is deleted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)