[ https://issues.apache.org/jira/browse/HIVE-13429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236217#comment-15236217 ]
Lefty Leverenz commented on HIVE-13429:
---------------------------------------

[~daijy], a question about the parameter description:

bq. To hold a lock file in scratchdir to prevent to be removed by cleardanglingscratchdir

That sounds backwards to me, based on your usage note above -- if hive.scratchdir.lock is true, then cleardanglingscratchdir can remove the scratch directory, right? But the description says "to prevent to be removed" ... am I just confused, or should it say "to enable to be removed"?

Wait, now I get it. The lock file makes it possible for cleardanglingscratchdir to find out whether it's appropriate to remove the scratch directory. If it doesn't get an exception indicating an active process, the scratch directory can be removed. (Right so far?)

But what happens if someone runs cleardanglingscratchdir when hive.scratchdir.lock is false, so there's no lock file in the directory? Could the scratch directory be removed despite active processes, or does the absence of a lock file also prevent removal? That's what confuses me about the parameter description.

> Tool to remove dangling scratch dir
> -----------------------------------
>
>                 Key: HIVE-13429
>                 URL: https://issues.apache.org/jira/browse/HIVE-13429
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>              Labels: TODOC1.3, TODOC2.1
>             Fix For: 1.3.0, 2.1.0
>
>         Attachments: HIVE-13429.1.patch, HIVE-13429.2.patch, HIVE-13429.3.patch, HIVE-13429.4.patch, HIVE-13429.5.patch, HIVE-13429.branch-1.patch
>
>
> We have seen cases where users leave the scratch dir behind, eventually eating up HDFS storage. This can happen when a VM restarts and leaves Hive no chance to run its shutdown hook. It applies to both HiveCli and HiveServer2. Here we provide an external tool to clear dead scratch dirs as needed.
> We need a way to identify which scratch dirs are in use. We rely on the HDFS write lock for that. Here is how the HDFS write lock works:
> 1. An HDFS client opens an HDFS file for write and only closes it at the time of shutdown
> 2. A cleanup process can try to open the same HDFS file for write. If the client holding the file is still running, we get an exception. Otherwise, we know the client is dead
> 3. If the HDFS client dies without closing the HDFS file, the NN reclaims the lease after 10 minutes, i.e., the HDFS file held by the dead client becomes writable again after 10 minutes
> So here is how we remove a dangling scratch directory in Hive:
> 1. HiveCli/HiveServer2 opens a well-named lock file in the scratch directory and only closes it when we are about to drop the scratch directory
> 2. A command line tool, cleardanglingscratchdir, checks every scratch directory and tries to open its lock file for write. If it does not get an exception, the owner is dead and we can safely remove the scratch directory
> 3. The 10-minute window means it is possible that a HiveCli/HiveServer2 instance is dead but we still cannot reclaim its scratch directory for another 10 minutes. But this should be tolerable
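Since the steps above describe a small protocol, here is a minimal sketch of both sides against the Hadoop FileSystem API. The class name, method names, and lock file name are illustrative assumptions, not Hive's actual identifiers, and the exception matching shows one plausible way to detect a live lease holder:

{code:java}
// Illustrative sketch only -- the names here are hypothetical, not Hive's actual code.
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException;
import org.apache.hadoop.ipc.RemoteException;

public class ScratchDirLockSketch {
  // Hypothetical name for the lock file inside each scratch directory.
  private static final String LOCK_FILE = "inuse.lck";

  /** Session side (step 1): open the lock file for write and keep the
   *  stream open for the lifetime of HiveCli/HiveServer2. */
  public static FSDataOutputStream acquireLock(FileSystem fs, Path scratchDir)
      throws IOException {
    // The open output stream is the "HDFS write lock": the NameNode grants
    // this client a lease on the file, blocking any other writer.
    return fs.create(new Path(scratchDir, LOCK_FILE));
  }

  /** Cleanup side (step 2): probe the lock file by opening it for append.
   *  Returns true if the directory looked dangling and was removed. */
  public static boolean removeIfDangling(FileSystem fs, Path scratchDir)
      throws IOException {
    try {
      // Fails while the owner's lease is alive -- including the ~10 minute
      // window (step 3) after an owner dies without closing the file.
      fs.append(new Path(scratchDir, LOCK_FILE)).close();
    } catch (RemoteException e) {
      if (AlreadyBeingCreatedException.class.getName().equals(e.getClassName())) {
        return false; // lease still held: owner alive (or died < 10 min ago)
      }
      throw e; // some other failure: don't guess, surface it
    }
    // Append succeeded, so nobody holds the lease; safe to reclaim.
    return fs.delete(scratchDir, true);
  }
}
{code}

This sketch assumes the filesystem supports append (HDFS does by default on Hadoop 2+). Note that append() on a missing lock file would throw FileNotFoundException, which corresponds to Lefty's question above about the hive.scratchdir.lock=false case, where the tool has no lock file to verify liveness against.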