Daniel Dai created HIVE-13429:
---------------------------------

             Summary: Tool to remove dangling scratch dir
                 Key: HIVE-13429
                 URL: https://issues.apache.org/jira/browse/HIVE-13429
             Project: Hive
          Issue Type: Improvement
            Reporter: Daniel Dai
            Assignee: Daniel Dai


We have seen cases where a user's scratch dir is left behind and eventually 
eats up HDFS storage. This can happen when a VM restarts and gives Hive no 
chance to run its shutdown hook. The problem applies to both HiveCli and 
HiveServer2. Here we provide an external tool to clear dead scratch dirs as 
needed.

We need a way to identify which scratch dirs are in use. We will rely on the 
HDFS write lock for that. Here is how the HDFS write lock works:
1. An HDFS client opens an HDFS file for write and only closes it at shutdown
2. The cleanup process tries to open the same HDFS file for write. If the 
client holding the file is still running, the cleanup process gets an 
exception. Otherwise, we know the client is dead
3. If the HDFS client dies without closing the HDFS file, the NN reclaims the 
lease after 10 min, i.e., the HDFS file held by the dead client becomes 
writable again after 10 min
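
To make the three rules above concrete, here is a toy in-memory model of the 
lease behaviour. It is only a sketch and does not call HDFS: the class 
LeaseModel, its methods, and the exception type are hypothetical names, the 
expiry is a constructor parameter (the 10 min figure comes from the 
description above), and time is passed in explicitly so the expiry can be 
exercised without waiting.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the write-lease rules described above. Not the HDFS API. */
class LeaseModel {
    /** Raised when another writer still holds an unexpired lease (hypothetical type). */
    static class AlreadyLockedException extends RuntimeException {
        AlreadyLockedException(String msg) { super(msg); }
    }

    private static final class Lease {
        final String holder;
        long lastRenewedMillis;
        Lease(String holder, long now) {
            this.holder = holder;
            this.lastRenewedMillis = now;
        }
    }

    private final long expiryMillis;
    private final Map<String, Lease> leases = new HashMap<>();

    LeaseModel(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /** Rules 1 and 2: opening for write fails while an unexpired lease exists. */
    void openForWrite(String path, String client, long now) {
        Lease lease = leases.get(path);
        if (lease != null && now - lease.lastRenewedMillis < expiryMillis) {
            throw new AlreadyLockedException(path + " is held by " + lease.holder);
        }
        // No lease, or the old one expired (rule 3): the caller becomes the holder.
        leases.put(path, new Lease(client, now));
    }

    /** A live client keeps renewing; a dead client simply stops calling this. */
    void renew(String path, String client, long now) {
        Lease lease = leases.get(path);
        if (lease != null && lease.holder.equals(client)) {
            lease.lastRenewedMillis = now;
        }
    }

    /** Closing the file releases the lease immediately. */
    void close(String path, String client) {
        Lease lease = leases.get(path);
        if (lease != null && lease.holder.equals(client)) {
            leases.remove(path);
        }
    }
}
```

In this model, a second openForWrite on the same path throws while the first 
holder keeps renewing, and succeeds once the holder has been silent for the 
full expiry period or has closed the file.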

So here is how we remove dangling scratch directories in Hive:
1. HiveCli/HiveServer2 opens a well-named lock file in the scratch directory 
and only closes it when it is about to drop the scratch directory
2. A command line tool, cleardanglingscratchdir, checks every scratch 
directory and tries to open the lock file for write. If it does not get an 
exception, the owner is dead and we can safely remove the scratch directory
3. The 10 min window means a HiveCli/HiveServer2 may already be dead while we 
still cannot reclaim its scratch directory for another 10 min. This should be 
tolerable
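
The scan in step 2 can be sketched as follows. This is an illustration under 
stated assumptions, not the tool's actual code: the class name 
DanglingScratchDirFinder, the lock file name "inuse.lck", and the 
canOpenForWrite predicate are all hypothetical. The predicate stands in for 
the real attempt to open the lock file for write on HDFS and returns true 
when that attempt raises no exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Sketch of the scan performed by a cleanup tool. All names are hypothetical. */
class DanglingScratchDirFinder {
    // Assumed lock file name; the description only says "well-named".
    static final String LOCK_FILE_NAME = "inuse.lck";

    /**
     * Returns the scratch directories whose lock file could be opened for
     * write, which per step 2 means the owning process is dead.
     */
    static List<String> findDangling(List<String> scratchDirs,
                                     Predicate<String> canOpenForWrite) {
        List<String> dangling = new ArrayList<>();
        for (String dir : scratchDirs) {
            String lockFile = dir + "/" + LOCK_FILE_NAME;
            if (canOpenForWrite.test(lockFile)) {
                // No exception from the write attempt: safe to remove.
                dangling.add(dir);
            }
            // Otherwise the owner is alive, or died less than 10 min ago
            // (step 3), so the directory is left alone for now.
        }
        return dangling;
    }
}
```

Keeping the write probe behind a predicate separates the scan logic from the 
file system call, so the decision rule can be checked without a cluster.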



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
