Madhusoodan created HIVE-22690:
----------------------------------

             Summary: When directories are deleted from HDFS while MSCK is running, it fails with FileNotFoundException
                 Key: HIVE-22690
                 URL: https://issues.apache.org/jira/browse/HIVE-22690
             Project: Hive
          Issue Type: Bug
          Components: Hive
    Affects Versions: 2.1.1
            Reporter: Madhusoodan


Assume a table `emp` defined as follows:

{code:sql}
create external table emp (id int, name string)
partitioned by (dept string)
location 'hdfs://namenode.com:8020/hive/data/db/emp';
{code}
Create, say, 1000 partition directories under this location in HDFS.

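For illustration, here is a minimal, hypothetical setup sketch (the class name and partition values are mine, not from the report) that creates the partition directories directly on HDFS, bypassing the metastore so that MSCK has something to discover:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreatePartitionDirs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    for (int i = 0; i < 1000; i++) {
      // one directory per partition value, in Hive's key=value layout;
      // the metastore knows nothing about these until MSCK runs
      fs.mkdirs(new Path("hdfs://namenode.com:8020/hive/data/db/emp/dept=d" + i));
    }
    fs.close();
  }
}
{code}
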
Now, to synchronize the metastore, run the MSCK command and in parallel delete the HDFS partition directories; at some point MSCK fails with a FileNotFoundException. Here is the stack trace:

{code:java}
2019-12-10 23:21:50,027 WARN  hive.ql.exec.DDLTask: [HiveServer2-Background-Pool: Thread-500224]: Failed to run metacheck: 
org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: File hdfs://namenode.com:8020/hive/data/db/emp/dept=CS does not exist.
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:554) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:443) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:334) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:310) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:253) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:118) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1862) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:413) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_121]
        at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_121]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875) [hadoop-common-3.0.0-cdh6.2.1.jar:?]
        at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357) [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_121]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: java.io.FileNotFoundException: File hdfs://namenode.com:8020/hive/data/db/emp/dept=CS does not exist.
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:985) ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:121) ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1045) ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1042) ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) ~[hadoop-common-3.0.0-cdh6.2.1.jar:?]
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1052) ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1853) ~[hadoop-common-3.0.0-cdh6.2.1.jar:?]
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1895) ~[hadoop-common-3.0.0-cdh6.2.1.jar:?]
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfoCallable.processPathDepthInfo(HiveMetaStoreChecker.java:474) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:467) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:448) ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
        ... 4 more
{code}
I analyzed the stack trace and found that the problem is in HiveMetaStoreChecker$PathDepthInfoCallable.processPathDepthInfo [1].


What we are doing here is (a minimal sketch follows this list):
 # Create a queue.
 # Put the table's data directory in the queue.
 # Start a few threads that take directories from the queue, list them, and add newly discovered subdirectories back to the queue.
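
For clarity, here is a minimal sketch of that queue-based walk; it is not the actual HiveMetaStoreChecker code, and the class name, pool size, and Phaser-based termination are illustrative:

{code:java}
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ParallelDirWalk {

  static void walk(final FileSystem fs, Path tableDir) {
    final ExecutorService pool = Executors.newFixedThreadPool(4);
    final Phaser pending = new Phaser(1);   // party 0 is the caller
    submit(pool, pending, fs, tableDir);    // steps 1-2: seed with the table dir
    pending.arriveAndAwaitAdvance();        // wait until every queued dir is listed
    pool.shutdown();
  }

  private static void submit(final ExecutorService pool, final Phaser pending,
                             final FileSystem fs, final Path dir) {
    pending.register();                     // one party per in-flight directory
    pool.submit(new Runnable() {
      public void run() {
        try {
          // The window between submit() and this listStatus() call is where a
          // concurrent delete makes listStatus() throw FileNotFoundException.
          for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDirectory()) {
              submit(pool, pending, fs, status.getPath()); // step 3: enqueue children
            }
          }
        } catch (IOException e) {
          // in the real checker this propagates and fails the whole MSCK run
          throw new RuntimeException(e);
        } finally {
          pending.arriveAndDeregister();
        }
      }
    });
  }
}
{code}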

This process has a race. Say there are 1000 first-level directories and 1000*500 second-level directories; then a measurable amount of time passes between putting a path in the queue and listing the contents of that directory. This window is large enough for an external HDFS delete to remove the directory, and when that happens MSCK fails as shown above. A reproduction sketch follows.
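
To make the window concrete, here is a hypothetical reproduction built on the ParallelDirWalk sketch above; the partition name and sleep duration are illustrative:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaceRepro {
  public static void main(String[] args) throws Exception {
    final FileSystem fs = FileSystem.get(new Configuration());
    final Path tableDir = new Path("hdfs://namenode.com:8020/hive/data/db/emp");

    // Delete one partition directory while the walk is in flight, imitating
    // an external HDFS delete issued during MSCK.
    Thread deleter = new Thread(new Runnable() {
      public void run() {
        try {
          Thread.sleep(50);                               // land inside the window
          fs.delete(new Path(tableDir, "dept=CS"), true); // recursive delete
        } catch (Exception e) {
          e.printStackTrace();
        }
      }
    });

    deleter.start();
    // a worker's listStatus() on dept=CS can now hit FileNotFoundException
    ParallelDirWalk.walk(fs, tableDir);
    deleter.join();
  }
}
{code}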


What can be the improvement (a sketch of option 1 follows this list):
 # [best according to me] Swallow the exception and perhaps log it at DEBUG level.
 # Check that the directory still exists before listing its contents.
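
A hedged sketch of option 1, assuming the change wraps the listStatus() call in processPathDepthInfo; the helper class and logger are illustrative, not an actual Hive patch:

{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class TolerantLister {
  private static final Logger LOG = LoggerFactory.getLogger(TolerantLister.class);
  private static final FileStatus[] EMPTY = new FileStatus[0];

  static FileStatus[] listIfExists(FileSystem fs, Path dir) throws IOException {
    try {
      return fs.listStatus(dir);
    } catch (FileNotFoundException e) {
      // The directory was deleted after it was queued; it simply no longer
      // holds a partition, so skip it instead of failing the whole MSCK run.
      LOG.debug("Directory {} disappeared during scan, skipping", dir, e);
      return EMPTY;
    }
  }
}
{code}

Option 2 is weaker on its own: there is still a window between the existence check and the listing in which the directory can disappear, so the exception has to be tolerated anyway.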


References:

[1] https://github.com/apache/hive/blob/01faca2f9d7dcb0f5feabfcb07fa5ea12b79c5b9/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java#L474
