Dudu Markovitz created HIVE-16299:
-------------------------------------

             Summary: In case of a partitioned table, MSCK REPAIR TABLE does not do a full validation of FS paths and as a result creates false partitions and directories
                 Key: HIVE-16299
                 URL: https://issues.apache.org/jira/browse/HIVE-16299
             Project: Hive
          Issue Type: Bug
          Components: Metastore
    Affects Versions: storage-2.2.0
            Reporter: Dudu Markovitz
            Priority: Minor

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java

static String getPartitionName(Path tablePath, Path partitionPath, Set<String> partCols)
------------------------------------------------------------------------------------

MSCK REPAIR validates that each sub-directory is in the format col=val and that there is indeed a partition column named "col".
However, it does not validate the location of the partition column within the path. As a result, false partitions are created, along with directories that match those partitions.

e.g. 1

    hive> dfs -mkdir -p /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5;
    hive> create external table t (i int) partitioned by (a int,b int,c int) ;
    OK
    hive> msck repair table t;
    OK
    Partitions not in metastore: t:a=1/a=2/a=3/b=4/c=5
    Repair: Added partition to metastore t:a=1/a=2/a=3/b=4/c=5
    Time taken: 0.563 seconds, Fetched: 2 row(s)
    hive> show partitions t;
    OK
    a=3/b=4/c=5
    hive> dfs -ls -R /user/hive/warehouse/t;
    drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=1
    drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2
    drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3
    drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3/b=4
    drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5
    drwxrwxrwx - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=3
    drwxrwxrwx - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=3/b=4
    drwxrwxrwx - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=3/b=4/c=5

e.g. 2

    hive> dfs -mkdir -p /user/hive/warehouse/t/c=3/b=2/a=1;
    hive> create external table t (i int) partitioned by (a int,b int,c int);
    OK
    hive> msck repair table t;
    OK
    Partitions not in metastore: t:c=3/b=2/a=1
    Repair: Added partition to metastore t:c=3/b=2/a=1
    Time taken: 0.512 seconds, Fetched: 2 row(s)
    hive> show partitions t;
    OK
    a=1/b=2/c=3
    hive> dfs -ls -R /user/hive/warehouse/t;
    drwxrwxrwx - cloudera supergroup 0 2017-03-26 13:13 /user/hive/warehouse/t/a=1
    drwxrwxrwx - cloudera supergroup 0 2017-03-26 13:13 /user/hive/warehouse/t/a=1/b=2
    drwxrwxrwx - cloudera supergroup 0 2017-03-26 13:13 /user/hive/warehouse/t/a=1/b=2/c=3
    drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:12 /user/hive/warehouse/t/c=3
    drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:12 /user/hive/warehouse/t/c=3/b=2
    drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:12 /user/hive/warehouse/t/c=3/b=2/a=1
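
For illustration only, here is a minimal sketch of what a stricter path check might look like. It is not Hive's code and not a proposed patch: the class StrictPartitionPathCheck and the method strictPartitionName are hypothetical names, plain strings stand in for org.apache.hadoop.fs.Path, and the partition columns are assumed to arrive as an ordered list rather than the Set<String> used by getPartitionName. Matching column names case-insensitively is also an assumption. The idea is that the directories below the table path must name every partition column exactly once and in declared order; otherwise the path is rejected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of Hive: sketches a position-aware path check.
final class StrictPartitionPathCheck {

    /**
     * Returns the partition name (e.g. "a=1/b=2/c=3") when the directories
     * below tablePath are exactly col=val for each partition column, in the
     * declared order. Returns null for any other layout.
     */
    static String strictPartitionName(String tablePath, String partitionPath,
                                      List<String> partCols) {
        String table = stripTrailingSlashes(tablePath);
        String part = stripTrailingSlashes(partitionPath);
        if (!part.startsWith(table + "/")) {
            return null; // partition path is not below the table directory
        }
        String[] components = part.substring(table.length() + 1).split("/");
        if (components.length != partCols.size()) {
            return null; // too shallow or too deep, e.g. a=1/a=2/a=3/b=4/c=5
        }
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < components.length; i++) {
            String component = components[i];
            int eq = component.indexOf('=');
            if (eq <= 0 || eq == component.length() - 1) {
                return null; // not in the format col=val
            }
            String col = component.substring(0, eq);
            // Assumption: column names are compared case-insensitively.
            if (!col.equalsIgnoreCase(partCols.get(i))) {
                return null; // wrong column at this depth, e.g. c=3/b=2/a=1
            }
            parts.add(partCols.get(i) + "=" + component.substring(eq + 1));
        }
        return String.join("/", parts);
    }

    // Removes trailing '/' characters so "/t/" and "/t" compare equal.
    private static String stripTrailingSlashes(String path) {
        String p = path;
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        return p;
    }
}
```

Under these assumptions, the paths from both examples above (a=1/a=2/a=3/b=4/c=5 and c=3/b=2/a=1) would be rejected, while /user/hive/warehouse/t/a=1/b=2/c=3 would be accepted. Whether out-of-order paths should be rejected outright or merely reported is a design decision this sketch does not settle.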