Manpreet Singh created HIVE-19991:
-------------------------------------

             Summary: msck repair table command not able to retrieve achieved 
data.
                 Key: HIVE-19991
                 URL: https://issues.apache.org/jira/browse/HIVE-19991
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.1.0
            Reporter: Manpreet Singh


 Observed an issue when customer used msck repair tablename command on a 
archeived data (data copied via hadoop fs -cp from another location) by the 
command did not loaded the partitions in the table and showed zero results.

 

Please find below complete steps for both archived and unarchived.

 Copied table's partition data to another location & define a similar table 
structure for new location an ran "msck repair table" command on the 
destination and then ran a select * from <destination> table it produced zero 
results, same thing is working for non achieved data.

 

1. Created two table with similar structure and different loctation.

create table sau_test1 ( a int, b string) partitioned by (dt string) stored as 
parquet location '/user/hive/warehouse/sau_test1';

create table sau_arch ( a int, b string) partitioned by (dt string) stored as 
parquet location'/user/hive/warehouse/sau_arch';

2. Inserted data in source table

insert into sau_test1 partition(dt='dt1') select 1,'A1';

select * from sau_test1;

+---------------+-------------++-----------------+
|sau_test1.a|sau_test1.b|sau_test1.dt|

+---------------+-------------++-----------------+
|1|A1|dt1|

+---------------+-------------++-----------------+

3. Copied the content of directory source directory to destination directory.

hadoop fs -cp /user/hive/warehouse/sau_test1/* /user/hive/warehouse/sau_arch/

4. Running msck repair table <destination table> and checking results. — for 
unachieved data

msck repair table sau_arch ;
select * from sau_arch;

select * from sau_arch;
+--------------+------------++----------------+
|sau_arch.a|sau_arch.b|sau_arch.dt|

+--------------+------------++----------------+
|1|A1|dt1|

+--------------+------------++----------------+

5. Customer wants the same functionality for archived data and hence tried 
below steps.

a) Dropped table partition in destination table "alter table sau_arch drop 
partition(dt='dt1');"

b) set hive.archive.enabled=true;
alter table sau_test1 archive partition ( dt='dt1');

c) copied the hdfs files from source table to destination tables.

hdfs dfs -ls /user/hive/warehouse/sau_test1/dt=dt1/

drwxr-xr-x - hive supergroup 0 2018-06-08 13:26 
/user/hive/warehouse/sau_test1/dt=dt1/data.har
-rw-r--r-- 3 hive supergroup 0 2018-06-08 13:26 
/user/hive/warehouse/sau_test1/dt=dt1/data.har/_SUCCESS
-rw-r--r-- 3 hive supergroup 305 2018-06-08 13:26 
/user/hive/warehouse/sau_test1/dt=dt1/data.har/_index
-rw-r--r-- 3 hive supergroup 23 2018-06-08 13:26 
/user/hive/warehouse/sau_test1/dt=dt1/data.har/_masterindex
-rw-r--r-- 3 hive supergroup 286 2018-06-08 13:26 
/user/hive/warehouse/sau_test1/dt=dt1/data.har/part-0
$ hdfs dfs -ls /user/hive/warehouse/sau_arch/dt=dt1/

drwxr-xr-x - ngdb supergroup 0 2018-06-08 13:27 
/user/hive/warehouse/sau_arch/dt=dt1/data.har
-rw-r--r-- 3 ngdb supergroup 0 2018-06-08 13:27 
/user/hive/warehouse/sau_arch/dt=dt1/data.har/_SUCCESS
-rw-r--r-- 3 ngdb supergroup 305 2018-06-08 13:27 
/user/hive/warehouse/sau_arch/dt=dt1/data.har/_index
-rw-r--r-- 3 ngdb supergroup 23 2018-06-08 13:27 
/user/hive/warehouse/sau_arch/dt=dt1/data.har/_masterindex
-rw-r--r-- 3 ngdb supergroup 286 2018-06-08 13:27 
/user/hive/warehouse/sau_arch/dt=dt1/data.har/part-0

d) msck repair table sau_arch;
e)select * from sau_arch . - No results shown

+--------------+------------++----------------+
|sau_arch.a|sau_arch.b|sau_arch.dt|

+--------------+------------++----------------+
+--------------+------------++----------------+



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to