Hello,

Following on from my earlier post about syncing Hive data from an on-premise cluster to the cloud, I've been experimenting with the IMPORT/EXPORT functionality to move data from an on-premise HDP cluster to Amazon EMR. I started with some simple EXPORT/IMPORT operations, as these are the core operations on which replication is built. This worked fine between on-premise clusters running HDP-2.2.4.
    -- on cluster 1
    EXPORT TABLE my_table PARTITION (year_month='2015-12') TO '/exports/my_table' FOR REPLICATION ('1');

    -- copy cluster1:/exports/my_table to cluster2:/staging/my_table

    -- on cluster 2
    IMPORT FROM '/staging/my_table' LOCATION '/warehouse/my_table';
    -- table created, partition created, data relocated to /warehouse/my_table/year_month=2015-12

I next tried something similar with HDP-2.2.4 → EMR (4.2.0):

    -- on-premise HDP-2.2.4
    SET hiveconf:hive.exim.uri.scheme.whitelist=hdfs,pfile,s3n;
    EXPORT TABLE my_table PARTITION (year_month='2015-12') TO 's3n://API_KEY:SECRET_KEY@exports-bucket/my_table';

    -- on EMR
    SET hiveconf:hive.exim.uri.scheme.whitelist=hdfs,pfile,s3n;
    IMPORT FROM 's3n://exports-bucket/my_table' LOCATION 's3n://hive-warehouse-bucket/my_table';

The IMPORT behaviour I see is bizarre. It:

1. Creates the folder 's3n://hive-warehouse-bucket/my_table'.
2. Copies the part file from 's3n://exports-bucket/my_table/year_month=2015-12' to 's3n://exports-bucket/my_table' (i.e. to the parent).
3. Fails with:

    ERROR exec.Task: Failed with exception checkPaths: s3n://exports-bucket/my_table has nested directorys3n://exports-bucket/my_table/year_month=2015-12

It is as if it is setting the final partition location to 's3n://exports-bucket/my_table' rather than to 's3n://hive-warehouse-bucket/my_table/year_month=2015-12', which is what happens with HDP → HDP. I've tried variations, such as specifying the partition on import and omitting the location, all with the same result.

Any thoughts or assistance would be appreciated.

Thanks - Elliot.
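For reference, the two variations described above would look roughly like this in HiveQL, using the documented `IMPORT [TABLE name [PARTITION (...)]] FROM 'path' [LOCATION 'path']` form. This is a sketch reconstructed from the description, not the exact statements that were run; the bucket and table names are the ones used earlier in the post.

```sql
-- on EMR: allow s3n as an IMPORT/EXPORT source (same setting as above)
SET hiveconf:hive.exim.uri.scheme.whitelist=hdfs,pfile,s3n;

-- variation 1: name the table and partition explicitly on import
IMPORT TABLE my_table PARTITION (year_month='2015-12')
  FROM 's3n://exports-bucket/my_table'
  LOCATION 's3n://hive-warehouse-bucket/my_table';

-- variation 2: omit LOCATION so Hive picks the default warehouse location
IMPORT FROM 's3n://exports-bucket/my_table';
```

Both forms are expected to place the data under a `year_month=2015-12` subdirectory of the table location, as in the HDP → HDP case.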