ISHAN gaur created SQOOP-3477:
---------------------------------

             Summary: Sqoop-import creates .metadata and .signals folders during import
                 Key: SQOOP-3477
                 URL: https://issues.apache.org/jira/browse/SQOOP-3477
             Project: Sqoop
          Issue Type: Bug
          Components: sqoop2-client
    Affects Versions: 1.4.7
            Reporter: ISHAN gaur
             Fix For: 1.4.7


Unable to read sqoop-import extracted files in Hive.

 

Steps to reproduce -

-1- Import a table from a relational database (Oracle RDS) or any other RDBMS 
into S3 via sqoop import.


Sqoop-import Sample -

sqoop import \
  --connect jdbc:oracle:thin:@//ishantest.xxx.us-west-2.rds.amazonaws.com:1521/ISHANORC \
  --table PV_AGGREGATES --username admin --password xxx# -m 1 \
  --delete-target-dir --target-dir s3n://xx-isgxaur-logs/sqoop_issue/pv_ag \
  --as-parquetfile --compression-codec=snappy

Once sqoop-import finishes, the data in S3 looks like this:

[hadoop@ip-xx-xx-xx-xx ~]$ aws s3 ls s3://xx-xx-xx/sqoop_issue/ishan_new/


PRE .metadata/
PRE .signals/
2020-06-03 17:49:03 0 .metadata_$folder$
2020-06-03 17:49:03 0 .signals_$folder$
2020-06-03 17:49:03 417 f7e5670c-77ca-460b-80c5-e0449cb09dfe.parquet

 

As we can see, sqoop-import creates .metadata and .signals folders while 
extracting the data. This causes the read to fail in Hive.

-2- Deploy EMR v5.19 or any virtual machine with Hive 2.3.3 and Sqoop 1.4.7 
installed.

-3- Create an external table whose location is the target path from step 1.

Command to create external table -

CREATE EXTERNAL TABLE test (dt string, code string, views string)
STORED AS parquet
LOCATION 's3://xxx-isgaur-logs/sqoop_issue/ishan_new';

-4- In Hive, run a "select * from test;" statement against the table. You will 
get the error below.

Error message thrown - Failed with exception 
java.io.IOException:java.lang.RuntimeException: 
s3://cccxxxx-logs/sqoop_issue/ishan_new/.metadata/descriptor.properties is not 
a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 
101, 116, 10]
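For context, the expected magic number [80, 65, 82, 49] is just ASCII for "PAR1", the 4-byte footer every valid Parquet file must end with; the bytes Hive actually found ([117, 101, 116, 10]) are the trailing text of the Kite descriptor.properties file it mistakenly tried to read as Parquet. A quick check of the decoding:

```shell
# Decode the byte values from the Hive error message to ASCII
# (octal escapes: 80=\120, 65=\101, 82=\122, 49=\061).
expected=$(printf '\120\101\122\061')   # -> PAR1, the Parquet footer magic
found=$(printf '\165\145\164\012')      # -> "uet" + newline, plain text
echo "expected tail: $expected"
echo "found tail:    $found"
```
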

The above error occurs because Hive, while reading the data, does not expect 
the .metadata and .signals folders created by Sqoop; hence the failure.
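One possible workaround (my assumption, not an official fix): the .metadata and .signals directories are written by the Kite SDK layer that Sqoop 1.4.7 uses for Parquet imports, and they can simply be deleted from the target path before creating or querying the Hive table (on S3, via "aws s3 rm --recursive" on those two prefixes). A local sketch of the cleanup step, simulating the S3 layout in a temp directory:

```shell
# Simulate the sqoop-import target layout in a local temp dir.
# (On S3 the equivalent cleanup would be:
#   aws s3 rm --recursive s3://<bucket>/<path>/.metadata/
#   aws s3 rm --recursive s3://<bucket>/<path>/.signals/ )
target=$(mktemp -d)
mkdir -p "$target/.metadata" "$target/.signals"
echo "version=1" > "$target/.metadata/descriptor.properties"
touch "$target/f7e5670c-77ca-460b-80c5-e0449cb09dfe.parquet"

# Drop the Kite helper directories so Hive only sees data files
rm -rf "$target/.metadata" "$target/.signals"

ls -A "$target"    # only the .parquet file remains
```

Alternatively, Sqoop's HCatalog integration (--hcatalog-table) writes through Hive's own storage handlers rather than Kite, which may avoid these directories altogether; I have not verified that on EMR 5.19.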

I analyzed this problem extensively but did not find a solution. Is there any 
workaround? How can one read the sqoop-import extracted data in Hive?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
