Markus Kemper created SQOOP-3046:
------------------------------------

             Summary: Add support for (import + --hcatalog* + --as-parquetfile) 
                 Key: SQOOP-3046
                 URL: https://issues.apache.org/jira/browse/SQOOP-3046
             Project: Sqoop
          Issue Type: Improvement
          Components: hive-integration
            Reporter: Markus Kemper


This is a request to identify a way to support Sqoop import with the --hcatalog* options when writing Parquet data files (--as-parquetfile). The test case below demonstrates the issue.

CODE SNIPPET:
{noformat}
../MapredParquetOutputFormat.java       
69  @Override
70  public RecordWriter<Void, ParquetHiveRecord> getRecordWriter(
71      final FileSystem ignored,
72      final JobConf job,
73      final String name,
74      final Progressable progress
75      ) throws IOException {
76    throw new RuntimeException("Should never be used");
77  }
{noformat}

TEST CASE:
{noformat}
STEP 01 : Create MySQL Table

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c_int int, c_date date, c_timestamp timestamp)"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "describe t1"
---------------------------------------------------------------------------------------------------------------
| Field                | Type                 | Null | Key | Default             | Extra                       |
---------------------------------------------------------------------------------------------------------------
| c_int                | int(11)              | YES  |     | (null)              |                             |
| c_date               | date                 | YES  |     | (null)              |                             |
| c_timestamp          | timestamp            | NO   |     | CURRENT_TIMESTAMP   | on update CURRENT_TIMESTAMP |
---------------------------------------------------------------------------------------------------------------

STEP 02 : Insert and Select Row

sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date(), current_timestamp())"
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
-----------------------------------------------------
| c_int       | c_date     | c_timestamp            | 
-----------------------------------------------------
| 1           | 2016-10-26 | 2016-10-26 14:30:33.0  | 
-----------------------------------------------------

STEP 03 : Drop Hive Table and Import with --hcatalog Options (stored as parquet)

beeline -u jdbc:hive2:// -e "use default; drop table t1"
sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hcatalog-database default --hcatalog-table t1 --create-hcatalog-table --hcatalog-storage-stanza 'stored as parquet' --num-mappers 1

[sqoop console debug]
16/11/02 20:25:15 INFO mapreduce.Job: Task Id : attempt_1478089149450_0046_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Should never be used
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:76)
        at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:102)
        at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

[yarn maptask debug]    
2016-11-02 20:25:15,565 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: 1=1 AND 1=1
2016-11-02 20:25:15,583 DEBUG [main] org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat: Creating db record reader for db product: MYSQL
2016-11-02 20:25:15,613 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2016-11-02 20:25:15,614 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2016-11-02 20:25:15,620 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
2016-11-02 20:25:15,633 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Should never be used
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:76)
        at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:102)
        at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{noformat}
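
Until the --hcatalog* options support Parquet, one possible workaround (a sketch only, not verified in this environment) is to bypass HCatalog and write Parquet through Sqoop's --hive-import path, where --as-parquetfile is supported. The database name, table name, and connection variables below mirror the test case above:

```shell
# Workaround sketch (assumption: Sqoop's Kite-based --hive-import path,
# which supports --as-parquetfile, is acceptable instead of --hcatalog*).
# $MYCONN, $MYUSER, $MYPSWD are the same placeholders used in the test case.
sqoop import --connect $MYCONN --username $MYUSER --password $MYPSWD \
  --table t1 \
  --hive-import --hive-database default --hive-table t1 \
  --as-parquetfile --num-mappers 1
```

Note this path creates the Hive table itself rather than honoring an arbitrary --hcatalog-storage-stanza, so it is not a full substitute for the HCatalog integration requested here.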

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
