> 14/09/12 20:16:55 INFO mapred.JobClient: Task Id :
> attempt_201409022012_0543_m_000000_2, Status : FAILED
> java.lang.RuntimeException: Should never be used

This is because getRecordWriter is explicitly disabled in Hive's Parquet
output format. That is why I mentioned in another thread that while HCatalog
support in Sqoop is storage agnostic and should in theory support all Hive
SerDes, some storage formats may not work.
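
For reference, this is the stub HCatalog ends up calling. I'm paraphrasing
from memory of the Hive 0.13 source here, so treat the exact generics and
signature as approximate:

{code}
// org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat (Hive 0.13),
// paraphrased from memory. The plain mapred RecordWriter path is deliberately
// stubbed out; only getHiveRecordWriter() is supported. Any caller that goes
// through the generic OutputFormat API (as HCatalog does) hits this exception.
@Override
public RecordWriter<Void, ArrayWritable> getRecordWriter(
    final FileSystem ignored, final JobConf job, final String name,
    final Progressable progress) throws IOException {
  throw new RuntimeException("Should never be used");
}
{code}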

That said, to import into Parquet files you can use the --as-parquetfile
option that was recently introduced.
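
A sketch, based on the command from the JIRA comment below (the connection
details are placeholders, and note that --query imports also need an explicit
--target-dir):

{code}
bin/sqoop import \
  --connect jdbc:mysql://mydbserver.net/mydb \
  --username myuser --password mypwd \
  --query "SELECT ... WHERE \$CONDITIONS" \
  --num-mappers 1 \
  --target-dir /user/myuser/abc2 \
  --as-parquetfile
{code}

That writes the result set to HDFS as Parquet files; creating the Hive table
as part of the import is what this JIRA (SQOOP-1393) itself tracks.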

Thanks

Venkat

On Fri, Sep 12, 2014 at 5:29 PM, Pratik Khadloya (JIRA) <j...@apache.org>
wrote:

>
>     [
> https://issues.apache.org/jira/browse/SQOOP-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132376#comment-14132376
> ]
>
> Pratik Khadloya commented on SQOOP-1393:
> ----------------------------------------
>
> I got past the metadata error by running the command using HCatalog.
>
> {code}
> bin/sqoop import -jt myjt:xxxx \
>   --connect jdbc:mysql://mydbserver.net/mydb \
>   --username myuser --password mypwd \
>   --query "SELECT ... WHERE \$CONDITIONS" \
>   --num-mappers 1 \
>   --hcatalog-storage-stanza "STORED AS PARQUET" \
>   --create-hcatalog-table --hcatalog-table abc2
> {code}
>
> But, since I am using Hive 0.13, I get the following error, which states
> that one should not use MapredParquetOutputFormat with Hive 0.13 as it has
> native support for Parquet files.
> {code}
> 14/09/12 20:16:55 INFO mapred.JobClient: Task Id : attempt_201409022012_0543_m_000000_2, Status : FAILED
> java.lang.RuntimeException: Should never be used
>         at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:77)
>         at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:103)
>         at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:548)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:653)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.mapred.Child.main(Child.java:262)
> {code}
>
> Is there any code change planned for supporting Hive 0.13?
>
> > Import data from database to Hive as Parquet files
> > --------------------------------------------------
> >
> >                 Key: SQOOP-1393
> >                 URL: https://issues.apache.org/jira/browse/SQOOP-1393
> >             Project: Sqoop
> >          Issue Type: Sub-task
> >          Components: tools
> >            Reporter: Qian Xu
> >            Assignee: Richard
> >             Fix For: 1.4.6
> >
> >         Attachments: patch.diff, patch_v2.diff, patch_v3.diff
> >
> >
> > Import data to Hive as Parquet files can be separated into two steps:
> > 1. Import an individual table from an RDBMS to HDFS as a set of Parquet files.
> > 2. Import the data into Hive by generating and executing a CREATE TABLE statement to define the data's layout in Hive with the Parquet format.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>



-- 
Regards

Venkat
