[ https://issues.apache.org/jira/browse/SQOOP-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131963#comment-14131963 ]
Pratik Khadloya commented on SQOOP-1393:
----------------------------------------

I get the following error when I try to run it:
{code}
$ bin/sqoop import --connect jdbc:mysql://mydbserver.net/mydb --username myuser --password mypwd --target-dir /user/myuser/sqoop/mydir --query "SELECT... WHERE \$CONDITIONS" --num-mappers 1 --hive-import --hive-table test --create-hive-table --as-parquetfile
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/myuser/sqoop-57336d7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
14/09/12 15:05:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-SNAPSHOT
14/09/12 15:05:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/09/12 15:05:31 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
14/09/12 15:05:31 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
14/09/12 15:05:31 INFO manager.SqlManager: Using default fetchSize of 1000
14/09/12 15:05:31 INFO tool.CodeGenTool: Beginning code generation
14/09/12 15:05:32 INFO manager.SqlManager: Executing SQL statement: SELECT ...
14/09/12 15:05:32 INFO manager.SqlManager: Executing SQL statement: SELECT ...
14/09/12 15:05:32 INFO manager.SqlManager: Executing SQL statement: SELECT ...
14/09/12 15:05:32 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
Note: /tmp/sqoop-myuser/compile/8dfb9e9347276b8e57d840fa6ec2e759/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/09/12 15:05:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-myuser/compile/8dfb9e9347276b8e57d840fa6ec2e759/QueryResult.jar
14/09/12 15:05:34 INFO mapreduce.ImportJobBase: Beginning query import.
14/09/12 15:05:34 INFO manager.SqlManager: Executing SQL statement: SELECT ...
14/09/12 15:05:34 INFO manager.SqlManager: Executing SQL statement: SELECT ...
14/09/12 15:05:35 INFO hcatalog.HCatalogManagedMetadataProvider: Default FS: hdfs://foo-ha:8020
14/09/12 15:05:35 WARN impl.HCatalog: Using a local Hive MetaStore (for testing only)
14/09/12 15:05:35 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
14/09/12 15:05:35 INFO metastore.ObjectStore: ObjectStore, initialize called
14/09/12 15:05:35 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
14/09/12 15:05:35 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
14/09/12 15:05:35 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
14/09/12 15:05:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/09/12 15:05:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
14/09/12 15:05:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
14/09/12 15:05:37 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
14/09/12 15:05:37 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
14/09/12 15:05:37 INFO metastore.ObjectStore: Initialized ObjectStore
14/09/12 15:05:37 INFO metastore.HiveMetaStore: Added admin role in metastore
14/09/12 15:05:37 INFO metastore.HiveMetaStore: Added public role in metastore
14/09/12 15:05:37 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
14/09/12 15:05:37 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=null
14/09/12 15:05:37 INFO HiveMetaStore.audit: ugi=myuser ip=unknown-ip-addr cmd=get_table : db=default tbl=null
14/09/12 15:05:37 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetExistsException: Metadata already exists for dataset:null
org.kitesdk.data.DatasetExistsException: Metadata already exists for dataset:null
	at org.kitesdk.data.hcatalog.HCatalogManagedMetadataProvider.create(HCatalogManagedMetadataProvider.java:53)
	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:134)
	at org.kitesdk.data.Datasets.create(Datasets.java:194)
	at org.kitesdk.data.Datasets.create(Datasets.java:233)
	at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:81)
	at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:70)
	at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:112)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:253)
	at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:721)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:499)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
{code}
Do you have HCatalog set up on your test machine?

> Import data from database to Hive as Parquet files
> --------------------------------------------------
>
>                 Key: SQOOP-1393
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1393
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: tools
>            Reporter: Qian Xu
>            Assignee: Richard
>             Fix For: 1.4.6
>
>         Attachments: patch.diff, patch_v2.diff, patch_v3.diff
>
>
> Importing data to Hive as Parquet files can be separated into two steps:
> 1. Import an individual table from an RDBMS to HDFS as a set of Parquet files.
> 2. Import the data into Hive by generating and executing a CREATE TABLE statement that defines the data's layout in Hive as a Parquet-format table (a sketch of such a statement appears after this quote).
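For anyone triaging the DatasetExistsException above: the stack trace goes through Kite's public Datasets.create API, so the failure mode can be reproduced and guarded against outside Sqoop. Below is a minimal standalone sketch, assuming a Kite SDK release that ships Datasets.exists; the dataset URI and schema are illustrative assumptions, not the values Sqoop derives internally.
{code}
// Minimal sketch: create a Parquet dataset via the same Kite API seen in
// the stack trace, guarding against DatasetExistsException first.
// ASSUMPTIONS: the dataset URI and schema below are illustrative, and
// Datasets.exists() is present in the Kite release on the classpath.
import org.kitesdk.data.DatasetDescriptor;
import org.kitesdk.data.Datasets;
import org.kitesdk.data.Formats;

public class SafeDatasetCreate {
  public static void main(String[] args) {
    // Hypothetical Hive-managed dataset URI (db "default", table "test").
    String uri = "dataset:hive:default/test";
    if (!Datasets.exists(uri)) {
      DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
          .schemaLiteral("{\"type\":\"record\",\"name\":\"test\","
              + "\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}")
          .format(Formats.PARQUET)
          .build();
      Datasets.create(uri, descriptor);
    }
  }
}
{code}
Note that in the log above the dataset name resolves to null ("dataset:null"), so the metastore lookup itself looks suspect, which is why the HCatalog setup question matters.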
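On step 2 of the description quoted above, here is a minimal sketch of generating and executing such a CREATE TABLE statement over HiveServer2 JDBC, assuming Hive 0.13+ (where STORED AS PARQUET is available); the JDBC URL, table name, columns, and HDFS location are illustrative assumptions.
{code}
// Minimal sketch of step 2: point a Parquet-format Hive table at files
// already imported to HDFS in step 1.
// ASSUMPTIONS: JDBC URL, table name, columns, and location are illustrative.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateParquetTable {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // EXTERNAL keeps Hive from taking ownership of the imported files.
      stmt.execute("CREATE EXTERNAL TABLE test (id INT, name STRING) "
          + "STORED AS PARQUET "
          + "LOCATION '/user/myuser/sqoop/mydir'");
    }
  }
}
{code}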