[ https://issues.apache.org/jira/browse/SQOOP-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186552#comment-17186552 ]
Ram commented on SQOOP-3151:
----------------------------

[~sanysand...@gmail.com] [~BoglarkaEgyed] We are using *Sqoop 1.4.7* to export Parquet data stored in HDFS - *plain Parquet files, NOT a Hive table* - and we are still facing the same issue:

{code:java}
20/08/28 13:37:02 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetIOException: Cannot access descriptor location: hdfs:///<location>/part-00000-f9f92493-36a1-4714-bcc6-291c118cf599-c000/snappy/parquet/.metadata
org.kitesdk.data.DatasetIOException: Cannot access descriptor location: hdfs:///<location>/part-00000-f9f92493-36a1-4714-bcc6-291c118cf599-c000/snappy/parquet/.metadata
{code}

The command we are running:

{code:java}
/sqoop-1.4.7.bin__hadoop-2.6.0/bin/sqoop export --connect jdbc:postgresql://<postgres_db_details> --username <username> --password <password> --table <table_name> --export-dir hdfs:///<location>/part-00000-f9f92493-36a1-4714-bcc6-291c118cf599-c000.parquet
{code}

Postgres JAR: postgresql-42.2.11.jar

Please suggest a solution ASAP.

> Sqoop export HDFS file type auto detection can pick wrong type
> ---------------------------------------------------------------
>
>                 Key: SQOOP-3151
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3151
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Boglarka Egyed
>            Assignee: Sandish Kumar HN
>            Priority: Major
>
> It appears that Sqoop export tries to detect the file format by reading the
> first 3 characters of a file. Based on that header, the appropriate file
> reader is used. However, if the result set happens to contain the header
> sequence, the wrong reader is chosen, resulting in a misleading error.
>
> For example, suppose someone is exporting a table in which one of the field
> values is "PART". Since Sqoop sees the letters "PAR", it invokes the Kite SDK
> because it assumes the file is in Parquet format. This leads to a misleading
> error:
>
> ERROR sqoop.Sqoop: Got exception running Sqoop:
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
>
> This can be reproduced easily, using Hive as a real-world example:
>
> create table test (val string);
> insert into test values ('PAR');
>
> Then run a Sqoop export against the table data:
>
> $ sqoop export --connect $MYCONN --username $MYUSER --password $MYPWD -m 1
> --export-dir /user/hive/warehouse/test --table $MYTABLE
>
> Sqoop will fail with the following:
>
> ERROR sqoop.Sqoop: Got exception running Sqoop:
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
>
> Changing the value from "PAR" to something else, like "Obj" (Avro) or "SEQ"
> (SequenceFile), will result in similar errors.
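To make the misdetection concrete outside of Sqoop, here is a minimal self-contained Java sketch of the 3-byte header sniffing the description refers to. The class and method names are illustrative, not Sqoop's actual detection code; only the magic strings ("PAR" for Parquet, "Obj" for Avro, "SEQ" for SequenceFile) come from the issue above.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeaderSniffDemo {

    enum FileFormat { PARQUET, AVRO, SEQUENCE_FILE, TEXT }

    // Read the first 3 bytes of the file and map them to a format, mirroring
    // the detection behavior described in this issue: "PAR" begins the
    // Parquet magic "PAR1", "Obj" begins the Avro magic, and "SEQ" begins
    // the SequenceFile magic.
    static FileFormat detect(Path file) throws IOException {
        byte[] header = new byte[3];
        try (InputStream in = Files.newInputStream(file)) {
            if (in.read(header) < 3) {
                return FileFormat.TEXT; // too short to carry any magic bytes
            }
        }
        switch (new String(header, StandardCharsets.US_ASCII)) {
            case "PAR": return FileFormat.PARQUET;
            case "Obj": return FileFormat.AVRO;
            case "SEQ": return FileFormat.SEQUENCE_FILE;
            default:    return FileFormat.TEXT;
        }
    }

    public static void main(String[] args) throws IOException {
        // A plain text file whose first record happens to start with "PAR",
        // like the 'PAR' row from the Hive reproduction above.
        Path file = Files.createTempFile("sniff-demo", ".txt");
        Files.write(file, "PARTIAL\n".getBytes(StandardCharsets.US_ASCII));

        // Prints PARQUET even though the file is plain text - the false
        // positive that sends the export into the Kite Parquet reader.
        System.out.println(detect(file));
    }
}
{code}

Any scheme that sniffs only the leading bytes of the data will misfire whenever a record legitimately begins with one of the magic strings, which is exactly the failure both the reproduction above and the Parquet-file report in the comment run into.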