[ https://issues.apache.org/jira/browse/SQOOP-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hari Sekhon reopened SQOOP-1293:
--------------------------------

This isn't really invalid; my ex-colleague Jarec, who is a Sqoop committer, independently confirmed this was a TODO around the same time here:

http://stackoverflow.com/questions/22344781/how-to-import-table-as-external-table-using-sqoop

The Jira SQOOP-816 may sum this up more succinctly; maybe mark this as a duplicate and repoint it to the URL below, but it's still an issue/limitation.

https://issues.apache.org/jira/browse/SQOOP-816

Obviously multi-step workarounds are possible but inelegant. The HCatalog reference doesn't make much sense to me, since this is a Sqoop import from a database and should be doable in a one-shot import command, but I'm happy to be enlightened if this operation can be performed in any single command: import a table from a database as an external table, placing the data at a given path in HDFS and deleting the directory if it already exists to avoid cumulative data build-up (i.e. a total table refresh from source).

Regards,

Hari

> --hive-import causes --target-dir and --warehouse-dir to not be respected,
> nor --delete-target-dir
> --------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1293
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1293
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors/sqlserver, hive-integration
>    Affects Versions: 1.4.4
>        Environment: Hortonworks Sandbox 2.0
>           Reporter: Hari Sekhon
>
> Hi,
> I'm importing a table from SQL Server 2012 and am using --hive-import to
> create the metadata automatically, but am finding that it causes --target-dir
> and --warehouse-dir to not be respected, nor --delete-target-dir.
> {noformat}
> sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
>     --username omitted --password omitted \
>     --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
>     --table "cube.DimCounterParty" --split-by CounterpartyKey \
>     --hive-import --target-dir /MyDatabase/CounterParty --delete-target-dir
> {noformat}
> (FYI, I'm using --driver to work around bug SQOOP-1292.)
> So I tried --warehouse-dir in case it needed that instead of --target-dir:
> {noformat}
> sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
>     --username omitted --password omitted \
>     --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
>     --table "cube.DimCounterParty" --split-by CounterpartyKey \
>     --hive-import --warehouse-dir /MyDatabase/CounterParty --delete-target-dir
> {noformat}
> but in both cases it ingested the data into
> /apps/hive/warehouse/cube.db/dimcounterparty.
> What's also strange is that it created the directory specified for
> --warehouse-dir but then didn't appear to place the data in it.
> I wanted to use --delete-target-dir to replace the whole table each time for
> this test, since the source table is only ~650,000 rows and 185MB.
> What I've found is that, on top of ingesting into
> /apps/hive/warehouse/cube.db/dimcounterparty, by disregarding
> --delete-target-dir it is causing the table volume to grow cumulatively with
> each run, such that after a few runs {noformat}select count(*){noformat}
> on the table now shows 5,546,661 rows instead of 650,000.
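For reference, until --delete-target-dir is honoured alongside --hive-import, a hedged sketch of the manual clean-and-reload I'm having to do instead (a cluster-specific sketch, not something Sqoop does for you; paths and connection details are the ones from this report, and the credentials are placeholders):

```shell
# Sketch: manually clear the Hive-managed directory before re-importing,
# since --delete-target-dir is ignored when --hive-import is used.
# -f makes the rm succeed even on the first run when the path doesn't exist.
hadoop fs -rm -r -f /apps/hive/warehouse/cube.db/dimcounterparty

# Then re-run the import; the data lands in the (now empty) warehouse dir
# rather than accumulating _copy_N files.
sqoop import \
    --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
    --username omitted --password omitted \
    --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
    --table "cube.DimCounterParty" --split-by CounterpartyKey \
    --hive-import
```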
> Here is the Hive warehouse directory on HDFS, where you can see the
> accumulation of the data:
> {noformat}
> hadoop fs -ls /apps/hive/warehouse/cube.db/dimcounterparty/
> Found 40 items
> -rw-r--r--   3 root hdfs          0 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS
> -rw-r--r--   3 root hdfs          0 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_1
> -rw-r--r--   3 root hdfs          0 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_2
> -rw-r--r--   3 root hdfs          0 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_3
> -rw-r--r--   3 root hdfs          0 2014-03-07 09:42 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_4
> -rw-r--r--   3 root hdfs          0 2014-03-07 10:04 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_5
> -rw-r--r--   3 root hdfs          0 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_6
> -rw-r--r--   3 root hdfs          0 2014-03-07 10:16 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_7
> -rw-r--r--   3 root hdfs   49044407 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000
> -rw-r--r--   3 root hdfs   49045389 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_1
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_2
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_3
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 09:41 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_4
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 10:04 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_5
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_6
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 10:15 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_7
> -rw-r--r--   3 root hdfs   52363518 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001
> -rw-r--r--   3 root hdfs   52363912 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_1
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_2
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_3
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 09:41 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_4
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 10:03 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_5
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_6
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 10:15 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_7
> -rw-r--r--   3 root hdfs   51796051 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002
> -rw-r--r--   3 root hdfs   51796027 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_1
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_2
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_3
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 09:41 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_4
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 10:03 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_5
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_6
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 10:15 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_7
> -rw-r--r--   3 root hdfs   45445570 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003
> -rw-r--r--   3 root hdfs   45445544 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_1
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_2
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_3
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 09:42 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_4
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 10:04 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_5
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_6
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 10:16 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_7
> {noformat}
> Is this a bug that it doesn't respect --target-dir, or at least
> --warehouse-dir?
> This highlights another issue: this behaviour should be more intuitive, and/or
> {noformat}sqoop import --help{noformat} should make it easier to see which
> options are (not) compatible, or the output at job initiation time should
> state which switches will be disregarded, as it already does when using a
> {noformat}--hive-<option>{noformat} without
> {noformat}--hive-import{noformat}.
> In my last place I recall using sqoop create-hive-table to generate the
> metadata after import and then editing the table location metadata. It would
> be a lot better if we could fix the behaviour of --hive-import to not require
> such a multi-step workaround.
> Thanks
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon

--
This message was sent by Atlassian JIRA
(v6.2#6252)
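To make the multi-step workaround concrete, here is a hedged sketch of the import-then-repoint sequence described above (connection details and paths are the ones from this report, credentials are placeholders, and the exact Hive table/database naming may differ on your cluster):

```shell
# Step 1: import data only (no --hive-import), to the desired HDFS path.
# --delete-target-dir IS honoured here, so re-runs replace rather than append.
sqoop import \
    --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
    --username omitted --password omitted \
    --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
    --table "cube.DimCounterParty" --split-by CounterpartyKey \
    --target-dir /MyDatabase/CounterParty --delete-target-dir

# Step 2: generate the Hive table metadata from the source schema.
sqoop create-hive-table \
    --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
    --username omitted --password omitted \
    --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
    --table "cube.DimCounterParty"

# Step 3: repoint the table at the imported data ("editing the table
# location metadata" as described above).
hive -e "ALTER TABLE dimcounterparty SET LOCATION 'hdfs:///MyDatabase/CounterParty';"
```

This works, but it is exactly the inelegant three-command dance that a working --hive-import + --target-dir + --delete-target-dir combination would make unnecessary.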