[ 
https://issues.apache.org/jira/browse/SQOOP-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon reopened SQOOP-1293:
--------------------------------


This isn't really invalid, my ex-colleague Jarec who is a Sqoop committer 
independently confirmed this was a TODO around the same here:

http://stackoverflow.com/questions/22344781/how-to-import-table-as-external-table-using-sqoop

The Jira SQOOP-816 may sum this up for succinctly, maybe mark this as duplicate 
and repoint to the URL below but it's still an issue/limitation.

https://issues.apache.org/jira/browse/SQOOP-816

Obviously multi-step workarounds are possible but inelegant. The HCatalog 
reference doesn't make much sense to me since this is a Sqoop operation from DB 
and should be doable in a one shot import command but happy to be enlightened 
if this operation can be performed in any one command:

import table from database as external table placing data in given path in 
HDFS, deleting the directory if it exists to avoid cumulative data build up (ie 
a total table refresh operation from source).

Regards,

Hari

> --hive-import causes --target-dir and --warehouse-dir to not be respected, 
> nor --delete-target-dir
> --------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1293
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1293
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors/sqlserver, hive-integration
>    Affects Versions: 1.4.4
>         Environment: Hortonworks Sandbox 2.0
>            Reporter: Hari Sekhon
>
> Hi,
> I'm importing a table from SQL Server 2012 and am using --hive-import to 
> create the metadata automatically, but am finding that it causes --target-dir 
> and --warehouse-dir to not be respected, nor --delete-target-dir.
> sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" 
> --username omitted --password omitted --driver 
> com.microsoft.sqlserver.jdbc.SQLServerDriver --table "cube.DimCounterParty" 
> --split-by CounterpartyKey --hive-import --target-dir 
> /MyDatabase/CounterParty --delete-target-dir
> (fyi I'm using --driver to work around bug SQOOP-1292)
> So I tried --warehouse-dir in case it needed that instead of --target-dir
> sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" 
> --username omitted --password omitted --driver 
> com.microsoft.sqlserver.jdbc.SQLServerDriver --table "cube.DimCounterParty" 
> --split-by CounterpartyKey --hive-import --warehouse-dir 
> /MyDatabase/CounterParty --delete-target-dir
> but in both cases it ingested the data to 
> /apps/hive/warehouse/cube.db/dimcounterparty.
> What's also strange is that it created the directory specified for 
> --warehouse-dir but then didn't appear to place the data in it.
> I wanted to use --delete-target-dir to replace the whole table each time for 
> this test since the source table is only ~650,000 rows and 185MB.
> What I've found is that on top of ingesting in to 
> /apps/hive/warehouse/cube.db/dimcounterparty by disregarding 
> --delete-target-dir it is causing the table volume to grow cumulatively for 
> each run, such that after a few runs the {noformat}select count(*){noformat} 
> on the table now shows 5,546,661 rows instead of 650,000.
> Here is the the hive warehouse directory on HDFS where you can see the 
> accumulation of the data:
> {noformat}
>  hadoop fs -ls /apps/hive/warehouse/cube.db/dimcounterparty/
> Found 40 items
> -rw-r--r--   3 root hdfs          0 2014-03-07 08:44 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS
> -rw-r--r--   3 root hdfs          0 2014-03-07 09:10 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_1
> -rw-r--r--   3 root hdfs          0 2014-03-07 09:33 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_2
> -rw-r--r--   3 root hdfs          0 2014-03-07 09:37 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_3
> -rw-r--r--   3 root hdfs          0 2014-03-07 09:42 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_4
> -rw-r--r--   3 root hdfs          0 2014-03-07 10:04 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_5
> -rw-r--r--   3 root hdfs          0 2014-03-07 10:14 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_6
> -rw-r--r--   3 root hdfs          0 2014-03-07 10:16 
> /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_7
> -rw-r--r--   3 root hdfs   49044407 2014-03-07 08:44 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000
> -rw-r--r--   3 root hdfs   49045389 2014-03-07 09:10 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_1
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 09:33 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_2
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 09:37 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_3
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 09:41 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_4
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 10:04 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_5
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 10:14 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_6
> -rw-r--r--   3 root hdfs   49045944 2014-03-07 10:15 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_7
> -rw-r--r--   3 root hdfs   52363518 2014-03-07 08:44 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001
> -rw-r--r--   3 root hdfs   52363912 2014-03-07 09:10 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_1
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 09:33 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_2
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 09:37 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_3
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 09:41 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_4
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 10:03 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_5
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 10:14 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_6
> -rw-r--r--   3 root hdfs   52364256 2014-03-07 10:15 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_7
> -rw-r--r--   3 root hdfs   51796051 2014-03-07 08:44 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002
> -rw-r--r--   3 root hdfs   51796027 2014-03-07 09:10 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_1
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 09:33 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_2
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 09:37 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_3
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 09:41 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_4
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 10:03 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_5
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 10:14 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_6
> -rw-r--r--   3 root hdfs   51796623 2014-03-07 10:15 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_7
> -rw-r--r--   3 root hdfs   45445570 2014-03-07 08:44 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003
> -rw-r--r--   3 root hdfs   45445544 2014-03-07 09:10 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_1
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 09:33 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_2
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 09:37 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_3
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 09:42 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_4
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 10:04 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_5
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 10:14 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_6
> -rw-r--r--   3 root hdfs   45445719 2014-03-07 10:16 
> /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_7
> {noformat}
> Is this a bug that it doesn't respect --target-dir or at least 
> --warehouse-dir?
> This highlights another issue that this should be more intuitive and/or 
> {noformat}sqoop import --help{noformat} should make it easier to see what 
> options are (not) compatible, or it should specify in the output at job 
> initiation time where switches will be disregarded, such as it does when 
> using {noformat}--hive-<option>{noformat} without 
> {noformat}--hive-import{noformat}
> In my last place I recall using sqoop create-hive-table to generate the 
> metadata after import and then editing the table location metadata. It would 
> be a lot better if we could fix the behaviour of --hive-import to not require 
> such a multi-step workaround.
> Thanks
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to