[
https://issues.apache.org/jira/browse/SQOOP-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999709#comment-13999709
]
Hari Sekhon edited comment on SQOOP-1293 at 2/17/15 12:08 PM:
--------------------------------------------------------------
This isn't really invalid; my ex-colleague Jarek Jarcec Cecho at Cloudera, who
is a Sqoop committer, coincidentally and independently confirmed around the
same time that this was a TODO:
http://stackoverflow.com/questions/22344781/how-to-import-table-as-external-table-using-sqoop
https://issues.apache.org/jira/browse/SQOOP-816
SQOOP-816 states the Hive-import-to-external-table requirement more simply,
but I believe this Jira raises a few additional points about the real-world
usage pattern.
Multi-step workarounds are obviously possible, but inelegant (see the sketch
below). The HCatalog reference doesn't make much sense to me in relation to
externally managed full-table replacements.
Requirement succinctly reiterated: import a table from a database into Hive as
an external table, placing the data at a given path in HDFS and deleting the
directory first if it already exists to avoid cumulative data build-up - i.e.
perform a total external-table refresh operation from source.
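For illustration, the current multi-step workaround looks roughly like this (a
sketch only: the column list, delimiter and Hive invocation are hypothetical
placeholders, not taken from this issue):
{noformat}
# 1. Clear any previous copy of the data (errors harmlessly if the path
#    does not exist yet) - this is the "total refresh from source" step
hadoop fs -rm -r -skipTrash /MyDatabase/CounterParty

# 2. Plain HDFS import to the desired external location (--target-dir is
#    respected when --hive-import is not used)
sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
    --username omitted --password omitted \
    --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
    --table "cube.DimCounterParty" --split-by CounterpartyKey \
    --target-dir /MyDatabase/CounterParty

# 3. (Re)create the external table over that path - the columns and
#    delimiter here are made up and would have to match the import
hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS dimcounterparty (
           counterpartykey INT, counterpartyname STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         LOCATION '/MyDatabase/CounterParty'"
{noformat}
What this Jira asks for is the same net effect from a single sqoop import
command.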
Regards,
Hari
was (Author: harisekhon):
This isn't really invalid; my ex-colleague Jarek Jarcec Cecho at Cloudera, who
is a Sqoop committer, coincidentally and independently confirmed around the
same time that this was a TODO:
http://stackoverflow.com/questions/22344781/how-to-import-table-as-external-table-using-sqoop
https://issues.apache.org/jira/browse/SQOOP-816
SQOOP-816 states the Hive-import-to-external-table requirement more simply,
but I believe this Jira raises a few additional points about the real-world
usage pattern.
Multi-step workarounds are obviously possible, but inelegant. The HCatalog
reference doesn't make much sense to me, since this is a Sqoop operation from
a database that should be doable in a one-shot import command if the right
switch behaviour were working - but I'm happy to be enlightened if this
operation can be performed in any one command via any comparable method.
Requirement succinctly reiterated: import a table from a database into Hive as
an external table, placing the data at a given path in HDFS and deleting the
directory first if it already exists to avoid cumulative data build-up (i.e. a
total table refresh operation from source).
Regards,
Hari
> --hive-import causes --target-dir and --warehouse-dir to not be respected, nor --delete-target-dir
> --------------------------------------------------------------------------------------------------
>
> Key: SQOOP-1293
> URL: https://issues.apache.org/jira/browse/SQOOP-1293
> Project: Sqoop
> Issue Type: Bug
> Components: connectors/sqlserver, hive-integration
> Affects Versions: 1.4.4
> Reporter: Hari Sekhon
> Priority: Minor
>
> Hi,
> I'm importing a table from SQL Server 2012 and am using --hive-import to
> create the metadata automatically, but I'm finding that it causes --target-dir
> and --warehouse-dir to be ignored, along with --delete-target-dir:
> {noformat}
> sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
>     --username omitted --password omitted \
>     --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
>     --table "cube.DimCounterParty" --split-by CounterpartyKey \
>     --hive-import --target-dir /MyDatabase/CounterParty --delete-target-dir
> {noformat}
> (fyi I'm using --driver to work around bug SQOOP-1292)
> So I tried --warehouse-dir in case it needed that instead of --target-dir:
> {noformat}
> sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
>     --username omitted --password omitted \
>     --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
>     --table "cube.DimCounterParty" --split-by CounterpartyKey \
>     --hive-import --warehouse-dir /MyDatabase/CounterParty --delete-target-dir
> {noformat}
> In both cases, however, it ingested the data to
> /apps/hive/warehouse/cube.db/dimcounterparty.
> What's also strange is that it created the directory specified for
> --warehouse-dir but then didn't appear to place any data in it.
> I wanted to use --delete-target-dir to replace the whole table on each run for
> this test, since the source table is only ~650,000 rows and 185MB.
> What I've found is that, on top of ingesting into
> /apps/hive/warehouse/cube.db/dimcounterparty, disregarding --delete-target-dir
> causes the table volume to grow cumulatively with each run: after eight runs
> (the listing below shows the original plus seven _copy_N versions of each part
> file) {noformat}select count(*){noformat} on the table now shows 5,546,661
> rows instead of ~650,000.
> Here is the Hive warehouse directory on HDFS, where you can see the
> accumulation of the data:
> {noformat}
> hadoop fs -ls /apps/hive/warehouse/cube.db/dimcounterparty/
> Found 40 items
> -rw-r--r-- 3 root hdfs 0 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS
> -rw-r--r-- 3 root hdfs 0 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_1
> -rw-r--r-- 3 root hdfs 0 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_2
> -rw-r--r-- 3 root hdfs 0 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_3
> -rw-r--r-- 3 root hdfs 0 2014-03-07 09:42 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_4
> -rw-r--r-- 3 root hdfs 0 2014-03-07 10:04 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_5
> -rw-r--r-- 3 root hdfs 0 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_6
> -rw-r--r-- 3 root hdfs 0 2014-03-07 10:16 /apps/hive/warehouse/cube.db/dimcounterparty/_SUCCESS_copy_7
> -rw-r--r-- 3 root hdfs 49044407 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000
> -rw-r--r-- 3 root hdfs 49045389 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_1
> -rw-r--r-- 3 root hdfs 49045944 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_2
> -rw-r--r-- 3 root hdfs 49045944 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_3
> -rw-r--r-- 3 root hdfs 49045944 2014-03-07 09:41 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_4
> -rw-r--r-- 3 root hdfs 49045944 2014-03-07 10:04 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_5
> -rw-r--r-- 3 root hdfs 49045944 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_6
> -rw-r--r-- 3 root hdfs 49045944 2014-03-07 10:15 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00000_copy_7
> -rw-r--r-- 3 root hdfs 52363518 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001
> -rw-r--r-- 3 root hdfs 52363912 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_1
> -rw-r--r-- 3 root hdfs 52364256 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_2
> -rw-r--r-- 3 root hdfs 52364256 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_3
> -rw-r--r-- 3 root hdfs 52364256 2014-03-07 09:41 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_4
> -rw-r--r-- 3 root hdfs 52364256 2014-03-07 10:03 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_5
> -rw-r--r-- 3 root hdfs 52364256 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_6
> -rw-r--r-- 3 root hdfs 52364256 2014-03-07 10:15 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00001_copy_7
> -rw-r--r-- 3 root hdfs 51796051 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002
> -rw-r--r-- 3 root hdfs 51796027 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_1
> -rw-r--r-- 3 root hdfs 51796623 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_2
> -rw-r--r-- 3 root hdfs 51796623 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_3
> -rw-r--r-- 3 root hdfs 51796623 2014-03-07 09:41 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_4
> -rw-r--r-- 3 root hdfs 51796623 2014-03-07 10:03 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_5
> -rw-r--r-- 3 root hdfs 51796623 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_6
> -rw-r--r-- 3 root hdfs 51796623 2014-03-07 10:15 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00002_copy_7
> -rw-r--r-- 3 root hdfs 45445570 2014-03-07 08:44 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003
> -rw-r--r-- 3 root hdfs 45445544 2014-03-07 09:10 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_1
> -rw-r--r-- 3 root hdfs 45445719 2014-03-07 09:33 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_2
> -rw-r--r-- 3 root hdfs 45445719 2014-03-07 09:37 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_3
> -rw-r--r-- 3 root hdfs 45445719 2014-03-07 09:42 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_4
> -rw-r--r-- 3 root hdfs 45445719 2014-03-07 10:04 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_5
> -rw-r--r-- 3 root hdfs 45445719 2014-03-07 10:14 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_6
> -rw-r--r-- 3 root hdfs 45445719 2014-03-07 10:16 /apps/hive/warehouse/cube.db/dimcounterparty/part-m-00003_copy_7
> {noformat}
> Is this a bug, in that it doesn't respect --target-dir, or at least
> --warehouse-dir?
> This highlights another issue: the behaviour should be more intuitive, and/or
> {noformat}sqoop import --help{noformat} should make it easier to see which
> options are (not) compatible, or the job should state in its output at
> initiation time which switches will be disregarded, as it already does when
> using {noformat}--hive-<option>{noformat} without
> {noformat}--hive-import{noformat}.
> In my last place I recall using sqoop create-hive-table to generate the
> metadata after import and then editing the table's location metadata (sketched
> below). It would be a lot better if the behaviour of --hive-import were fixed
> so that such a multi-step workaround isn't required.
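> For reference, a sketch of that kind of workaround (hypothetical: the Hive
> table name and the ALTER TABLE step are illustrative, not commands I ran for
> this report):
> {noformat}
> # 1. Import the data only, to an explicit HDFS path (which is respected
> #    when --hive-import is absent)
> sqoop import --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
>     --username omitted --password omitted \
>     --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
>     --table "cube.DimCounterParty" --split-by CounterpartyKey \
>     --target-dir /MyDatabase/CounterParty --delete-target-dir
>
> # 2. Generate the Hive table metadata from the source table definition
> sqoop create-hive-table --connect "jdbc:sqlserver://x.x.x.x:1533;database=MyDatabase" \
>     --username omitted --password omitted \
>     --driver com.microsoft.sqlserver.jdbc.SQLServerDriver \
>     --table "cube.DimCounterParty" --hive-table dimcounterparty
>
> # 3. Repoint the table at the imported data instead of hand-editing the
> #    metastore (the exact LOCATION URI depends on the cluster)
> hive -e "ALTER TABLE dimcounterparty SET LOCATION 'hdfs:///MyDatabase/CounterParty'"
> {noformat}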
> Thanks
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon