Hi Nitin,

Thanks for the inputs - will try out those.

Regards,
Omkar Joshi

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Thursday, June 27, 2013 11:48 AM
To: user@hive.apache.org
Subject: Re: Import from MySQL to Hive using Sqoop


Disclaimer: I am not a Sqoop guru, so these are just suggestions.

The Sqoop documentation says you can tell a
"Sqoop job to import data for Hive into a particular partition by specifying
the --hive-partition-key and --hive-partition-value arguments".

I have not tried these myself, and I am not sure whether they work with
dynamic partitioning.
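
For what it is worth, an untested sketch of what the documentation seems to describe follows; the connection string, credentials, and the table and column names are placeholders, not from your setup:

```shell
# Sketch only: import one day's rows from MySQL straight into a single
# static Hive partition. Each run targets one --hive-partition-value,
# so a backfill would loop over the dates.
sqoop import \
  --connect jdbc:mysql://dbhost/flights \
  --username dbuser -P \
  --table bookings \
  --where "departure_date = 20130605" \
  --hive-import \
  --hive-table bookings_partitioned \
  --hive-partition-key departure_date \
  --hive-partition-value 20130605
```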

Also, have you looked at incremental imports? They would save you from
importing the old data again and again.
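
An incremental import might look roughly like this (again untested; the connection details, table name, and --last-value are placeholders):

```shell
# Sketch only: fetch just the rows added since the last run, using the
# departure_date column as the check column. Sqoop prints the new
# --last-value to use for the next run.
sqoop import \
  --connect jdbc:mysql://dbhost/flights \
  --username dbuser -P \
  --table bookings \
  --incremental append \
  --check-column departure_date \
  --last-value 20130626 \
  --target-dir /datastore/bookings_staging
```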

Could you also post the same question to the Sqoop user group?

To answer your questions:

For (2), I have already given the options to use above.

For (3), as long as you are importing just one date's data and your partition
key is that date column, you can write into a directory such as
hdfs://blah/datastore/table/partitioncolumn=value/
and then register that partition with Hive in one more step.

This is essentially what option 2 does: it imports data into a single
partition for a given value.
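
That "one more step" could be sketched as below; the table name, partition column, and path are hypothetical stand-ins for your own:

```shell
# Sketch only: point an existing partitioned Hive table at the directory
# Sqoop wrote, so the data becomes queryable without a second copy.
hive -e "
ALTER TABLE bookings_partitioned
ADD PARTITION (departure_date='20130605')
LOCATION '/datastore/table/departure_date=20130605/';
"
```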


On Thu, Jun 27, 2013 at 9:43 AM, Omkar Joshi 
<omkar.jo...@lntinfotech.com> wrote:
Hi,

I have to import > 400 million rows from a MySQL table (which has a composite
primary key) into a partitioned Hive table via Sqoop. The table holds two
years of data, with a departure-date column ranging from 20120605 to 20140605
and thousands of records per day. I need to partition the data by the
departure date.

The versions:

Apache Hadoop - 1.0.4
Apache Hive - 0.9.0
Apache Sqoop - sqoop-1.4.2.bin__hadoop-1.0.0

To my knowledge, there are three approaches:

1.    MySQL -> Non-partitioned Hive table -> INSERT from the non-partitioned
Hive table into the partitioned Hive table

This is the painful approach I'm currently following.

2.    MySQL -> Partitioned Hive table

I read that support for this was added in later(?) versions of Hive and
Sqoop, but I was unable to find an example.

3.    MySQL -> Non-partitioned Hive table -> ALTER the non-partitioned Hive
table to ADD PARTITIONs

The syntax requires specifying partitions as key-value pairs, which is not
feasible with millions of records, where one cannot enumerate all the
partition key-value pairs.
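
For reference, the INSERT step in approach 1 at least avoids enumerating partitions by using Hive's dynamic partitioning; the table and column names below are simplified placeholders for my actual schema:

```shell
# Sketch only: one INSERT fans the staging rows out across all
# departure_date partitions. The partition column must be the last
# column in the SELECT list.
hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=1000;

INSERT OVERWRITE TABLE bookings_partitioned PARTITION (departure_date)
SELECT booking_id, leg_id, fare, departure_date
FROM bookings_staging;
"
```

Two years of daily partitions is roughly 730, so the default dynamic-partition limits should be sufficient, but the overall double-write is what makes this approach painful at 400 million rows.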

Can anyone provide inputs for approaches 2 and 3?

Regards,
Omkar Joshi


________________________________
The contents of this e-mail and any attachment(s) may contain confidential or 
privileged information for the intended recipient(s). Unintended recipients are 
prohibited from taking action on the basis of information in this e-mail and 
using or disseminating the information, and must notify the sender and delete 
it from their system. L&T Infotech will not accept responsibility or liability 
for the accuracy or completeness of, or the presence of any virus or disabling 
code in this e-mail.



--
Nitin Pawar
