Hi,

I have to import > 400 million rows from a MySQL table (which has a composite 
primary key) into a PARTITIONED Hive table via Sqoop. The table holds two years 
of data, with a departure date column ranging from 20120605 to 20140605 and 
thousands of records per day. I need to partition the data based on the 
departure date.

The versions:

Apache Hadoop - 1.0.4
Apache Hive   - 0.9.0
Apache Sqoop  - sqoop-1.4.2.bin__hadoop-1.0.0

As far as I know, there are 3 approaches:

1.    MySQL -> Non-partitioned Hive table -> INSERT from the non-partitioned 
Hive table into the partitioned Hive table

This is the painful approach I'm currently following (a dynamic-partition 
INSERT sketch follows after this list).

2.    MySQL -> Partitioned Hive table

I read that support for this was added in later(?) versions of Hive and Sqoop, 
but I was unable to find an example (a Sqoop sketch follows after this list).

3.    MySQL -> Non-partitioned Hive table -> ALTER the Hive table to add 
PARTITIONs

The syntax requires specifying partitions as key-value pairs, which is not 
feasible with millions of records where one cannot hand-write all the partition 
key-value pairs (a scripted sketch follows after this list).
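
For approach 1, the INSERT step can use Hive's dynamic partitioning (available 
in Hive 0.9.0) so that the partitions are derived from the data instead of 
being enumerated by hand. A minimal sketch, assuming a staging table 
flights_staging and a partitioned table flights with a departure_date partition 
column; all table and column names here are placeholders for illustration:

    -- Enable dynamic partitioning; two years of dates is roughly 730
    -- partitions, so the default per-node limit (100) needs to be raised.
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.exec.max.dynamic.partitions=1000;
    SET hive.exec.max.dynamic.partitions.pernode=1000;

    -- The partition column must be the last column in the SELECT list.
    INSERT OVERWRITE TABLE flights PARTITION (departure_date)
    SELECT flight_id, carrier, origin, destination, departure_date
    FROM flights_staging;

This is still approach 1, but it avoids running one INSERT per departure date.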
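
For approach 2, Sqoop's Hive arguments include --hive-partition-key and 
--hive-partition-value (present in the 1.4.x line, though I have not verified 
them against 1.4.2 specifically). They load one static partition per 
invocation, so a two-year range would need one run per departure date, 
typically driven by a loop. A sketch, with the connection string, table and 
column names as placeholder assumptions:

    # One run imports one departure date into one static Hive partition.
    # The composite primary key means Sqoop cannot pick a split column on
    # its own, hence the explicit --split-by (placeholder column name).
    sqoop import \
      --connect jdbc:mysql://dbhost/flightsdb \
      --username etl -P \
      --table flights \
      --where "departure_date = 20120605" \
      --split-by flight_id \
      --hive-import \
      --hive-table flights \
      --hive-partition-key departure_date \
      --hive-partition-value 20120605

Depending on the Sqoop version, the partition column may also need to be left 
out of the regular column list (e.g. via --columns) so it is not imported twice.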
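
For approach 3, note that partitions are added per distinct departure date, not 
per row, so two years is only about 730 ALTER statements, which a small script 
can generate. The caveat is that ADD PARTITION only registers metadata; it 
helps only if the underlying files are already laid out one directory per date 
(otherwise the rows still have to be redistributed, which is what the INSERT in 
approach 1 does). A sketch assuming GNU date and placeholder table, column and 
path names:

    # Emit one ALTER TABLE statement per day between 20120605 and 20140605.
    d=2012-06-05
    end=2014-06-05
    while [ "$(date -d "$d" +%Y%m%d)" -le "$(date -d "$end" +%Y%m%d)" ]; do
      p=$(date -d "$d" +%Y%m%d)
      # Quote the value if departure_date is a STRING partition column.
      echo "ALTER TABLE flights ADD PARTITION (departure_date=${p}) LOCATION '/data/flights/${p}';"
      d=$(date -d "$d + 1 day" +%Y-%m-%d)
    done > add_partitions.hql

    hive -f add_partitions.hql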

Can anyone provide input on approaches 2 and 3?

Regards,
Omkar Joshi


