Hi Nitin,

Thanks for the inputs - will try those out.

Regards,
Omkar Joshi

From: Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Thursday, June 27, 2013 11:48 AM
To: user@hive.apache.org
Subject: Re: Import from MySQL to Hive using Sqoop

Disclaimer: I am not a Sqoop guru, so these are just suggestions.

The Sqoop documentation says, "Sqoop job to import data for Hive into a particular partition by specifying the --hive-partition-key and --hive-partition-value arguments". I have not tried these, and I am not sure whether they work with dynamic partitioning.

Also, have you looked at incremental imports, so that you do not have to import old data again and again?

Can you post the same question to the Sqoop user group?

To answer your questions:

For (2), I have already given the options to use above.

For (3), as long as you are importing just one date's data and your partition key is that date column, you can write into a directory such as hdfs://blah/datastore/table/partitioncolumn=value/ and then register that partition with Hive in one more step. This is the approach that option 2 implements: it imports data into a single partition for a given value.

On Thu, Jun 27, 2013 at 9:43 AM, Omkar Joshi <omkar.jo...@lntinfotech.com> wrote:

Hi,

I have to import > 400 million rows from a MySQL table (having a composite primary key) into a PARTITIONED Hive table via Sqoop. The table has data for two years, with a departure date column ranging from 20120605 to 20140605 and thousands of records per day. I need to partition the data based on the departure date.

The versions:
Apache Hadoop - 1.0.4
Apache Hive - 0.9.0
Apache Sqoop - sqoop-1.4.2.bin__hadoop-1.0.0

As per my knowledge, there are 3 approaches:

1. MySQL -> Non-partitioned Hive table -> INSERT from Non-partitioned Hive table into Partitioned Hive table
   This is the current, painful one that I'm following.
2. MySQL -> Partitioned Hive table
   I read that support for this was added in later(?) versions of Hive and Sqoop, but I was unable to find an example.
3. MySQL -> Non-partitioned Hive table -> ALTER Non-partitioned Hive table to add PARTITION
   The syntax requires partitions to be specified as key-value pairs, which is not feasible with millions of records where one cannot think of all the partition key-value pairs.

Can anyone provide inputs for approaches 2 and 3?

Regards,
Omkar Joshi

--
Nitin Pawar
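
For reference, here is a rough, untested sketch of how the suggestions for approaches 2 and 3 could be scripted, with one Sqoop import and one partition per departure date. The JDBC URL, the table names and the column names are all hypothetical and not taken from the thread. The only Sqoop options used are --connect, --table, --where, --hive-import, --hive-table, --hive-partition-key and --hive-partition-value. The partition key is deliberately given a different name from the source column, on the assumption that Sqoop may reject a partition key that is also one of the imported columns. Credentials and mapper settings are left out.

```python
from datetime import date, timedelta

# Hypothetical names, not taken from the thread: adjust to your environment.
JDBC_URL = "jdbc:mysql://dbhost/flights"   # assumed MySQL connection string
SOURCE_TABLE = "bookings"                  # assumed MySQL table name
HIVE_TABLE = "bookings_partitioned"        # assumed partitioned Hive table
DATE_COLUMN = "departure_date"             # assumed source column holding yyyymmdd values
PARTITION_KEY = "dep_date"                 # assumed Hive partition column (differs from DATE_COLUMN)
HDFS_BASE = "hdfs://blah/datastore/table"  # placeholder path used in the thread


def partition_values(start, end):
    """Return every day from start to end (inclusive) as a yyyymmdd string."""
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]


def sqoop_import_args(value):
    """Approach 2: argument list for one Sqoop import into a single static partition."""
    return [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--table", SOURCE_TABLE,
        # Assumes the date is stored as a number; quote the value if it is a string.
        "--where", f"{DATE_COLUMN} = {value}",
        "--hive-import",
        "--hive-table", HIVE_TABLE,
        "--hive-partition-key", PARTITION_KEY,
        "--hive-partition-value", value,
    ]


def add_partition_sql(value):
    """Approach 3: Hive statement registering an existing HDFS directory as a partition."""
    location = f"{HDFS_BASE}/{PARTITION_KEY}={value}/"
    return (
        f"ALTER TABLE {HIVE_TABLE} ADD PARTITION ({PARTITION_KEY}='{value}') "
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    # Date range mentioned in the original question.
    values = partition_values(date(2012, 6, 5), date(2014, 6, 5))
    # Print the command and statement for the first day only; a real run would
    # loop over all values and pass each argument list to subprocess.run.
    print(" ".join(sqoop_import_args(values[0])))
    print(add_partition_sql(values[0]))
```

Generating the key-value pairs from the date range means the partitions do not have to be listed by hand, which was the concern raised about approach 3. Whether roughly two years of per-day imports is acceptable in practice would need to be checked against the cluster and the MySQL server.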