Why don't you load all of your data into a temporary (staging) table and
then insert from there into your current table?

Hive will take care of adding the dynamic partitions, which removes that
overhead from you.
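
A minimal sketch of that approach follows. The staging table name
(processedlogs_staging) and its columns are hypothetical, since the real
schema of processedlogs is not shown in this thread. It also assumes that
every row carries its own logdate value and that the staging table's row
format matches the files. Treat it as an illustration, not a drop-in
script.

```sql
-- Hypothetical staging table: the column list is a placeholder and must
-- match the real data files (add ROW FORMAT / STORED AS clauses as needed).
CREATE TABLE processedlogs_staging (
  message STRING,
  logdate STRING   -- assumed to be present in every row of the data
);

-- Move the files into the unpartitioned staging table.
-- Without OVERWRITE, each LOAD adds to what is already there.
LOAD DATA INPATH '/logs/processed/2013-04-01' INTO TABLE processedlogs_staging;
LOAD DATA INPATH '/logs/processed/2013-04-02' INTO TABLE processedlogs_staging;
-- ... repeat for the remaining days ...

-- Allow Hive to create partitions from the data itself.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column (logdate) must come last in the SELECT list.
INSERT OVERWRITE TABLE processedlogs PARTITION (logdate)
SELECT message, logdate
FROM processedlogs_staging;
```

With this layout, one INSERT creates a partition for every distinct
logdate value found in the staging table, so there is no need to name
each partition by hand.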

To answer your question: you can always load data into different partitions
in parallel, as long as you have resources available on the Hive CLI machine.

On May 3, 2013 12:35 PM, "selva" <selvai...@gmail.com> wrote:
>
> Hi All,
>
> I need to load a month's worth of processed data into a Hive table. The
> table has 10 partitions. Each day has many files to load, each file takes
> a constant two seconds, and I have ~3000 files, so it will take days to
> complete for 30 days' worth of data.
>
> I planned to load each day's data in parallel into its respective
> partition so that I can complete it in a short time.
>
> But I need clarification before proceeding.
>
> Question:
>
> 1. Will loading in parallel into different partitions of the same Hive
> table cause data loss or corruption?
>
> For example, assume I am doing the following:
>
> Table : processedlogs
> Partition : logdate
>
> Running the commands below in parallel:
> LOAD DATA INPATH '/logs/processed/2013-04-01' OVERWRITE INTO TABLE processedlogs PARTITION(logdate='2013-04-01');
> LOAD DATA INPATH '/logs/processed/2013-04-02' OVERWRITE INTO TABLE processedlogs PARTITION(logdate='2013-04-02');
> LOAD DATA INPATH '/logs/processed/2013-04-03' OVERWRITE INTO TABLE processedlogs PARTITION(logdate='2013-04-03');
> LOAD DATA INPATH '/logs/processed/2013-04-04' OVERWRITE INTO TABLE processedlogs PARTITION(logdate='2013-04-04');
> .....
> LOAD DATA INPATH '/logs/processed/2013-04-30' OVERWRITE INTO TABLE processedlogs PARTITION(logdate='2013-04-30');
>
> Thanks
> Selva