Hi,

I have a number of Hive jobs that run throughout the day. Each job
outputs data to Amazon S3, using dynamic partitioning.

The problem is that when different jobs write to the same dynamic
partition, each one generates its own file.

What I would like is for subsequent jobs to load the existing data and
merge it with the new data. Can this be achieved somehow? Is there an
option that needs to be enabled? I have already set:

SET hive.merge.mapredfiles = true;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

I should mention that the query that actually writes to S3 is an INSERT
INTO TABLE query. The Hive version is 0.8.1.
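
For reference, a simplified version of what each job runs (the table and
column names here are just illustrative, not my real schema):

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.merge.mapredfiles = true;

-- s3_events is an external table whose LOCATION points at S3,
-- partitioned by dt; the partition value comes from the SELECT list.
INSERT INTO TABLE s3_events PARTITION (dt)
SELECT user_id, event_type, event_time, dt
FROM staging_events;

When two such jobs emit rows for the same dt value, I end up with two
files under that partition's S3 prefix instead of one merged file.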


Thank you,
Cosmin
