Hi, I have a number of Hive jobs that run during the day. Each individual job outputs data to Amazon S3. The Hive jobs use dynamic partitioning.
The problem is that when different jobs write to the same dynamic partition, each job produces its own file there. What I would like instead is for each subsequent job to load the existing data and merge it with the new data. Can this be achieved somehow? Is there an option that needs to be enabled? I have already set:

SET hive.merge.mapredfiles = true;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

I should mention that the query that actually writes to S3 is an INSERT INTO TABLE query. The Hive version is 0.8.1.

Thank you,
Cosmin
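P.S. To make the question concrete, here is roughly the pattern I am after. The table and column names below are made up: 'events' stands for the S3-backed target table and 'events_staging' for the batch a single job produces.

```sql
-- What each job currently runs; every job appends one more file
-- to any partition it touches:
--   INSERT INTO TABLE events PARTITION (dt)
--   SELECT col1, col2, dt FROM events_staging;

-- What I would like to achieve, conceptually: rewrite each touched
-- partition as the old data merged with the new data.
-- (In Hive 0.8.1, UNION ALL has to live inside a subquery.)
INSERT OVERWRITE TABLE events PARTITION (dt)
SELECT col1, col2, dt
FROM (
  SELECT col1, col2, dt FROM events          -- rows already in S3
  UNION ALL
  SELECT col1, col2, dt FROM events_staging  -- rows from the new batch
) merged;
```

As far as I understand, hive.merge.mapredfiles only merges the small files produced within a single job, not files already sitting in the partition, which is why I am wondering whether something like the INSERT OVERWRITE above is the intended approach or whether there is a setting I am missing.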