I might be wrong but I think EMR inserts a reduce job when writing data
into S3. At least in my case, I am able to create a single output file with:
SET mapred.reduce.tasks = 1;
INSERT OVERWRITE TABLE price_history_s3
...
Without using any combined format. The number of mappers _is_ determined
by
Hi All,
I am using Hive 0.7 on Amazon EMR. I need to merge a large number of small
files into a few larger files (basically merging a number of partitions of
a table into one). On running the obvious query (i.e., insert into a new
partition select * from all partitions), a large number of small file