For some reason, I can't reduce the number of mappers with Hive 0.12 on 
Hadoop 2.2. I believe I was able to do that in 0.10.

My table has 170K rows spread across 2000 small (20 KB each) uncompressed 
files (I'll try to make Hive merge these small files in the future).

The relevant Hive settings are below:

hive> SET hive.input.format;
hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
hive> SET mapreduce.input.fileinputformat.split.maxsize;
mapreduce.input.fileinputformat.split.maxsize=1073741824
hive> SET hive.hadoop.supports.splittable.combineinputformat;
hive.hadoop.supports.splittable.combineinputformat=true
hive> SET mapred.max.split.size;
mapred.max.split.size=1073741824

When I run select count(1), I get 658 mappers (one for every 3 files?):
Hadoop job information for Stage-1: number of mappers: 658; number of reducers: 1
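
For reference, here is a quick back-of-the-envelope check of the numbers above, written as a small Python sketch. It assumes every file is exactly 20 KB, which is only approximate, and the variable names are mine. It only shows that the total input is far below the 1 GB maximum split size, so the size limit alone would not seem to require this many splits; it does not explain where the 658 comes from.

```python
# Figures quoted in this post; 20 KB per file is an approximation.
num_files = 2000
file_size_bytes = 20 * 1024
max_split_bytes = 1073741824  # value of mapred.max.split.size shown above
observed_mappers = 658

# Total input size if every file were exactly 20 KB.
total_bytes = num_files * file_size_bytes

# Average number of files handled by each mapper in the observed job.
files_per_mapper = num_files / observed_mappers

# Fewest splits needed if the max split size were the only constraint
# (ceiling division).
min_splits_by_size = -(-total_bytes // max_split_bytes)

print(f"total input: {total_bytes / (1024 * 1024):.1f} MiB")
print(f"files per mapper: {files_per_mapper:.2f}")
print(f"minimum splits by size alone: {min_splits_by_size}")
```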

The table is regular and uncompressed:

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No

What am I missing?

Thanks!
