Hi,

The number of mappers depends on a few factors:
1. The number of mappers is determined by the input data size (by default, one mapper per HDFS block).
2. The number of mappers equals the number of InputSplits returned by the InputFormat#getSplits method. In particular, FileInputFormat splits the input directory along file and block boundaries. Example: two files f1 [block1, block2] and f2 [block3, block4] become 4 mappers: f1(offset of block1), f1(offset of block2), f2(offset of block3), f2(offset of block4). Other InputFormats have their own splitting logic (for example, the HBase input format splits on region boundaries).

One way to reduce the mapper count for the file-backed table is sketched after the quoted message below.

Hope it helps,
Chinna

On Fri, Mar 28, 2014 at 4:47 PM, Amjad ALSHABANI <ashshab...@gmail.com> wrote:

> Hello All,
>
> I have two tables created in Hive: one reads the data directly from
> Cassandra DB, and the other reads from stored files (data already
> exported somehow from the same Cassandra DB).
>
> Both tables are identical (in data),
> but running the same query gives a very different number of
> mappers for each of them (P.S. I'm using the same Hive config for both
> queries).
>
> *hive -e "select count(1) from keyring.cred"*
>
> takes 2338 mappers and 1 reducer,
>
> while:
>
> *hive -e "select count(1) from keyring.cred_seq"*
>
> takes just 151 mappers and 1 reducer.
>
> Any idea how to minimize this number, and where does this explosion in
> mapper count come from?
>
> Any response will be appreciated :)
>
> Cheers!!
>
> Amjad
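
For the HDFS-backed table, the usual knob is the split size: the larger each split, the fewer mappers Hive launches. Below is a minimal, untested sketch of the relevant settings. The property names assume Hadoop 2.x and a reasonably recent Hive (older releases use the mapred.* equivalents such as mapred.max.split.size), and the 256 MB target is only an illustrative value:

    -- Let Hive pack multiple small files/blocks into a single split per mapper.
    SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

    -- Aim for roughly 256 MB per split (values are in bytes; tune to your cluster).
    SET mapreduce.input.fileinputformat.split.maxsize=268435456;
    SET mapreduce.input.fileinputformat.split.minsize=268435456;

    -- Then run the query as before:
    SELECT count(1) FROM keyring.cred_seq;

Note that a table read directly through the Cassandra storage handler gets its splits from the Cassandra input format (typically one per token range or configured row batch), so the settings above only affect the HDFS-backed table; the Cassandra-side split count has to be tuned through that handler's own properties.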