Hi,

The number of mappers depends on a few factors:
1. The number of mappers is determined by the input data size (by default, one mapper per HDFS block).
2. The number of mappers equals the number of InputSplits returned by the InputFormat#getSplits method. In particular, FileInputFormat splits the input directory along file and block boundaries. Example: two files f1 [block1, block2] and f2 [block3, block4] become 4 mappers: f1(offset of block1), f1(offset of block2), f2(offset of block3), f2(offset of block4). Other InputFormats have their own splitting logic (for example, the HBase input format splits on region boundaries).

One way to reduce the mapper count for the file-backed table is sketched after the quoted message below.

Hope it helps,
Chinna

On Fri, Mar 28, 2014 at 4:47 PM, Amjad ALSHABANI <ashshab...@gmail.com> wrote:

> Hello All,
>
> I have two tables created in Hive: one reads the data directly from
> Cassandra DB, and the other reads from stored files (data already
> exported somehow from the same Cassandra DB).
>
> Both tables are identical (in data),
> but running the same query gives a very different number of
> mappers for each of them (P.S. I'm using the same Hive config for both
> queries).
>
> *hive -e "select count(1) from keyring.cred"*
>
> takes 2338 mappers and 1 reducer,
>
> while:
>
> *hive -e "select count(1) from keyring.cred_seq"*
>
> takes just 151 mappers and 1 reducer.
>
> Any idea how to minimize this number, and where does this explosion in
> mapper count come from?
>
> Any response will be appreciated :)
>
> Cheers!!
>
> Amjad
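
For the HDFS-backed table, the usual knob is the split size: the larger each split, the fewer mappers Hive launches. Below is a minimal, untested sketch of the relevant settings. The property names assume Hadoop 2.x and a reasonably recent Hive (older releases use the mapred.* equivalents such as mapred.max.split.size), and the 256 MB target is only an illustrative value:

    -- Let Hive pack multiple small files/blocks into a single split per mapper.
    SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

    -- Aim for roughly 256 MB per split (values are in bytes; tune to your cluster).
    SET mapreduce.input.fileinputformat.split.maxsize=268435456;
    SET mapreduce.input.fileinputformat.split.minsize=268435456;

    -- Then run the query as before:
    SELECT count(1) FROM keyring.cred_seq;

Note that a table read directly through the Cassandra storage handler gets its splits from the Cassandra input format (typically one per token range or configured row batch), so the settings above only affect the HDFS-backed table; the Cassandra-side split count has to be tuned through that handler's own properties.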