I still don't see the hole in the following reasoning:

- Input splits are 64K rows by default.  At this size, map processing time
dominates job creation.
- Therefore, if job creation time dominates, you have a toy data set
(< 64K * 256 vnodes = 16M rows)
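The bound in the list above is just the default split size multiplied by the vnode count. A minimal sketch of that arithmetic follows; the function names are hypothetical, data is assumed to be spread evenly across vnodes, and sizes are treated as row counts, which is an assumption about the unit.

```python
# Values quoted in the message above; the names are illustrative only.
DEFAULT_SPLIT_SIZE = 64 * 1024  # default input split size
VNODES = 256                    # vnode count used in the estimate


def toy_threshold(split_size=DEFAULT_SPLIT_SIZE, vnodes=VNODES):
    """Data set size below which no vnode fills even one split."""
    return split_size * vnodes


def estimated_map_tasks(total, split_size=DEFAULT_SPLIT_SIZE, vnodes=VNODES):
    """Rough task count: at least one per vnode, assuming even distribution."""
    per_vnode = -(-total // vnodes)               # ceiling division
    splits_per_vnode = max(1, -(-per_vnode // split_size))
    return vnodes * splits_per_vnode


if __name__ == "__main__":
    print(toy_threshold())                  # 16777216
    print(estimated_map_tasks(1_000_000))   # still one task per vnode
```

Below the threshold the task count is pinned at the vnode count regardless of how little data there is, which is the case where job creation time can dominate.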

Adding complexity to our inputformat to improve performance for this
niche does not sound like a good idea to me.

On Thu, Mar 28, 2013 at 8:40 AM, cem <cayiro...@gmail.com> wrote:
> Hi Alicia,
>
> The Cassandra input format creates as many mappers as there are vnodes.
> It is a known issue. You need to lower the number of vnodes :(
>
> I have a simple solution for that and am ready to write a patch. Should I
> create a ticket for it? I don't know the procedure.
>
>  Regards,
> Cem
>
>
> On Thu, Mar 28, 2013 at 2:30 PM, Alicia Leong <lccali...@gmail.com> wrote:
>>
>> Hi All,
>>
>> I have 3 nodes running Cassandra 1.2.3 and edited cassandra.yaml to
>> enable vnodes.
>>
>> When I execute an M/R job, the console shows hundreds of map tasks.
>>
>> Is this normal with vnodes? If so, it has slowed the M/R job's
>> completion.
>>
>>
>> Thanks
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
