I still don't see the hole in the following reasoning:

- Input splits are 64K by default. At this size, map processing time dominates job creation time.
- Therefore, if job creation time dominates, you have a toy data set (< 64K * 256 vnodes = 16 MB).
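A back-of-the-envelope sketch of the arithmetic above, under stated assumptions: each vnode token range yields at least one input split (hence at least one map task), and a range holding more than the split size is cut into ceil(size / split_size) splits. The 64K split size and 256 vnodes come from the thread; the function names are hypothetical and this is not Cassandra's actual split-generation code.

```python
# Hypothetical helpers illustrating the thread's arithmetic; not Cassandra's API.
SPLIT_SIZE = 64 * 1024   # default split size quoted in the thread (64K)
VNODES_PER_NODE = 256    # vnode count used in the thread's estimate


def toy_threshold(split_size: int = SPLIT_SIZE, vnodes: int = VNODES_PER_NODE) -> int:
    """Data size below which every vnode range fits inside a single split."""
    return split_size * vnodes


def min_map_tasks(nodes: int, vnodes_per_node: int = VNODES_PER_NODE) -> int:
    """Lower bound on map tasks, assuming one split per vnode token range."""
    return nodes * vnodes_per_node


def estimated_map_tasks(sizes_per_range: list[int], split_size: int = SPLIT_SIZE) -> int:
    """Assumes each range yields ceil(size / split_size) splits, and at least one."""
    return sum(max(1, -(-size // split_size)) for size in sizes_per_range)


if __name__ == "__main__":
    print(toy_threshold())                         # 16777216, i.e. 16M
    print(min_map_tasks(3))                        # 768 map tasks for a 3-node cluster
    print(estimated_map_tasks([10, 200_000]))      # 1 + 4 = 5
```

With 3 nodes the lower bound is already 768 map tasks, which matches the "hundreds of map tasks" report below; the per-task scheduling overhead only dominates when most of those ranges hold far less than one split's worth of data.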
Adding complexity to our InputFormat to improve performance for this niche does not sound like a good idea to me.

On Thu, Mar 28, 2013 at 8:40 AM, cem <cayiro...@gmail.com> wrote:
> Hi Alicia,
>
> The Cassandra input format creates as many mappers as there are vnodes. It is a known
> issue. You need to lower the number of vnodes :(
>
> I have a simple solution for that and am ready to write a patch. Should I
> create a ticket for it? I don't know the procedure.
>
> Regards,
> Cem
>
>
> On Thu, Mar 28, 2013 at 2:30 PM, Alicia Leong <lccali...@gmail.com> wrote:
>>
>> Hi All,
>>
>> I have 3 nodes of Cassandra 1.2.3 and edited cassandra.yaml to enable vnodes.
>>
>> When I execute an M/R job, the console shows HUNDREDS of map tasks.
>>
>> Is this normal with vnodes? If yes, it slows down the M/R job
>> considerably.
>>
>>
>> Thanks

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced