Yes, but my point is that with 50 map slots you can only process 50 map tasks at once. So a job with 1000 tasks will take 1000/50 = 20 "waves" of mappers to complete.
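To make the arithmetic concrete, here is a rough sketch. The function name map_waves is mine, and it assumes every task takes about the same time and every slot is free for this job, which will not hold exactly on a real cluster. The node, slot, and vnode counts are the ones mentioned in this thread.

```python
import math


def map_waves(num_tasks: int, num_slots: int) -> int:
    """Number of sequential waves needed to run num_tasks on num_slots.

    Assumes uniform task duration and that all slots belong to this job.
    """
    if num_slots <= 0:
        raise ValueError("num_slots must be positive")
    return math.ceil(num_tasks / num_slots)


# Figures from this thread: a 10 node Hadoop cluster with 5 map slots per node.
slots = 10 * 5

print(map_waves(1000, slots))      # the round number used above
print(map_waves(10 * 256, slots))  # one map task per vnode, 256 vnodes per node
print(map_waves(10, slots))        # one map task per node
```

Under those assumptions the three cases come out to 20, 52, and 1 waves respectively. Per-task startup cost is paid in every wave, which is why the number of waves matters and not only the total amount of data.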
On Fri, Mar 29, 2013 at 11:46 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> My point is that if you have over 16MB of data per node, you're going
> to get thousands of map tasks (that is: hundreds per node) with or
> without vnodes.
>
> On Fri, Mar 29, 2013 at 9:42 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> > Every map reduce task typically has a minimum Xmx of 256MB memory. See
> > mapred.child.java.opts...
> > So if you have a 10 node cluster with 256 vnodes... You will need to spawn
> > 2,560 map tasks to complete a job.
> > And a 10 node hadoop cluster with 5 map slotes a node... You have 50 map
> > slots.
> >
> > Wouldnt it be better if the input format spawned 10 map tasks instead of
> > 2,560?
> >
> > On Fri, Mar 29, 2013 at 10:28 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> >>
> >> I still don't see the hole in the following reasoning:
> >>
> >> - Input splits are 64k by default. At this size, map processing time
> >>   dominates job creation.
> >> - Therefore, if job creation time dominates, you have a toy data set
> >>   (< 64K * 256 vnodes = 16 MB)
> >>
> >> Adding complexity to our inputformat to improve performance for this
> >> niche does not sound like a good idea to me.
> >>
> >> On Thu, Mar 28, 2013 at 8:40 AM, cem <cayiro...@gmail.com> wrote:
> >> > Hi Alicia ,
> >> >
> >> > Cassandra input format creates mappers as many as vnodes. It is a known
> >> > issue. You need to lower the number of vnodes :(
> >> >
> >> > I have a simple solution for that and ready to write a patch. Should I
> >> > create a ticket about that? I don't know the procedure about that.
> >> >
> >> > Regards,
> >> > Cem
> >> >
> >> > On Thu, Mar 28, 2013 at 2:30 PM, Alicia Leong <lccali...@gmail.com> wrote:
> >> >>
> >> >> Hi All,
> >> >>
> >> >> I have 3 nodes of Cassandra 1.2.3 & edited the cassandra.yaml for
> >> >> vnodes.
> >> >>
> >> >> When I execute a M/R job .. the console showed HUNDRED of Map tasks.
> >> >>
> >> >> May I know, is the normal since is vnodes? If yes, this have slow the
> >> >> M/R job to finish/complete.
> >> >>
> >> >> Thanks
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder, http://www.datastax.com
> >> @spyced
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced