Yes, but my point is that with 50 map slots you can only be processing 50
tasks at once. So it will take 1000/50 = 20 "waves" of mappers to complete
the job.
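
To make the arithmetic concrete, here is a rough sketch in Python. The helper name map_waves is made up for illustration and is not part of Hadoop or Cassandra. It assumes every map task takes about the same time and ignores per-task JVM startup cost, so treat the numbers as a lower bound rather than a prediction.

```python
import math


def map_waves(num_map_tasks: int, map_slots: int) -> int:
    """Sequential waves needed when at most map_slots tasks run at once."""
    if map_slots <= 0:
        raise ValueError("map_slots must be positive")
    return math.ceil(num_map_tasks / map_slots)


# Figures taken from this thread: 10 nodes, 5 map slots each, 256 vnodes each.
nodes = 10
slots_per_node = 5
vnodes_per_node = 256
total_slots = nodes * slots_per_node  # 50 map slots in the cluster

print(map_waves(1000, total_slots))                     # the round figure above
print(map_waves(nodes * vnodes_per_node, total_slots))  # one task per vnode
print(map_waves(nodes, total_slots))                    # one task per node
```

With one task per vnode (2,560 tasks) that comes to 52 waves; with one task per node it is a single wave.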


On Fri, Mar 29, 2013 at 11:46 AM, Jonathan Ellis <jbel...@gmail.com> wrote:

> My point is that if you have over 16MB of data per node, you're going
> to get thousands of map tasks (that is: hundreds per node) with or
> without vnodes.
>
> On Fri, Mar 29, 2013 at 9:42 AM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
> > Every map reduce task typically has a minimum Xmx of 256MB memory. See
> > mapred.child.java.opts...
> > So if you have a 10 node cluster with 256 vnodes... You will need to
> > spawn 2,560 map tasks to complete a job.
> > And a 10 node hadoop cluster with 5 map slots a node... You have 50 map
> > slots.
> >
> > Wouldn't it be better if the input format spawned 10 map tasks instead of
> > 2,560?
> >
> >
> > On Fri, Mar 29, 2013 at 10:28 AM, Jonathan Ellis <jbel...@gmail.com>
> > wrote:
> >>
> >> I still don't see the hole in the following reasoning:
> >>
> >> - Input splits are 64k by default.  At this size, map processing time
> >> dominates job creation.
> >> - Therefore, if job creation time dominates, you have a toy data set
> >> (< 64K * 256 vnodes = 16 MB)
> >>
> >> Adding complexity to our inputformat to improve performance for this
> >> niche does not sound like a good idea to me.
> >>
> >> On Thu, Mar 28, 2013 at 8:40 AM, cem <cayiro...@gmail.com> wrote:
> >> > Hi Alicia ,
> >> >
> >> > The Cassandra input format creates as many mappers as there are
> >> > vnodes. It is a known issue. You need to lower the number of vnodes :(
> >> >
> >> > I have a simple solution for that and am ready to write a patch.
> >> > Should I create a ticket for it? I don't know the procedure.
> >> >
> >> > Regards,
> >> > Cem
> >> >
> >> >
> >> > On Thu, Mar 28, 2013 at 2:30 PM, Alicia Leong <lccali...@gmail.com>
> >> > wrote:
> >> >>
> >> >> Hi All,
> >> >>
> >> >> I have 3 nodes of Cassandra 1.2.3 & edited the cassandra.yaml for
> >> >> vnodes.
> >> >>
> >> >> When I execute an M/R job, the console shows HUNDREDS of map tasks.
> >> >>
> >> >> May I know, is this normal because of vnodes? If yes, this has made
> >> >> the M/R job slow to complete.
> >> >>
> >> >>
> >> >> Thanks
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder, http://www.datastax.com
> >> @spyced
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>
