The number of map tasks for a job is a function of the InputFormat, which in the case of ColumnFamilyInputFormat is a function of the total number of keys in Cassandra. The number of maps executing concurrently on each TaskTracker (that is, per node) is set by mapred.tasktracker.map.tasks.maximum.

j
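To make the distinction concrete, here is a rough back-of-the-envelope sketch. It is only an illustration: the function names and the numbers are hypothetical, and it assumes every map task takes about the same time and that all map slots are free. The total number of map tasks comes from the InputFormat's splits, while the per-TaskTracker maximum only caps how many of them run at once.

```python
import math


def concurrent_map_slots(num_tasktrackers: int, map_tasks_maximum: int) -> int:
    """Cluster-wide number of map tasks that can run at the same time.

    map_tasks_maximum stands in for mapred.tasktracker.map.tasks.maximum,
    which is a per-TaskTracker (per-node) limit.
    """
    return num_tasktrackers * map_tasks_maximum


def map_waves(num_splits: int, num_tasktrackers: int, map_tasks_maximum: int) -> int:
    """Number of sequential 'waves' needed to run every map task.

    num_splits is the number of input splits produced by the InputFormat;
    it is NOT changed by the per-node concurrency limit.
    """
    slots = concurrent_map_slots(num_tasktrackers, map_tasks_maximum)
    return math.ceil(num_splits / slots)


if __name__ == "__main__":
    # Hypothetical single-box setup: 12 splits, 1 TaskTracker.
    for maximum in (1, 2, 4):
        print(maximum, concurrent_map_slots(1, maximum), map_waves(12, 1, maximum))
```

On a single box, lowering the maximum to 1 does not reduce the number of map tasks in this model. It only makes them run one after another.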
On Fri, May 7, 2010 at 9:57 AM, Joseph Stein <crypt...@gmail.com> wrote:
> you can manage the number of map tasks by node
>
> mapred.tasktracker.map.tasks.maximum=1
>
>
> On Fri, May 7, 2010 at 9:53 AM, gabriele renzi <rff....@gmail.com> wrote:
>> On Fri, May 7, 2010 at 2:44 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>> Sounds like you need to configure Hadoop to not create a whole bunch
>>> of Map tasks at once
>>
>> interesting, from a quick check it seems there are a dozen threads running.
>> Yet, setNumMapTasks seems to be deprecated (together with JobConf)
>> and while I guess
>> -Dmapred.map.tasks=N
>> may still work, it seems the only way to manage the
>> number of map tasks is via a custom subclass of
>> ColumnFamilyInputFormat.
>>
>> But of course you have a point that in a single box this does not add
>> anything.
>>
>
> --
> /*
> Joe Stein
> http://www.linkedin.com/in/charmalloc
> */