Seems like the Hadoop input format should combine splits that are on the
same node into a single map task, the way Hadoop's CombineFileInputFormat
does. I am not sure who recommends vnodes as the default, because this is
now the second problem (that I know of) with this class where vnodes add
extra overhead:
https://issues.apache.org/jira/browse/CASSANDRA-5161
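To make the idea concrete, here is a minimal sketch of the grouping step such an input format would do. It does not use Hadoop's actual InputSplit API; the `Split` record and `combineByHost` method are simplified stand-ins, assuming each split exposes a single preferred host:

```java
import java.util.*;

// Sketch of combining input splits by host, analogous to what Hadoop's
// CombineFileInputFormat does: splits whose data lives on the same node
// are grouped into one map task instead of one task per split.
public class SplitCombiner {

    // Simplified stand-in for an InputSplit: a token range plus the
    // host that holds its data (assumption: one preferred host per split).
    record Split(String range, String host) {}

    // Group splits by host so each node gets a single combined task
    // covering all of its local ranges.
    static Map<String, List<Split>> combineByHost(List<Split> splits) {
        Map<String, List<Split>> byHost = new LinkedHashMap<>();
        for (Split s : splits) {
            byHost.computeIfAbsent(s.host(), h -> new ArrayList<>()).add(s);
        }
        return byHost;
    }

    public static void main(String[] args) {
        // With 256 vnodes, each node owns many small ranges; combining
        // them yields one task per node instead of one per vnode range.
        List<Split> splits = List.of(
            new Split("r1", "node1"), new Split("r2", "node2"),
            new Split("r3", "node1"), new Split("r4", "node2"),
            new Split("r5", "node1"));
        Map<String, List<Split>> tasks = combineByHost(splits);
        System.out.println(tasks.size() + " tasks for " + splits.size() + " splits");
    }
}
```

With vnodes, this would shrink the task count from nodes * 256 down to roughly one per node, while each combined task still reads only node-local ranges.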

This seems to be standard operating practice in C* now: enable things in
the default configuration, like new partitioners and newer features such
as vnodes, even though they are not heavily tested in the wild or well
understood, and then deal with the fallout.


On Fri, Feb 15, 2013 at 11:52 AM, cem <cayiro...@gmail.com> wrote:
> Hi All,
>
> I have just started to use virtual nodes. I set the number of nodes to 256
> as recommended.
>
> The problem I have is that when I run a MapReduce job, it creates nodes * 256
> splits and therefore nodes * 256 mappers. This affects performance, since
> the range queries have a lot of overhead.
>
> Any suggestions to improve the performance? It seems like I need to lower
> the number of virtual nodes.
>
> Best Regards,
> Cem
>
>
