On Fri, Feb 15, 2013 at 7:01 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> Seems like the Hadoop input format should combine the splits that are
> on the same node into the same map task, the way Hadoop's
> CombineFileInputFormat can. I am not sure who recommends vnodes as the
> default, because this is now the second problem of this kind (that I
> know of) where vnodes add extra overhead:
> https://issues.apache.org/jira/browse/CASSANDRA-5161
>
> This seems to be standard operating practice in C* now: enable things
> in the default configuration, like new partitioners and newer features
> such as vnodes, even though they are not heavily tested in the wild or
> well understood, then deal with the fallout.
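To make the combining idea concrete, here is a minimal sketch (not
Cassandra's actual code; the class and helper names are made up for
illustration) of grouping the per-vnode splits by their first reported
replica, in the spirit of Hadoop's CombineFileInputFormat. Each bucket
could then be wrapped in a single composite split, so the mapper count
scales with the number of nodes rather than nodes * num_tokens:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.mapreduce.InputSplit;

    public class SplitsByNode
    {
        // Bucket splits by their first reported location. Each bucket is a
        // candidate for one combined map task covering many vnode ranges.
        public static Map<String, List<InputSplit>> groupByNode(List<InputSplit> splits)
            throws IOException, InterruptedException
        {
            Map<String, List<InputSplit>> byNode = new HashMap<String, List<InputSplit>>();
            for (InputSplit split : splits)
            {
                String[] locations = split.getLocations();
                String node = (locations != null && locations.length > 0) ? locations[0] : "unknown";
                List<InputSplit> bucket = byNode.get(node);
                if (bucket == null)
                {
                    bucket = new ArrayList<InputSplit>();
                    byNode.put(node, bucket);
                }
                bucket.add(split);
            }
            return byNode;
        }
    }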
Except that it is not, in fact, enabled by default; the default remains
one token per node.

That said, the only way a feature like this will ever be heavily tested
in the wild, and well understood, is if it is actually put to use.
Speaking only for myself, I am grateful to users like Cem who test new
features and report the issues they find.

> On Fri, Feb 15, 2013 at 11:52 AM, cem <cayiro...@gmail.com> wrote:
>> Hi All,
>>
>> I have just started to use virtual nodes. I set the number of tokens
>> to 256, as recommended.
>>
>> The problem I have is that when I run a MapReduce job it creates
>> nodes * 256 mappers, because it creates nodes * 256 splits. This hurts
>> performance, since the range queries have a lot of overhead.
>>
>> Any suggestion to improve the performance? It seems like I need to
>> lower the number of virtual nodes.
>>
>> Best Regards,
>> Cem

--
Eric Evans
Acunu | http://www.acunu.com | @acunu
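For reference, a rough sketch of the driver-side wiring being discussed,
using the 1.2-era org.apache.cassandra.hadoop classes as I remember them
(the address, keyspace, and column family below are placeholders). Note
that setInputSplitSize only controls how many rows go into each
sub-split; if I understand the current behaviour correctly, the input
format still emits at least one split per token range, which is why 256
tokens per node multiplies the mapper count:

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ExampleDriver
    {
        public static void main(String[] args) throws Exception
        {
            Job job = new Job(new Configuration(), "vnode-split-demo");
            job.setInputFormatClass(ColumnFamilyInputFormat.class);

            Configuration conf = job.getConfiguration();
            ConfigHelper.setInputInitialAddress(conf, "127.0.0.1");        // placeholder contact point
            ConfigHelper.setInputRpcPort(conf, "9160");                    // default Thrift port
            ConfigHelper.setInputPartitioner(conf, "org.apache.cassandra.dht.Murmur3Partitioner");
            ConfigHelper.setInputColumnFamily(conf, "demo_ks", "demo_cf"); // placeholder keyspace/CF
            // Rows per sub-split; does not merge distinct token ranges.
            ConfigHelper.setInputSplitSize(conf, 64 * 1024);

            // ... set mapper/reducer/output classes as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }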