On Fri, Nov 2, 2012 at 12:38 AM, Manu Zhang <owenzhang1...@gmail.com> wrote:
>> It splits into a contiguous range, because truly upgrading to vnode
>> functionality is another step.
>
> That confuses me. As I understand it, there is no point in having 256
> tokens on the same node if I don't commit the shuffle.
This isn't exactly true. By-partition operations (think repair, streaming,
etc.) become more reliable in the sense that if they fail and need to be
restarted, less is lost and needs redoing. Also, if all you did was migrate
from 1 token per node to 256 contiguous tokens per node, normal topology
changes (bootstrapping new nodes, decommissioning old ones) would gradually
work to redistribute the partitions.

And, from a topology perspective, splitting the one partition into many
contiguous partitions is a no-op; it's safe to do, and there is no cost to
speak of from a computational or IO perspective.

On the other hand, shuffling requires moving tokens around the cluster. If
you completely randomize placement, it follows that you will need to
relocate all of the cluster's data, so it's quite costly. It's also
precedent-setting, and not thoroughly tested yet.

--
Eric Evans
Acunu | http://www.acunu.com | @acunu
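To make the "no-op" point concrete, here is a minimal sketch (hypothetical
illustration only, not Cassandra's actual code) of how a node's single
contiguous token range could be evenly split into 256 contiguous sub-ranges.
Because every sub-range still maps to the same node, no data has to move:

```python
# Hypothetical sketch: split one node's contiguous token range
# (prev_token, my_token] into num_tokens contiguous sub-ranges.
# Token values here are illustrative, not a real partitioner's range.

def split_range(prev_token: int, my_token: int, num_tokens: int = 256) -> list[int]:
    """Return num_tokens token boundaries that evenly partition
    the contiguous range (prev_token, my_token]."""
    width = my_token - prev_token
    return [prev_token + (width * (i + 1)) // num_tokens
            for i in range(num_tokens)]

# A node that owned (0, 25600] ends up with 256 contiguous sub-ranges,
# each 100 tokens wide; placement is unchanged, so no IO is needed.
tokens = split_range(0, 25600)
print(tokens[:3], tokens[-1])  # [100, 200, 300] 25600
```

A shuffle, by contrast, would scatter those 256 tokens across different
nodes, which is exactly what forces the bulk data movement described above.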