> > Does the new scheme still require the node to re-iterate all sstables to
> > build the merkle tree or streaming data for partition-level
> > repair and move?
You would have to iterate through all sstables on the system to repair one
vnode, yes: but building the tree for just one range of the data means that
huge portions of the sstable files can be skipped. It should scale down
linearly as the number of vnodes increases (i.e., with 100 vnodes, it will
take 1/100th the time to repair one vnode). A rough sketch of the
range-skipping idea follows after the quoted thread below.

On Thu, Mar 22, 2012 at 5:46 AM, Zhu Han <schumi....@gmail.com> wrote:
> On Thu, Mar 22, 2012 at 6:20 PM, Richard Low <r...@acunu.com> wrote:
>
> > On 22 March 2012 05:48, Zhu Han <schumi....@gmail.com> wrote:
> >
> > > I second it.
> > >
> > > Are there any goals we missed which cannot be achieved by assigning
> > > multiple tokens to a single node?
> >
> > This is exactly the proposed solution. The discussion is about how to
> > implement this, and the methods of choosing tokens and replication
> > strategy.
>
> Does the new scheme still require the node to re-iterate all sstables to
> build the merkle tree or streaming data for partition-level
> repair and move?
>
> The disk IO triggered by the above steps could be very time-consuming if
> the dataset on a single node is very large. It could be much more costly
> than the network IO, especially when concurrent repair tasks hit the same
> node.
>
> Are there any good ideas on this?
>
> > Richard.
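Here is a minimal sketch, in Java, of the range-skipping point above. This is
not Cassandra code: each sstable is modeled as a sorted long[] of tokens
(real sstables are ordered by partitioner token, which is what makes the seek
possible), and the class and method names are invented for the illustration.

import java.util.Arrays;
import java.util.Random;

// Toy model, not Cassandra code: why validating one vnode's range touches
// only a fraction of each sstable, even though every sstable is visited.
public class RangeValidationSketch {

    // Count rows of one sorted sstable that fall in [start, end). In a real
    // validation compaction these rows would be hashed into the merkle tree;
    // everything before `start` is skipped by the seek, and everything after
    // `end` is never read.
    static long rowsScannedForRange(long[] tokens, long start, long end) {
        int from = Arrays.binarySearch(tokens, start);
        if (from < 0) from = -from - 1;  // first index with token >= start
        long scanned = 0;
        for (int i = from; i < tokens.length && tokens[i] < end; i++) {
            scanned++;
        }
        return scanned;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int numSstables = 10, rowsPerSstable = 100_000, numVnodes = 100;
        int ringSize = 1_000_000;

        // Build sorted "sstables" with tokens uniformly spread over the ring.
        long[][] sstables = new long[numSstables][rowsPerSstable];
        for (long[] sstable : sstables) {
            for (int i = 0; i < rowsPerSstable; i++) {
                sstable[i] = rng.nextInt(ringSize);
            }
            Arrays.sort(sstable);
        }

        // Validate one of numVnodes equal slices of the ring. Every sstable
        // still has to be consulted (any of them may hold rows in the range),
        // but only ~1/numVnodes of each is actually read.
        long start = 0, end = ringSize / numVnodes;
        long scanned = 0, total = 0;
        for (long[] sstable : sstables) {
            scanned += rowsScannedForRange(sstable, start, end);
            total += sstable.length;
        }
        System.out.printf("touched %d of %d rows (%.2f%%) for 1 of %d vnode ranges%n",
                scanned, total, 100.0 * scanned / total, numVnodes);
    }
}

Running it prints something like "touched 10000 of 1000000 rows (1.00%) for 1
of 100 vnode ranges": every sstable is still consulted, which is the "iterate
through all sstables" cost in the answer above, but only about 1/100th of the
rows are actually read and hashed, which is where the linear scaling with
vnode count comes from.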