On Fri, Mar 23, 2012 at 6:54 AM, Peter Schuller <peter.schul...@infidyne.com> wrote:
> > You would have to iterate through all sstables on the system to repair
> > one vnode, yes: but building the tree for just one range of the data
> > means that huge portions of the sstables files can be skipped. It should
> > scale down linearly as the number of vnodes increases (ie, with 100
> > vnodes, it will take 1/100th the time to repair one vnode).

With size-tiered compaction, the indices of all SSTables would still have to
be scanned, since each SSTable can span the full token range. Am I missing
anything here? (A rough back-of-envelope sketch of this is at the end of
this mail.)

> The story is less good for "nodetool cleanup" however, which still has
> to truck over the entire dataset.
>
> (The partitions/buckets in my crush-inspired scheme addresses this by
> allowing that each ring segment, in vnode terminology, be stored
> separately in the file system.)

But the number of files could become a big problem if there are hundreds of
vnodes and millions of SSTables on the same physical node. We would need a
way to pin the SSTable inodes in memory; otherwise the average number of
disk IOs needed to access a row in an SSTable could be five or more.

> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
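
Here is the back-of-envelope sketch mentioned above (Python; the sstable
count and the full_tree_build_s / index_scan_s figures are made-up
assumptions, not measurements). It only illustrates why a fixed per-sstable
index cost can dominate once the Merkle-tree build shrinks with the vnode
count:

# Illustrative cost model: repairing one vnode's range =
#   Merkle-tree build over 1/V of the data (shrinks as the vnode count V grows)
# + a per-sstable cost to consult every sstable's index for that range
#   (does not shrink, since size-tiered sstables can span the whole range).
def repair_one_vnode_seconds(vnodes, sstables,
                             full_tree_build_s=3600.0,  # assumed: tree over all data
                             index_scan_s=0.05):        # assumed: per-sstable index check
    tree_build = full_tree_build_s / vnodes
    index_overhead = sstables * index_scan_s
    return tree_build + index_overhead

for v in (1, 10, 100, 1000):
    # with 10000 sstables, the ~500s of index checks soon dominates
    print(v, round(repair_one_vnode_seconds(vnodes=v, sstables=10000)))

With those assumed numbers, going from 100 to 1000 vnodes barely changes the
per-vnode repair time, which is the non-linearity I am worried about.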