On Fri, Mar 23, 2012 at 6:54 AM, Peter Schuller <peter.schul...@infidyne.com> wrote:
> > You would have to iterate through all sstables on the system to repair
> > one vnode, yes: but building the tree for just one range of the data
> > means that huge portions of the sstables files can be skipped. It should
> > scale down linearly as the number of vnodes increases (ie, with 100
> > vnodes, it will take 1/100th the time to repair one vnode).

With size-tiered compaction, the indices of all SSTables would still have to
be scanned, since each SSTable can span the full token range. Am I missing
anything here? (A rough back-of-envelope sketch of this is at the end of
this mail.)

> The story is less good for "nodetool cleanup" however, which still has
> to truck over the entire dataset.
>
> (The partitions/buckets in my crush-inspired scheme addresses this by
> allowing that each ring segment, in vnode terminology, be stored
> separately in the file system.)

But the number of files could become a big problem if there are hundreds of
vnodes and millions of SSTables on the same physical node. We would need a
way to pin the SSTable inodes in memory; otherwise the average number of
disk IOs needed to access a row in an SSTable could be five or more.

> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
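
Here is the back-of-envelope sketch mentioned above (Python; the sstable
count and the full_tree_build_s / index_scan_s figures are made-up
assumptions, not measurements). It only illustrates why a fixed per-sstable
index cost can dominate once the Merkle-tree build shrinks with the vnode
count:

# Illustrative cost model: repairing one vnode's range =
#   Merkle-tree build over 1/V of the data (shrinks as the vnode count V grows)
# + a per-sstable cost to consult every sstable's index for that range
#   (does not shrink, since size-tiered sstables can span the whole range).
def repair_one_vnode_seconds(vnodes, sstables,
                             full_tree_build_s=3600.0,  # assumed: tree over all data
                             index_scan_s=0.05):        # assumed: per-sstable index check
    tree_build = full_tree_build_s / vnodes
    index_overhead = sstables * index_scan_s
    return tree_build + index_overhead

for v in (1, 10, 100, 1000):
    # with 10000 sstables, the ~500s of index checks soon dominates
    print(v, round(repair_one_vnode_seconds(vnodes=v, sstables=10000)))

With those assumed numbers, going from 100 to 1000 vnodes barely changes the
per-vnode repair time, which is the non-linearity I am worried about.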