I’ll take a slightly different position. People who never expect to change the
cluster shouldn’t care which they’re using. People who want to grow often, by
10-20% at a time, should probably use vnodes. Everyone else can probably figure
out how to get by with a single token, with the caveat that they’ll probably
just double their cluster when they want to grow (until they’re super advanced,
in which case they’ll automate token moves to rebalance).
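To make the doubling approach concrete: with one token per node, a cluster
usually starts with tokens spaced evenly across the Murmur3Partitioner range
(-2^63 to 2^63 - 1). Doubling then gives each new node the midpoint of an
existing range, so every existing range splits exactly in half. Below is a
minimal Python sketch of that arithmetic, assuming Murmur3Partitioner;
`even_tokens` and `doubled_tokens` are hypothetical helper names. The values
it produces are what you would put in each node's `initial_token`.

```python
MIN_TOKEN = -(2**63)   # Murmur3Partitioner minimum token
MAX_TOKEN = 2**63 - 1  # Murmur3Partitioner maximum token
RING_SIZE = 2**64


def even_tokens(node_count: int) -> list[int]:
    """Hypothetical helper: one evenly spaced token per node, starting at the minimum token."""
    return [MIN_TOKEN + (i * RING_SIZE) // node_count for i in range(node_count)]


def doubled_tokens(tokens: list[int]) -> list[int]:
    """Hypothetical helper: tokens for the new nodes when doubling (midpoint of every range)."""
    ordered = sorted(tokens)
    new_tokens = []
    for i, start in enumerate(ordered):
        # The last range wraps around the ring back to the first token.
        end = ordered[i + 1] if i + 1 < len(ordered) else ordered[0] + RING_SIZE
        mid = start + (end - start) // 2
        if mid > MAX_TOKEN:  # wrapped past the top of the ring
            mid -= RING_SIZE
        new_tokens.append(mid)
    return new_tokens


old = even_tokens(4)
new = doubled_tokens(old)
print("existing:", old)
print("new:     ", new)
```

The midpoints of the old ring are exactly the tokens an evenly spaced ring of
twice the size would use (up to integer rounding), so the doubled cluster stays
balanced. Adding a single node has no equally clean answer, which is where
token moves or vnodes come in.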

Single token is easier to reason about in almost every situation, and full
sstable streaming is now fast enough that you probably won’t miss vnodes. Vnodes
were hugely valuable when streaming was slow, because they increased
parallelism and so made bootstrap and decommission faster.

The right solution here is to get rid of the hash table eventually, but until
we do, the main benefits of vnodes are slightly better bandwidth for bootstrap
and slightly better balance when you do incremental expansions. Slightly.
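To illustrate the balance point, here is a rough Python sketch that computes
each node's share of the ring as primary range only (replication ignored), on a
ring normalized to [0, 1). It assumes purely random token selection for the
vnode case, which is a simplification: newer Cassandra versions can use a token
allocation algorithm instead. `ownership` and `spread` are hypothetical helpers.

```python
import random
from fractions import Fraction


def ownership(tokens_by_node):
    """Hypothetical helper: share of the ring each node owns as primary range.

    Tokens are positions on a ring normalized to [0, 1); each token owns the
    range from the preceding token (exclusive) up to itself (inclusive).
    """
    ring = sorted((t, node) for node, toks in tokens_by_node.items() for t in toks)
    owned = {node: 0 for node in tokens_by_node}
    if len(ring) == 1:
        owned[ring[0][1]] = 1
        return owned
    for i, (token, node) in enumerate(ring):
        prev = ring[i - 1][0]  # for i == 0 this wraps to the last token
        owned[node] += (token - prev) % 1
    return owned


def spread(owned):
    """Ratio of the largest to the smallest share; 1 means perfectly balanced."""
    return max(owned.values()) / min(owned.values())


# Single token: 4 evenly spaced nodes, then a 5th node bisects one range.
single = {f"n{i}": [Fraction(i, 4)] for i in range(4)}
single["n4"] = [Fraction(1, 8)]
single_owned = ownership(single)
print("single token:", {n: str(s) for n, s in single_owned.items()},
      "spread:", spread(single_owned))

# Vnodes: 5 nodes with k random tokens each (n4 stands in for the new node).
rng = random.Random(42)
k = 16
vnodes = {f"n{i}": [rng.random() for _ in range(k)] for i in range(5)}
vnode_owned = ownership(vnodes)
print("vnodes:", k, "tokens per node, spread:", round(spread(vnode_owned), 2))
```

With a single token, the fifth node takes half of one neighbor’s range, leaving
two nodes at 1/8 of the ring and three at 1/4 (a 2x spread) until tokens are
moved. With vnodes the printed spread depends on the seed and on `k`.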

> On Oct 8, 2024, at 5:20 PM, Jordan West <jw...@apache.org> wrote:
> 
> Hi Long,
> 
> This is the best resource on understanding tokens per node and their impact 
> on operations / availability: 
> https://jolynch.github.io/pdf/cassandra-availability-virtual.pdf
> 
> I am one of those users that used a single token. It does make certain 
> operations simpler but it comes with a cost: changing cluster topology 
> outside of doubling takes significant expertise. 
> 
> It’s also important to factor in the inertia of decisions. Those companies
> opted for single token when vnodes were nascent and buggy, or didn’t exist yet.
> 
> My recommendation these days is to use vnodes with a small number of tokens 
> per node. I prefer 4 but would say going as high as 16 is reasonable. The 
> paper does a better job of describing why. I wouldn’t go higher because many 
> operations are on the order of the number of tokens in the cluster and that 
> overhead can be problematic. 
> 
> Jordan 
> 
> On Mon, Oct 7, 2024 at 17:37 Long Pan <panlong...@gmail.com> wrote:
>> Hi Cassandra Community,
>> 
>> I’m currently exploring the use of a single vnode (single token) per node in
>> large-scale Cassandra deployments. I've come across discussions suggesting 
>> that some heavy users like Apple and Netflix have opted for this 
>> configuration to simplify operations and achieve more predictable 
>> performance.
>> 
>> I’d like to ask if anyone could point me to resources (blog posts, 
>> conference talks, case studies or even personal experiences) that dive 
>> deeper into:
>> 
>> - The rationale behind using a single vnode instead of multiple vnodes.
>> - The operational benefits and any potential trade-offs encountered.
>> 
>> Thank you in advance for your insights and any pointers you can provide!
>> 
>> Best regards,
>> Long
>> 
