Re: Resources on Using Single Vnode in Cassandra

2024-10-08 Thread Jeff Jirsa
I’ll take a slightly different position - people who never expect to change the 
cluster shouldn’t care which they’re using, people who want to grow by 10-20% 
often should probably use vnodes, everyone else can probably figure out how to 
get by with single token, with the caveat that they’ll probably just double 
their cluster when they want to grow (until they’re super advanced, in which 
case they’ll automate token moves to rebalance). 
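A minimal sketch of why single-token clusters tend to grow by doubling. It models the token ring as the unit interval [0, 1), assumes each node owns the range (previous token, its token], and ignores replication. Inserting one node into an evenly spaced ring halves exactly one range, while doubling splits every range. The helper names are illustrative, not Cassandra APIs.

```python
from fractions import Fraction
from typing import Dict, List


def ownership(tokens: List[Fraction]) -> Dict[Fraction, Fraction]:
    """Share of the unit ring owned by each token, i.e. the range (prev, token]."""
    ts = sorted(tokens)
    if len(ts) == 1:
        return {ts[0]: Fraction(1)}
    # Pair each token with its predecessor, wrapping the first back to the last.
    return {t: (t - prev) % 1 for prev, t in zip([ts[-1]] + ts[:-1], ts)}


def imbalance(tokens: List[Fraction]) -> Fraction:
    """Largest share divided by smallest share (1 means perfectly balanced)."""
    shares = list(ownership(tokens).values())
    return max(shares) / min(shares)


n = 4
ring = [Fraction(i, n) for i in range(n)]  # evenly spaced single tokens
plus_one = ring + [Fraction(1, 2 * n)]  # one new node splits one range in half
doubled = ring + [Fraction(2 * i + 1, 2 * n) for i in range(n)]  # split every range

for name, tokens in [("original", ring), ("plus one", plus_one), ("doubled", doubled)]:
    print(name, imbalance(tokens))
```

Adding a single node leaves the largest range twice the size of the smallest, while doubling keeps the ratio at 1. Moving the existing tokens back to even spacing (the automated rebalance mentioned above) is the other way to get back to 1.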

Single token is easier to reason about in almost every situation, and full 
sstable streaming is now fast enough that you probably won’t miss vnodes, which 
were hugely valuable at making bootstrap and decom faster when streaming was 
slow by increasing parallelization. 

The right solution here is to get rid of the hash table, eventually, but until 
we do, the main benefit of vnodes is slightly better bandwidth for bootstrap
and slightly better balance when you do incremental expansions. Slightly. 
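A rough sketch of the bootstrap-parallelism point, under simplifying assumptions: tokens are placed at random in the Murmur3 range, replication is ignored, and each new token is assumed to take data only from the node that owns the range it lands in. With one token a new node has a single source; with more tokens it typically pulls from several peers at once. The function names are hypothetical.

```python
import bisect
import random
from typing import List, Tuple

MIN_TOKEN = -(2**63)  # Murmur3Partitioner lower bound
RING_SIZE = 2**64     # number of distinct Murmur3 tokens


def random_ring(nodes: int, vnodes: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Sorted (token, node_id) pairs with `vnodes` random tokens per node."""
    ring = [
        (rng.randrange(MIN_TOKEN, MIN_TOKEN + RING_SIZE), node)
        for node in range(nodes)
        for _ in range(vnodes)
    ]
    return sorted(ring)


def streaming_peers(ring: List[Tuple[int, int]], new_tokens: List[int]) -> int:
    """Count distinct existing nodes whose primary range the new tokens split."""
    tokens = [t for t, _ in ring]
    peers = set()
    for t in new_tokens:
        # The range containing t ends at the first token >= t (wrapping to index 0).
        i = bisect.bisect_left(tokens, t) % len(tokens)
        peers.add(ring[i][1])
    return len(peers)


if __name__ == "__main__":
    rng = random.Random(7)
    for vnodes in (1, 4, 16):
        ring = random_ring(100, vnodes, rng)
        new = [rng.randrange(MIN_TOKEN, MIN_TOKEN + RING_SIZE) for _ in range(vnodes)]
        print(vnodes, "tokens ->", streaming_peers(ring, new), "source node(s)")
```

This is only a proxy for streaming parallelism; in a real cluster the sources are replicas chosen per range, so the actual peer count also depends on RF and topology.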





> On Oct 8, 2024, at 5:20 PM, Jordan West  wrote:
> 
> Hi Long,
> 
> This is the best resource on understanding tokens per node and their impact 
> on operations / availability: 
> https://jolynch.github.io/pdf/cassandra-availability-virtual.pdf
> 
> I am one of those users that used a single token. It does make certain 
> operations simpler but it comes with a cost: changing cluster topology 
> outside of doubling takes significant expertise. 
> 
> It’s also important to factor in the inertia of decisions. Those companies 
> opted for single token when vnodes were nascent and buggy or didn’t exist. 
> 
> My recommendation these days is to use vnodes with a small number of tokens 
> per node. I prefer 4 but would say going as high as 16 is reasonable. The 
> paper does a better job of describing why. I wouldn’t go higher because many 
> operations are on the order of the number of tokens in the cluster and that 
> overhead can be problematic. 
> 
> Jordan 
> 
> On Mon, Oct 7, 2024 at 17:37 Long Pan wrote:
>> Hi Cassandra Community,
>> 
>> I’m currently exploring the use of single vnode (single token) per node in 
>> large-scale Cassandra deployments. I've come across discussions suggesting 
>> that some heavy users like Apple and Netflix have opted for this 
>> configuration to simplify operations and achieve more predictable 
>> performance.
>> 
>> I’d like to ask if anyone could point me to resources (blog posts, 
>> conference talks, case studies or even personal experiences) that dive 
>> deeper into:
>> 
>> The rationale behind using a single vnode instead of multiple vnodes.
>> The operational benefits and any potential trade-offs encountered.
>> Thank you in advance for your insights and any pointers you can provide!
>> 
>> Best regards,
>> Long
>> 



Re: Resources on Using Single Vnode in Cassandra

2024-10-08 Thread guo Maxwell
I think cost is a very important point if you are going to use *single
token* and your cluster will be very large, because every time the cluster
is expanded, the number of nodes needs to be doubled: 100 -> 200, 200 -> 400, ...
This is one of the reasons why we maintain many small clusters.

Of course, its availability will be better.
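The cost described here is easy to quantify. A minimal sketch, assuming expansion is done only by doubling (no token moves): the cluster jumps to the smallest power-of-two multiple of its current size that covers the need, whereas adding nodes one at a time (with token moves, or with vnodes) can stop at exactly the required count. The function name is illustrative.

```python
def size_after_doubling(current: int, needed: int) -> int:
    """Smallest current * 2**k (k >= 0) that is at least `needed`."""
    if current <= 0:
        raise ValueError("current must be positive")
    size = current
    while size < needed:
        size *= 2  # each expansion step doubles the node count
    return size


for needed in (120, 200, 250):
    size = size_after_doubling(100, needed)
    print(f"need {needed}: grow 100 -> {size} ({size - needed} nodes beyond the need)")
```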

Abe Ratnofsky wrote on Wed, Oct 9, 2024 at 11:56:

> Here’s the best post I’m aware of:
> https://jolynch.github.io/pdf/cassandra-availability-virtual.pdf
>
> On Oct 7, 2024, at 17:30, Long Pan  wrote:
>
> 
>
> Hi Cassandra Community,
>
> I’m currently exploring the use of *single vnode* (single token) per node
> in large-scale Cassandra deployments. I've come across discussions
> suggesting that some heavy users like Apple and Netflix have opted for this
> configuration to simplify operations and achieve more predictable
> performance.
>
> I’d like to ask if anyone could point me to *resources* (blog posts,
> conference talks, case studies or even personal experiences) that dive
> deeper into:
>
>- The *rationale* behind using a single vnode instead of multiple
>vnodes.
>- The *operational benefits* and any potential trade-offs encountered.
>
> Thank you in advance for your insights and any pointers you can provide!
>
> Best regards,
> Long
>
>


Tombstone Generation in Cassandra 4.1.3 Despite No Update/Delete Operations

2024-10-08 Thread Naman kaushik
Hi Community,

We are currently using Cassandra version 4.1.3 and have encountered an
issue related to tombstone generation. We have two tables storing monthly
data: table_september and table_october. Each table has a TTL of 30 days.

For the month of October, data is being inserted into the table_october,
and we are seeing the following warning at the start of the month:

WARN  [CompactionExecutor:22030] 2024-10-07 16:37:16,376
BigTableWriter.java:274 - Writing 102594 tombstones to table_october

Here are a few things to note:

   - No update or delete operations are being performed on the table.
   - TTL is correctly set to 30 days, and the data being inserted is within
   this time range, so the TTL shouldn't be the reason for tombstones.
   - No null values are being inserted in any column.

We are still seeing tombstones being generated for the October table. Does
anyone have insights into what could be causing these tombstones, or how we
can prevent this from happening?

Any help would be greatly appreciated!

Thanks in advance.


Re: Resources on Using Single Vnode in Cassandra

2024-10-08 Thread Jeff Jirsa
You don’t have to double. You can add 1 node at a time - you just have to move all the other tokens to stay balanced.

Most people don’t write the tooling to do that, but it’s not that complicated:

Calculate the token positions with N nodes
Calculate the token positions with N+1 nodes
Bootstrap the new machine at whichever N+1 token is furthest from an existing token
For each existing node:
    Run cleanup
    Move node to the new token
Run cleanup again

It’s involved but straightforward, online, and safe. Because there’s only one token per node you can bootstrap/move in batches (offset by 2x RF - so if you have 100 machines and RF=3, you can have 16 machines bootstrapping or moving at the same time). You can’t do that safely with vnodes.

> On Oct 9, 2024, at 12:51 AM, guo Maxwell wrote:
>
> I think cost is a very important point if you are going to use single token if your cluster will be very large, because every time the cluster is expanded, the nodes need to be doubled. 100 -> 200, 200->400 ... This is one of the reasons why we maintain many small clusters.
>
> of course its availability will be better.
>
> Abe Ratnofsky wrote on Wed, Oct 9, 2024 at 11:56:
>
>> Here’s the best post I’m aware of: https://jolynch.github.io/pdf/cassandra-availability-virtual.pdf
>>
>>> On Oct 7, 2024, at 17:30, Long Pan wrote:
>>>
>>> Hi Cassandra Community,
>>>
>>> I’m currently exploring the use of single vnode (single token) per node in large-scale Cassandra deployments. I've come across discussions suggesting that some heavy users like Apple and Netflix have opted for this configuration to simplify operations and achieve more predictable performance.
>>>
>>> I’d like to ask if anyone could point me to resources (blog posts, conference talks, case studies or even personal experiences) that dive deeper into:
>>>
>>> - The rationale behind using a single vnode instead of multiple vnodes.
>>> - The operational benefits and any potential trade-offs encountered.
>>>
>>> Thank you in advance for your insights and any pointers you can provide!
>>>
>>> Best regards,
>>> Long
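The token-move procedure Jeff describes in this message can be sketched in Python for the default Murmur3Partitioner range [-2**63, 2**63 - 1]. This is a minimal planning helper, not Jeff's tooling: it only computes tokens. It assumes evenly spaced target tokens and pairs existing nodes with the remaining targets in ring order, which preserves their relative order. Actually applying the plan would still mean bootstrapping the new node with `initial_token` set, then running `nodetool cleanup` and `nodetool move` on each node as described. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

MIN_TOKEN = -(2**63)  # Murmur3Partitioner lower bound
RING_SIZE = 2**64     # number of distinct Murmur3 tokens


def balanced_tokens(n: int) -> List[int]:
    """Evenly spaced single tokens for an n-node ring."""
    return [MIN_TOKEN + i * RING_SIZE // n for i in range(n)]


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two tokens, wrapping around the ring."""
    d = abs(a - b)
    return min(d, RING_SIZE - d)


def plan_add_one_node(existing: List[int]) -> Tuple[int, Dict[int, int]]:
    """Return (token to bootstrap the new node at, {old_token: new_token} moves)."""
    targets = balanced_tokens(len(existing) + 1)
    # Bootstrap at the N+1 target that is furthest from any existing token.
    new_token = max(targets, key=lambda t: min(ring_distance(t, e) for e in existing))
    remaining = sorted(t for t in targets if t != new_token)
    # Assumption: pair existing nodes with the remaining targets in ring order.
    moves = {old: new for old, new in zip(sorted(existing), remaining) if old != new}
    return new_token, moves


def max_parallel_ops(nodes: int, rf: int) -> int:
    """Nodes that can bootstrap or move at once when batches are offset by 2 x RF."""
    return nodes // (2 * rf)


if __name__ == "__main__":
    existing = balanced_tokens(4)
    new_token, moves = plan_add_one_node(existing)
    print("bootstrap new node at", new_token)
    for old, new in moves.items():
        print(f"cleanup, then move {old} -> {new}")
    print("parallel ops for 100 nodes, RF=3:", max_parallel_ops(100, 3))
```

For 100 nodes and RF=3, `max_parallel_ops` gives the 16 concurrent bootstraps/moves quoted above.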



Re: Tombstone Generation in Cassandra 4.1.3 Despite No Update/Delete Operations

2024-10-08 Thread Jon Haddad
Are you using collections?

—
Jon Haddad
Rustyrazorblade Consulting
rustyrazorblade.com


On Tue, Oct 8, 2024 at 10:52 PM Naman kaushik 
wrote:

> Hi Community,
>
> We are currently using Cassandra version 4.1.3 and have encountered an
> issue related to tombstone generation. We have two tables storing monthly
> data: table_september and table_october. Each table has a TTL of 30 days.
>
> For the month of October, data is being inserted into the table_october,
> and we are seeing the following warning at the start of the month:
>
> WARN  [CompactionExecutor:22030] 2024-10-07 16:37:16,376
> BigTableWriter.java:274 - Writing 102594 tombstones to table_october
>
> Here are a few things to note:
>
>- No update or delete operations are being performed on the table.
>- TTL is correctly set to 30 days, and the data being inserted is
>within this time range, so the TTL shouldn't be the reason for tombstones.
>- No null values are being inserted in any column.
>
> We are still seeing tombstones being generated for the October table. Does
> anyone have insights into what could be causing these tombstones, or how we
> can prevent this from happening?
>
> Any help would be greatly appreciated!
>
> Thanks in advance.
>


Re: Resources on Using Single Vnode in Cassandra

2024-10-08 Thread Jordan West
Hi Long,

This is the best resource on understanding tokens per node and their impact
on operations / availability:
https://jolynch.github.io/pdf/cassandra-availability-virtual.pdf

I am one of those users that used a single token. It does make certain
operations simpler but it comes with a cost: changing cluster topology
outside of doubling takes significant expertise.

It’s also important to factor in the inertia of decisions. Those companies
opted for single token when vnodes were nascent and buggy or didn’t exist.

My recommendation these days is to use vnodes with a small number of tokens
per node. I prefer 4 but would say going as high as 16 is reasonable. The
paper does a better job of describing why. I wouldn’t go higher because
many operations are on the order of the number of tokens in the cluster and
that overhead can be problematic.
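To put the point about operations scaling with the number of tokens in the cluster into numbers: a ring has nodes x num_tokens tokens, and each token ends one token range. A trivial sketch (the function name is illustrative); 256 was the `num_tokens` default before Cassandra 4.0 lowered it to 16.

```python
def token_ranges(nodes: int, num_tokens: int) -> int:
    """Each token ends one range, so a ring has nodes * num_tokens ranges."""
    return nodes * num_tokens


for num_tokens in (1, 4, 16, 256):
    print(f"num_tokens={num_tokens:>3}: {token_ranges(100, num_tokens):>6} ranges in a 100-node cluster")
```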

Jordan

On Mon, Oct 7, 2024 at 17:37 Long Pan  wrote:

> Hi Cassandra Community,
>
> I’m currently exploring the use of *single vnode* (single token) per node
> in large-scale Cassandra deployments. I've come across discussions
> suggesting that some heavy users like Apple and Netflix have opted for this
> configuration to simplify operations and achieve more predictable
> performance.
>
> I’d like to ask if anyone could point me to *resources* (blog posts,
> conference talks, case studies or even personal experiences) that dive
> deeper into:
>
>- The *rationale* behind using a single vnode instead of multiple
>vnodes.
>- The *operational benefits* and any potential trade-offs encountered.
>
> Thank you in advance for your insights and any pointers you can provide!
>
> Best regards,
> Long
>


Re: Resources on Using Single Vnode in Cassandra

2024-10-08 Thread Abe Ratnofsky
Here’s the best post I’m aware of: 
https://jolynch.github.io/pdf/cassandra-availability-virtual.pdf

> On Oct 7, 2024, at 17:30, Long Pan  wrote:
> 
> 
> Hi Cassandra Community,
> 
> I’m currently exploring the use of single vnode (single token) per node in 
> large-scale Cassandra deployments. I've come across discussions suggesting 
> that some heavy users like Apple and Netflix have opted for this 
> configuration to simplify operations and achieve more predictable performance.
> 
> I’d like to ask if anyone could point me to resources (blog posts, conference 
> talks, case studies or even personal experiences) that dive deeper into:
> 
> The rationale behind using a single vnode instead of multiple vnodes.
> The operational benefits and any potential trade-offs encountered.
> Thank you in advance for your insights and any pointers you can provide!
> 
> Best regards,
> Long