Some more background: we are planning a (tested) binary upgrade across all
nodes without downtime, with upgradesstables as the next step. The upgrade
changes the C* file format and version from format big, version mc to
format bti, version aa (refer to
https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/tools/toolsSStables/ToolsSSTableupgrade.html
- upgrade from DSE 5.1 to 6.x). The underlying format change explains why
the upgrade takes so much time.
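
To gauge how much rewriting upgradesstables still has to do on a node, a
rough sketch like the one below counts sstables by format and version from
their file names (the data directory path is an assumption for a typical
install; sstable Data files carry version and format in their names, e.g.
mc-*-big-Data.db before the upgrade and aa-*-bti-Data.db after):

    import collections
    import pathlib

    # Assumed default data directory; adjust for your install.
    DATA_DIR = pathlib.Path("/var/lib/cassandra/data")

    def count_sstable_versions(data_dir: pathlib.Path) -> collections.Counter:
        """Count sstable Data files by version-format prefix, e.g. 'mc-big' or 'aa-bti'."""
        counts = collections.Counter()
        for f in data_dir.rglob("*-Data.db"):
            parts = f.name.split("-")  # e.g. ['mc', '1', 'big', 'Data.db']
            if len(parts) >= 4:
                counts[f"{parts[0]}-{parts[2]}"] += 1
        return counts

    if __name__ == "__main__":
        for fmt, n in sorted(count_sstable_versions(DATA_DIR).items()):
            print(f"{fmt}: {n} sstables")
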
Running upgradesstables in parallel across racks - this is where I am not
sure about the impact of running in parallel (the documentation recommends
running one node at a time). During upgradesstables there are scenarios
where it reports file corruption, which requires a corrective step, i.e.
scrub. Due to sstable corruption, at times a node goes down or ends up at
~100% CPU usage. Performing the above in parallel *without downtime* might
result in more inconsistency across nodes. We have not tested this
scenario, so I will need the group's help in case anyone has done a
similar upgrade in the past (i.e. scenarios/complexity that need to be
considered, and why the guideline recommends running upgradesstables one
node at a time).
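
For reference, the loop I have in mind for the recommended one-node-at-a-time
run, with a scrub fallback when corruption is reported, is roughly the sketch
below (the hostnames are placeholders, and it assumes nodetool can reach each
node's JMX port):

    import subprocess

    # Placeholder hostnames; in practice these come from our inventory.
    NODES = ["cass-node-01", "cass-node-02", "cass-node-03"]

    def run_nodetool(host: str, *args: str) -> int:
        """Run one nodetool command against a node and return its exit code."""
        cmd = ["nodetool", "-h", host, *args]
        print("Running:", " ".join(cmd))
        return subprocess.run(cmd).returncode

    for host in NODES:
        # One node at a time, as the documentation recommends.
        if run_nodetool(host, "upgradesstables") != 0:
            # On reported corruption, scrub and retry once before stopping.
            print(f"{host}: upgradesstables failed, attempting scrub")
            if run_nodetool(host, "scrub") != 0 or \
               run_nodetool(host, "upgradesstables") != 0:
                raise SystemExit(f"{host}: still failing after scrub; investigate")
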
-Shishir

On Fri, Nov 29, 2019 at 11:52 PM Josh Snyder <j...@code406.com> wrote:

> Hello Shishir,
>
> It shouldn't be necessary to take downtime to perform upgrades of a
> Cassandra cluster. It sounds like the biggest issue you're facing is the
> upgradesstables step. upgradesstables is not strictly necessary before a
> Cassandra node re-enters the cluster to serve traffic; in my experience it
> is purely for optimizing the performance of the database once the software
> upgrade is complete. I recommend trying out an upgrade in a test
> environment without using upgradesstables, which should bring the 5 hours
> per node down to just a few minutes.
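>
> A minimal sketch of the per-node rolling restart this implies, skipping
> upgradesstables (the ssh targets, service name, and package command are
> illustrative and will differ per install):
>
>     import subprocess
>
>     def upgrade_node(host: str) -> None:
>         """Rolling binary upgrade of a single node, without upgradesstables."""
>         # Flush memtables and stop accepting traffic before the restart.
>         subprocess.run(["nodetool", "-h", host, "drain"], check=True)
>         # Illustrative commands; substitute your service manager and packaging.
>         subprocess.run(["ssh", host, "sudo systemctl stop cassandra"], check=True)
>         subprocess.run(["ssh", host, "sudo yum -y update cassandra"], check=True)
>         subprocess.run(["ssh", host, "sudo systemctl start cassandra"], check=True)
>         # The node rejoins on the new binary and serves traffic immediately;
>         # old-format sstables are rewritten later by compaction or by an
>         # eventual upgradesstables run.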
>
> If you're running NetworkTopologyStrategy and you want to optimize
> further, you could consider performing the upgrade on multiple nodes within
> the same rack in parallel. When correctly configured,
> NetworkTopologyStrategy can protect your database from an outage of an
> entire rack. So performing an upgrade on a few nodes at a time within a
> rack is the same as a partial rack outage, from the database's perspective.
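>
> For reference, the rack-awareness comes from the keyspace's replication
> settings; a sketch with the Python driver (contact point, keyspace, and DC
> name are illustrative):
>
>     from cassandra.cluster import Cluster
>
>     session = Cluster(["127.0.0.1"]).connect()
>     session.execute("""
>         CREATE KEYSPACE IF NOT EXISTS app_ks
>         WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3}
>     """)
>     # With RF=3 and at least three racks in DC1, NetworkTopologyStrategy
>     # places replicas on distinct racks, so taking down nodes in one rack
>     # still leaves two replicas for QUORUM reads and writes.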
>
> Have a nice upgrade!
>
> Josh
>
> On Fri, Nov 29, 2019 at 7:22 AM Shishir Kumar <shishirroy2...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Need input on a Cassandra upgrade strategy for the setup below:
>> 1. We have datacenters across 4 geographies (multiple isolated deployments
>> in each DC).
>> 2. The number of Cassandra nodes in each deployment is between 6 and 24
>> 3. Data volume on each node is between 150 and 400 GB
>> 4. All production environments have DR set up
>> 5. We do not want downtime during the upgrade
>>
>> We are planning to go for a stack upgrade, but upgradesstables is taking
>> approx. 5 hours per node (when data volume is approx. 200 GB).
>> Options:
>> No downtime - Per the recommendation (DataStax documentation) to upgrade
>> one node at a time, i.e. in sequence, the upgrade cycle for one
>> environment will take weeks, which is a DevOps concern.
>> Read only (no downtime) - Route read-only load to the DR system. We have
>> resilience built in to take care of mutation scenarios, but in case the
>> upgrade takes more than, say, 3-4 hours, there will be a long catch-up
>> exercise. The maintenance cost seems too high due to the unknowns.
>> Downtime - Upgrade all nodes in parallel, as there are no live customers.
>> This has direct customer impact, so we need to weigh maintenance cost
>> against customer impact.
>> Please suggest how other organisations (those with 100+ nodes) are
>> solving this scenario.
>>
>> Regards
>> Shishir
>>
>>
