Hi Jeff,
/When you’re upgrading or rebuilding you want all copies on the same
version with proper sstables. So either add GCP then upgrade to 4.0,
or upgrade to 4.0 and then expand to GCP. Don’t do them at the same time./
One thing I forgot to mention: after completion of step 1, our GCP
data center will have been added, with rebuild done on all nodes, so
the complete cluster will still be on 3.0.9 at that point. We will
change num_tokens from the current 256 to 16 in the GCP data center
in this step only (see the check sketched after the layout below).
DC1 -
5 nodes (physical) - version 3.0.9
num_tokens: 256
DC2 -
5 nodes (GCP) - version 3.0.9
num_tokens: 16
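For reference, a quick check I plan to run on each new GCP node (a
minimal sketch; the config path is an assumption for a package
install):

  # num_tokens must be set before a node's very first start; it
  # cannot be changed on an existing node afterwards.
  grep -E '^num_tokens' /etc/cassandra/cassandra.yaml
  # expected on the GCP nodes: num_tokens: 16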
The remaining steps 2-5 cover the upgrade itself, which I am planning
to do DC by DC, running upgradesstables on GCP first (see the
per-node loop sketched after the layout below). After the GCP
upgrade:
DC1 -
5 nodes (physical) - version 3.0.9
num_tokens: 256
DC2 -
5 nodes (GCP) - version 4.0.0
num_tokens: 16
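A sketch of the per-node loop I have in mind for the GCP upgrade
(the service commands are assumptions; adapt to the actual
deployment):

  # On each GCP node, one at a time:
  nodetool drain                 # flush memtables, stop accepting traffic
  sudo systemctl stop cassandra  # still 3.0.9
  # ...switch the installation/binaries to 4.0.0...
  sudo systemctl start cassandra
  nodetool status                # wait for UN before the next node
  # Only after all 5 GCP nodes are on 4.0.0, on each node in turn:
  nodetool upgradesstables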
Since I won't be needing the physical DC anymore, instead of
upgrading it I will simply discard that DC (removal steps sketched
below).
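The removal itself would be roughly as follows (the keyspace name is
from the POC logs further down and the RF of 3 is an assumption;
a full repair should be run first, with clients pointed only at GCP):

  # Drop the physical DC (DC1) from replication for every keyspace,
  # including system_auth if it replicates there:
  cqlsh -e "ALTER KEYSPACE clickstream WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC2': 3};"
  # Then, on each physical node in DC1, one at a time:
  nodetool decommission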
Regards,
Ashish
On Mon, Sep 6, 2021, 7:31 AM Jeff Jirsa <jji...@gmail.com> wrote:
In-line
On Sep 3, 2021, at 11:12 AM, MyWorld <timeplus.1...@gmail.com> wrote:
Hi Jeff,
Thanks for your response.
To answer your question: yes, we created the dev environment by
restoring from snapshot/CSV files.
Just one follow-up question. I have a 5-node single-DC production
cluster on version 3.0.9 on physical servers. We are planning to
migrate to GCP along with the upgrade, using the steps below.
1. Set up a GCP data center on the same version 3.0.9 and rebuild
the complete data (rebuild sketch after this list)
2. Install and configure version 4.0 in the new GCP data center
on all 5 nodes
3. Stop version 3.0.9 and start 4.0 on all 5 nodes of GCP, one by one
4. Run upgradesstables one by one on all 5 nodes of GCP
5. Later, move read/write traffic to GCP and remove the old data
center, which is still on version 3.0.9
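For step 1, the rebuild on each new GCP node would look roughly
like this (the DC name and the auto_bootstrap detail are
assumptions about the setup):

  # cassandra.yaml on each new GCP node before first start:
  #   auto_bootstrap: false   # join the ring without streaming
  # then pull all data from the existing physical data center:
  nodetool rebuild -- DC1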
Please guide on a few things:
1. Is the above-mentioned approach right?
When you’re upgrading or rebuilding you want all copies on the
same version with proper sstables. So either add GCP then upgrade
to 4.0, or upgrade to 4.0 and then expand to GCP. Don’t do them at
the same time.
2. OR should we install 4.0 on only one GCP node at a time and
run upgradesstables on just that one node first?
I usually do upgradesstables after all bounces are done.
The only exception is perhaps doing upgradesstables on exactly
one copy via backup/restore to make sure 4.0 works with your data
files, which it sounds like you’ve already done.
3. OR should we migrate to GCP first and then think about the
4.0 upgrade later?
4. OR is there any reason I should upgrade to 3.11.x first?
Not 3.11, but maybe the latest 3.0 release instead.
Regards,
Ashish
On Fri, Sep 3, 2021, 11:11 PM Jeff Jirsa <jji...@gmail.com> wrote:
On Fri, Sep 3, 2021 at 10:33 AM MyWorld <timeplus.1...@gmail.com> wrote:
Hi all,
We are doing a POC on a dev environment to upgrade Apache
Cassandra 3.0.9 to 4.0.0. We currently have the below setup on
Cassandra 3.0.9:
DC1 - GCP (India) - 1 node
DC2 - GCP (US) - 1 node
3.0.9 is very old. It has an older data file format and some
known correctness bugs.
For the upgrade, we carried out the below steps on the DC2 -
GCP (US) node:
Step 1. Install Apache Cassandra 4.0.0
Step 2. Apply all configuration settings
Step 3. Stop Apache Cassandra 3.0.9
Step 4. Start Apache Cassandra 4.0.0 and monitor logs
Step 5. Run nodetool upgradesstables and monitor logs
After monitoring the logs, I had the below observations:
*1. Initially, during bootstrap at Step 4, we received the below
exceptions:*
a) Exception (java.lang.IllegalArgumentException)
encountered during startup: Invalid sstable file
manifest.json: the name doesn't look like a supported
sstable file name
java.lang.IllegalArgumentException: Invalid sstable file
manifest.json: the name doesn't look like a supported
sstable file name
b) ERROR [main] 2021-08-29 06:25:52,120
CassandraDaemon.java:909 - Exception encountered during
startup
java.lang.IllegalArgumentException: Invalid sstable file
schema.cql: the name doesn't look like a supported
sstable file name
*To resolve this, we removed the manifest.json and schema.cql
files from each table directory, and the issue went away.*
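For anyone hitting the same startup error, the cleanup we did
amounts to something like this (path as in the logs below; with
Cassandra stopped, and with a copy kept aside first):

  # manifest.json and schema.cql are snapshot metadata, not sstable
  # components, so 4.0's stricter startup check rejects them.
  find /opt1/cassandra_poc/data -maxdepth 3 -type f \
    \( -name manifest.json -o -name schema.cql \) -delete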
Did you restore these from backup/snapshot?
*2. After resolving the above issue, we received the below WARN
messages during bootstrap (step 4):*
*WARN* [main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:305 -
Origin of 1 sstables is unknown or doesn't match the local node;
commitLogIntervals for them were ignored
*DEBUG* [main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:306 -
Ignored commitLogIntervals from the following sstables:
[/opt1/cassandra_poc/data/clickstream/glcat_mcat_by_flname-af4e3ac0ace511ebaf9ec13e37d013c2/mc-1-big-Data.db]
*WARN* [main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:305 -
Origin of 2 sstables is unknown or doesn't match the local node;
commitLogIntervals for them were ignored
*DEBUG* [main] 2021-08-29 06:33:25,738 CommitLogReplayer.java:306 -
Ignored commitLogIntervals from the following sstables:
[/opt1/cassandra_poc/data/clickstream/gl_city_map
Your data files don't match the commitlog files it expects to
see. Either you restored these from backup, or it's because 3.0.9
is much older than the 3.0.x releases that are more commonly used.
*3. While upgrading sstables (step 5), we received the below
messages:*
*WARN* [CompactionExecutor:3] 2021-08-29 07:47:32,828
DuplicateRowChecker.java:96 - Detected 2 duplicate rows
for 29621439 during Upgrade sstables.
*WARN* [CompactionExecutor:3] 2021-08-29 07:47:32,831
DuplicateRowChecker.java:96 - Detected 4 duplicate rows
for 45016570 during Upgrade sstables.
*WARN* [CompactionExecutor:3] 2021-08-29 07:47:32,833
DuplicateRowChecker.java:96 - Detected 3 duplicate rows
for 61260692 during Upgrade sstables.
This says you have corrupt data from an old bug. Probably
related to 2.1 -> 3.0 upgrades, if this was originally on
2.1. If you read those keys, you would find that the data
returns 2-4 rows where it should be exactly 1.
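If you want to spot-check one of those partitions, something like
this would do it (the table and key column names are hypothetical;
the log lines only show the partition keys):

  # Expect exactly one row back; seeing 2-4 rows confirms the duplication.
  cqlsh -e "SELECT * FROM clickstream.some_table WHERE id = 29621439;"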
*4. Also, we received the below messages during the upgrade:*
*DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,347
InitialConnectionHandler.java:77 - OPTIONS received 5/v5
*DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,349
InitialConnectionHandler.java:121 - Response to STARTUP
sent, configuring pipeline for 5/v5
*DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,350
InitialConnectionHandler.java:153 - Configured pipeline:
DefaultChannelPipeline{(frameDecoder =
org.apache.cassandra.net.FrameDecoderCrc), (frameEncoder
= org.apache.cassandra.net.FrameEncoderCrc),
(cqlProcessor =
org.apache.cassandra.transport.CQLMessageHandler),
(exceptionHandler =
org.apache.cassandra.transport.ExceptionHandlers$PostV5ExceptionHandler)}
Normal debug logging. It's the Netty connection pipelines being
set up.
*5. After the upgrade, we are regularly getting the below messages:*
*DEBUG* [ScheduledTasks:1] 2021-09-02 00:03:20,910
SSLFactory.java:354 - Checking whether certificates have
been updated []
*DEBUG* [ScheduledTasks:1] 2021-09-02 00:13:20,910
SSLFactory.java:354 - Checking whether certificates have
been updated []
*DEBUG* [ScheduledTasks:1] 2021-09-02 00:23:20,911
SSLFactory.java:354 - Checking whether certificates have
been updated []
Normal. It's checking whether the SSL cert has changed; if it
had, it would be reloaded.
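If those repeated DEBUG lines are noisy, you can raise that logger
at runtime (logger name from the messages above; this resets on
restart unless you also change logback.xml):

  nodetool setlogginglevel org.apache.cassandra.security.SSLFactory INFO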
*Can someone please explain what the above ERROR / WARN / DEBUG
messages refer to? Is there anything to be concerned about?*
*Also, we received 2 READ_REQ dropped messages (maybe due to
network latency):*
*INFO* [ScheduledTasks:1] 2021-09-03 11:40:10,009
MessagingMetrics.java:206 - READ_REQ messages were
dropped in last 5000 ms: 0 internal and 1 cross node.
Mean internal dropped latency: 0 ms and Mean cross-node
dropped latency: 12359 ms
*INFO* [ScheduledTasks:1] 2021-09-03 13:27:15,291
MessagingMetrics.java:206 - READ_REQ messages were
dropped in last 5000 ms: 0 internal and 1 cross node.
Mean internal dropped latency: 0 ms and Mean cross-node
dropped latency: 5960 ms
12s and 6s cross-node latency isn't hugely surprising from US
to India, given the geographical distance and likelihood of
packet loss across that distance. Losing 1 read request every
few hours seems like it's within normal expectations.
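If you want to keep an eye on it, the same dropped counts are
visible via nodetool (a minimal sketch):

  # The table at the end of the output lists dropped messages per
  # verb, including READ_REQ, as totals since the node started.
  nodetool tpstats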
The rest of the stats look pretty normal (tpstats, status, info,
tablestats, etc.).
Regards,
Ashish