Re: [UPGRADATION] Apache Cassandra from version 3.0.9 to 4.0.0

Jeff Jirsa Sun, 05 Sep 2021 19:01:28 -0700

In-line 

> On Sep 3, 2021, at 11:12 AM, MyWorld <timeplus.1...@gmail.com> wrote:
> 
> 
> Hi Jeff,
> Thanks for your response.
> To answer your question: yes, we created the dev environment by restoring 
> the data from snapshot/CSV files.
> 
> Just one follow-up question: I have a 5-node, single-DC production cluster 
> on version 3.0.9, running on physical servers.
> We are planning to migrate to GCP and upgrade at the same time, using the 
> steps below:
> 1. Set up a GCP data center on the same version (3.0.9) and rebuild all the data
> 2. Install and configure 4.0 on all 5 nodes in the new GCP data center
> 3. Stop 3.0.9 and start 4.0 on all 5 GCP nodes, one by one
> 4. Run upgradesstables on all 5 GCP nodes, one by one
> 5. Later, move read/write traffic to GCP and remove the old data center, which is 
> still on version 3.0.9
> 
> Please advise on a few things:
> 1. Is the approach above right?


When you’re upgrading or rebuilding, you want all copies on the same version 
with proper sstables. So either add GCP and then upgrade to 4.0, or upgrade to 
4.0 and then expand to GCP. Don’t do both at the same time. 
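
To make the ordering rule concrete, here is a minimal Python sketch that 
replays a migration plan and flags every streaming step (a node joining via 
bootstrap/rebuild, or a node being decommissioned) that would run while the 
nodes are not all on the same Cassandra version. The plan format, the action 
names ("add", "upgrade", "remove") and the node names are hypothetical. The 
rule it encodes is the usual guidance to avoid topology changes in a 
mixed-version cluster; it does not model upgradesstables.

```python
from typing import Dict, List, Tuple

# Hypothetical plan format: (action, node, version).
#   "add"     = node joins and streams data in (bootstrap/rebuild) at `version`
#   "upgrade" = node is restarted onto `version`
#   "remove"  = node is decommissioned (`version` is ignored)
Step = Tuple[str, str, str]
STREAMING_ACTIONS = {"add", "remove"}


def mixed_version_streaming_steps(initial: Dict[str, str], plan: List[Step]) -> List[int]:
    """Return indices of plan steps that stream data while versions are mixed."""
    versions = dict(initial)  # node -> version it currently runs
    violations = []
    for i, (action, node, version) in enumerate(plan):
        if action in STREAMING_ACTIONS:
            seen = set(versions.values())
            if action == "add":
                seen.add(version)  # the joining node's version counts too
            if len(seen) > 1:
                violations.append(i)
        if action in ("add", "upgrade"):
            versions[node] = version
        elif action == "remove":
            versions.pop(node, None)
        else:
            raise ValueError(f"unknown action: {action}")
    return violations


old_dc = {f"old{n}": "3.0.9" for n in range(1, 6)}
gcp = [f"gcp{n}" for n in range(1, 6)]

# The plan from the question: build GCP on 3.0.9, upgrade only GCP, then drop the old DC.
proposed = ([("add", g, "3.0.9") for g in gcp]
            + [("upgrade", g, "4.0.0") for g in gcp]
            + [("remove", o, "") for o in old_dc])
# Expand first, finish the migration, then upgrade.
expand_then_upgrade = ([("add", g, "3.0.9") for g in gcp]
                       + [("remove", o, "") for o in old_dc]
                       + [("upgrade", g, "4.0.0") for g in gcp])

print(mixed_version_streaming_steps(old_dc, proposed))             # [10, 11, 12, 13, 14]
print(mixed_version_streaming_steps(old_dc, expand_then_upgrade))  # []
```

In the proposed plan, every decommission of an old 3.0.9 node happens while 
the GCP nodes already run 4.0.0, which is exactly the overlap to avoid.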


> 2. OR should we upgrade only one GCP node to 4.0 at a time, and run 
> upgradesstables on just that one node first?

I usually run upgradesstables after all the bounces (restarts onto 4.0) are done.

The only exception is perhaps running upgradesstables on exactly one copy, 
restored via backup/restore, to make sure 4.0 works with your data files, 
which it sounds like you’ve already done. 
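
A minimal Python sketch of that gate: it parses `nodetool version` output 
(a line such as `ReleaseVersion: 4.0.0`) from every node and only says "go" 
for upgradesstables once every node reports the target release. How the 
outputs are collected from each node is left out, and the host names below 
are hypothetical.

```python
import re
from typing import Dict, Optional

_RELEASE_RE = re.compile(r"^ReleaseVersion:\s*(\S+)", re.MULTILINE)


def release_version(nodetool_version_output: str) -> Optional[str]:
    """Extract the release from `nodetool version` output, or None if absent."""
    match = _RELEASE_RE.search(nodetool_version_output)
    return match.group(1) if match else None


def ready_for_upgradesstables(outputs: Dict[str, str], target: str) -> bool:
    """True only when every node has already been bounced onto `target`."""
    return bool(outputs) and all(
        release_version(out) == target for out in outputs.values()
    )


# Hypothetical outputs gathered from each node after its restart.
outputs = {
    "gcp1": "ReleaseVersion: 4.0.0\n",
    "gcp2": "ReleaseVersion: 4.0.0\n",
    "gcp3": "ReleaseVersion: 3.0.9\n",  # not bounced yet
}
print(ready_for_upgradesstables(outputs, "4.0.0"))  # False
outputs["gcp3"] = "ReleaseVersion: 4.0.0\n"
print(ready_for_upgradesstables(outputs, "4.0.0"))  # True
```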

> 3. OR should we migrate to GCP first and think about upgrading to 4.0 later?
> 4. OR is there any reason we should upgrade to 3.11.x first?

Not 3.11, but maybe the latest 3.0.x instead. 


> 
> Regards,
> Ashish
> 
>> On Fri, Sep 3, 2021, 11:11 PM Jeff Jirsa <jji...@gmail.com> wrote:
>> 
>> 
>>> On Fri, Sep 3, 2021 at 10:33 AM MyWorld <timeplus.1...@gmail.com> wrote:
>>> Hi all,
>>> We are doing a POC on dev environment to upgrade apache cassandra 3.0.9 to 
>>> 4.0.0. We have the below setup currently on cassandra 3.0.9
>>> DC1 - GCP(india) - 1 node
>>> DC2 - GCP(US) - 1 node
>> 
>> 3.0.9 is very old. It has an older version of the data file format and 
>> some known correctness bugs. 
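
The data file format version is visible in the SSTable file names: in 3.0 and 
later, a name like `mc-1-big-Data.db` (as in the log path further down) 
starts with the format version. A minimal Python sketch that counts 
`*-Data.db` files per format version, assuming that 
`<version>-<generation>-big-<Component>` naming; snapshots/ and backups/ 
subdirectories are skipped. After upgradesstables finishes on 4.0, the older 
prefix should no longer show up.

```python
import os
import re
from collections import Counter
from typing import Iterable

# Assumed 3.0+ naming: <version>-<generation>-big-<Component>, e.g. mc-1-big-Data.db
_DATA_FILE_RE = re.compile(r"^([a-z]{2})-\d+-big-Data\.db$")


def format_versions(filenames: Iterable[str]) -> Counter:
    """Count Data.db files per SSTable format version prefix."""
    counts = Counter()
    for name in filenames:
        match = _DATA_FILE_RE.match(name)
        if match:
            counts[match.group(1)] += 1
    return counts


def scan_data_dir(data_dir: str) -> Counter:
    """Walk a data directory, ignoring snapshots/ and backups/ subdirectories."""
    counts = Counter()
    for _root, dirs, files in os.walk(data_dir):
        dirs[:] = [d for d in dirs if d not in ("snapshots", "backups")]
        counts.update(format_versions(files))
    return counts


print(format_versions(["mc-1-big-Data.db", "mc-1-big-Index.db", "mc-2-big-Data.db"]))
# Counter({'mc': 2})
# print(scan_data_dir("/opt1/cassandra_poc/data"))  # data directory from the logs
```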
>>  
>>> 
>>> For the upgrade, we carried out the steps below on the DC2 - GCP(US) node:
>>> Step 1. Install Apache Cassandra 4.0.0 
>>> Step 2. Apply all configuration settings 
>>> Step 3. Stop Apache Cassandra 3.0.9 
>>> Step 4. Start Apache Cassandra 4.0.0 and monitor the logs
>>> Step 5. Run nodetool upgradesstables and monitor the logs
>>> 
>>> After monitoring the logs, I made the following observations:
>>> 1. Initially, during startup at Step 4, we received the exceptions below:
>>> a) Exception (java.lang.IllegalArgumentException) encountered during 
>>> startup: Invalid sstable file manifest.json: the name doesn't look like a 
>>> supported sstable file name
>>> java.lang.IllegalArgumentException: Invalid sstable file manifest.json: the 
>>> name doesn't look like a supported sstable file name
>>> b) ERROR [main] 2021-08-29 06:25:52,120 CassandraDaemon.java:909 - 
>>> Exception encountered during startup
>>> java.lang.IllegalArgumentException: Invalid sstable file schema.cql: the 
>>> name doesn't look like a supported sstable file name
>>>  
>>> To resolve this, we removed the manifest.json and schema.cql files from 
>>> each table directory, and the issue went away. 
>> 
>> Did you restore these from backup/snapshot? 
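
As the errors above show, 4.0 aborts startup when a table directory contains 
a file whose name doesn't parse as an SSTable component, which is what the 
manifest.json and schema.cql files left over from a snapshot restore caused. 
A minimal Python sketch that lists such files before the first 4.0 start, so 
they can be reviewed up front instead of one failed startup at a time. The 
pattern assumes 3.0+ `big` format names and is only an approximation, not 
Cassandra's own parser; subdirectories such as snapshots/ are not checked.

```python
import os
import re
from typing import Iterable, List

# Approximation of 3.0+ SSTable component names, e.g. mc-1-big-Data.db,
# mc-1-big-Digest.crc32, mc-1-big-TOC.txt. Not Cassandra's own parser.
_SSTABLE_RE = re.compile(r"^[a-z]{2}-\d+-big-\w+\.(db|crc32|txt)$")


def stray_files(filenames: Iterable[str]) -> List[str]:
    """Return names in a table directory that don't look like SSTable components."""
    return sorted(n for n in filenames if not _SSTABLE_RE.match(n))


def scan_keyspace(keyspace_dir: str) -> List[str]:
    """Check the plain files of every table directory under a keyspace directory."""
    found = []
    for table in sorted(os.listdir(keyspace_dir)):
        table_dir = os.path.join(keyspace_dir, table)
        if not os.path.isdir(table_dir):
            continue
        files = [f for f in os.listdir(table_dir)
                 if os.path.isfile(os.path.join(table_dir, f))]
        found += [os.path.join(table_dir, f) for f in stray_files(files)]
    return found


print(stray_files(["mc-1-big-Data.db", "mc-1-big-TOC.txt", "manifest.json", "schema.cql"]))
# ['manifest.json', 'schema.cql']
```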
>>  
>>> 
>>> 2. After resolving the above issue, we received the WARN messages below 
>>> during startup (Step 4).
>>> WARN  [main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:305 - Origin of 
>>> 1 sstables is unknown or doesn't match the local node; commitLogIntervals 
>>> for them were ignored
>>> DEBUG [main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:306 - Ignored 
>>> commitLogIntervals from the following sstables: 
>>> [/opt1/cassandra_poc/data/clickstream/glcat_mcat_by_flname-af4e3ac0ace511ebaf9ec13e37d013c2/mc-1-big-Data.db]
>>> WARN  [main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:305 - Origin of 
>>> 2 sstables is unknown or doesn't match the local node; commitLogIntervals 
>>> for them were ignored
>>> DEBUG [main] 2021-08-29 06:33:25,738 CommitLogReplayer.java:306 - Ignored 
>>> commitLogIntervals from the following sstables: 
>>> [/opt1/cassandra_poc/data/clickstream/gl_city_map 
>>> 
>> 
>> Your data files don't match the commitlog files it expects to see. Either 
>> you restored these from backup, or it's because 3.0.9 is much older than 
>> the 3.0.x versions that are more commonly used. 
>>  
>>> 3. While upgrading sstables (Step 5), we received the messages below:
>>> WARN  [CompactionExecutor:3] 2021-08-29 07:47:32,828 
>>> DuplicateRowChecker.java:96 - Detected 2 duplicate rows for 29621439 during 
>>> Upgrade sstables.
>>> WARN  [CompactionExecutor:3] 2021-08-29 07:47:32,831 
>>> DuplicateRowChecker.java:96 - Detected 4 duplicate rows for 45016570 during 
>>> Upgrade sstables.
>>> WARN  [CompactionExecutor:3] 2021-08-29 07:47:32,833 
>>> DuplicateRowChecker.java:96 - Detected 3 duplicate rows for 61260692 during 
>>> Upgrade sstables.
>>> 
>> 
>> This means you have corrupt data from an old bug, probably related to 
>> 2.1 -> 3.0 upgrades if this cluster was originally on 2.1. If you read those 
>> keys, you would find that the query returns 2-4 rows where it should return 
>> exactly 1. 
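
To follow up on those partitions, a minimal Python sketch that pulls the 
partition key and the reported duplicate count out of the DuplicateRowChecker 
warnings, so the affected keys can be read back and checked. It assumes the 
message wording quoted above; whitespace is matched loosely because the mail 
client wrapped these lines. The log file path in the comment is hypothetical.

```python
import re
from typing import Dict

_DUP_RE = re.compile(r"Detected\s+(\d+)\s+duplicate\s+rows\s+for\s+(\S+)\s+during")


def duplicate_partitions(log_text: str) -> Dict[str, int]:
    """Map partition key -> highest duplicate-row count reported for it."""
    found: Dict[str, int] = {}
    for count, key in _DUP_RE.findall(log_text):
        found[key] = max(found.get(key, 0), int(count))
    return found


sample = """\
WARN  [CompactionExecutor:3] 2021-08-29 07:47:32,828 DuplicateRowChecker.java:96 - Detected 2 duplicate rows for 29621439 during Upgrade sstables.
WARN  [CompactionExecutor:3] 2021-08-29 07:47:32,831 DuplicateRowChecker.java:96 - Detected 4 duplicate rows for 45016570 during Upgrade sstables.
WARN  [CompactionExecutor:3] 2021-08-29 07:47:32,833 DuplicateRowChecker.java:96 - Detected 3 duplicate rows for 61260692 during Upgrade sstables.
"""
print(duplicate_partitions(sample))
# {'29621439': 2, '45016570': 4, '61260692': 3}
# with open("/var/log/cassandra/system.log") as f:  # hypothetical path
#     print(duplicate_partitions(f.read()))
```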
>>  
>>> 4. We also received the messages below during the upgrade:
>>> DEBUG [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,347 
>>> InitialConnectionHandler.java:77 - OPTIONS received 5/v5
>>> DEBUG [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,349 
>>> InitialConnectionHandler.java:121 - Response to STARTUP sent, configuring 
>>> pipeline for 5/v5
>>> DEBUG [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,350 
>>> InitialConnectionHandler.java:153 - Configured pipeline: 
>>> DefaultChannelPipeline{(frameDecoder = 
>>> org.apache.cassandra.net.FrameDecoderCrc), (frameEncoder = 
>>> org.apache.cassandra.net.FrameEncoderCrc), (cqlProcessor = 
>>> org.apache.cassandra.transport.CQLMessageHandler), (exceptionHandler = 
>>> org.apache.cassandra.transport.ExceptionHandlers$PostV5ExceptionHandler)}
>>> 
>> 
>> That's debug logging and it's normal. It's the Netty connection pipelines 
>> being set up. 
>>  
>>> 5. After the upgrade, we are regularly getting the messages below:
>>> DEBUG [ScheduledTasks:1] 2021-09-02 00:03:20,910 SSLFactory.java:354 - 
>>> Checking whether certificates have been updated []
>>> DEBUG [ScheduledTasks:1] 2021-09-02 00:13:20,910 SSLFactory.java:354 - 
>>> Checking whether certificates have been updated []
>>> DEBUG [ScheduledTasks:1] 2021-09-02 00:23:20,911 SSLFactory.java:354 - 
>>> Checking whether certificates have been updated []
>>> 
>> Normal. It's checking whether the SSL certificate has changed (every 10 
>> minutes, judging by the timestamps) and, if it has, reloading it. 
>>  
>>> Can someone please explain what the above ERROR / WARN / DEBUG messages 
>>> refer to? Is there anything to be concerned about?
>>> 
>>> We also got two "READ_REQ messages were dropped" log entries (maybe due to network latency):
>>> INFO  [ScheduledTasks:1] 2021-09-03 11:40:10,009 MessagingMetrics.java:206 
>>> - READ_REQ messages were dropped in last 5000 ms: 0 internal and 1 cross 
>>> node. Mean internal dropped latency: 0 ms and Mean cross-node dropped 
>>> latency: 12359 ms
>>> INFO  [ScheduledTasks:1] 2021-09-03 13:27:15,291 MessagingMetrics.java:206 
>>> - READ_REQ messages were dropped in last 5000 ms: 0 internal and 1 cross 
>>> node. Mean internal dropped latency: 0 ms and Mean cross-node dropped 
>>> latency: 5960 ms
>>> 
>> 
>> A mean cross-node dropped latency of 12s and 6s isn't hugely surprising 
>> from the US to India, given the geographical distance and the likelihood of 
>> packet loss across that distance. Losing 1 read request every few hours 
>> seems within normal expectations. 
>> 
>>  
>>> The rest of the stats are pretty much normal (tpstats, status, info, 
>>> tablestats, etc.).
>>> 
>>> Regards,
>>> Ashish
>>>
