Hello Ashish,

I'm slightly worried about this:

   /Since I won't be needing the physical DC anymore, instead of
   upgrading it I will simply discard that DC./

This sounds like you are planning to add the GCP DC on 3.x to the existing cluster, upgrade the GCP DC to 4.0, and then decommission the existing DC without upgrading it. If so, you need to think twice. Adding or removing nodes (or DCs) in a cluster running mixed versions is not recommended. I'd highly recommend you upgrade the existing DC before decommissioning it. Of course, you can skip running upgradesstables on it, which is often the most time-consuming part.
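
If you do go that route, a rough per-node sequence for the old DC might look like this (a sketch only; service names and install steps depend on your environment):

    # on each physical-DC node, one at a time, once GCP is serving traffic:
    nodetool drain                 # flush memtables, stop accepting connections
    sudo service cassandra stop
    # swap in the 4.0 binaries/packages here, keep the existing data as-is
    sudo service cassandra start   # comes up on 4.0; no upgradesstables needed

    # after the whole cluster is on 4.0, remove DC1 from each keyspace's
    # replication (ALTER KEYSPACE), then retire the DC node by node:
    nodetool decommission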


Cheers,

Bowen


On 06/09/2021 03:29, MyWorld wrote:
Hi Jeff,

/When you’re upgrading or rebuilding you want all copies on the same version with proper sstables. So either add GCP then upgrade to 4.0, or upgrade to 4.0 and then expand to GCP. Don’t do them at the same time./

I think I forgot to mention one thing: after completion of step 1, our GCP data center will have been added, with a rebuild done on all nodes, so our complete cluster would be on 3.0.9 after step 1. We will change num_tokens from the current 256 to 16 in the GCP data center in this step only (settings sketched below the layout).

DC1 -
5 nodes (physical) - version 3.0.9, num_tokens 256
DC2 -
5 nodes (GCP) - version 3.0.9, num_tokens 16
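
For reference, the step 1 settings and commands I have in mind (keyspace name and replication factors below are placeholders for our real ones):

    # cassandra.yaml on each new GCP node, before its first start:
    num_tokens: 16
    # optional, if available in your 3.0.x yaml; helps balance
    # ownership when using few tokens:
    # allocate_tokens_for_keyspace: my_keyspace

    # add DC2 to replication, then stream data from the old DC:
    cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"
    nodetool rebuild -- DC1        # run on each GCP node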

The remaining steps 2-5 are for the upgrade, in which I am planning to upgrade DC by DC, running upgradesstables on GCP first.

DC1 -
5 nodes (physical) - version 3.0.9, num_tokens 256
DC2 -
5 nodes (GCP) - version 4.0.0, num_tokens 16

Since I won't be needing the physical DC anymore, instead of upgrading it I will simply discard that DC.

Regards,
Ashish

On Mon, Sep 6, 2021, 7:31 AM Jeff Jirsa <jji...@gmail.com> wrote:

    In-line

    On Sep 3, 2021, at 11:12 AM, MyWorld <timeplus.1...@gmail.com> wrote:

    
    Hi Jeff,
    Thanks for your response.
    To answer your question, yes, we have created the dev environment
    by restoring it from snapshot/CSV files.

    Just one follow-up question: I have a 5-node single DC in
    production on version 3.0.9, on physical servers.
    We are planning to migrate to GCP along with an upgrade, using the
    steps below.
    1. Set up the GCP data center on the same version 3.0.9 and rebuild
    the complete data
    2. Install and configure version 4.0 in the new GCP data center
    on all 5 nodes
    3. Stop version 3.0.9 and start 4.0 on all 5 nodes of GCP, one by one
    4. Run upgradesstables one by one on all 5 nodes of GCP
    5. Later, move read/write traffic to GCP and remove the old
    datacenter, which is still on version 3.0.9
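
    A per-node sketch of steps 3-4 (assuming a package install; adjust
    service handling for your setup):

        # step 3, repeated on each GCP node, one at a time:
        nodetool drain               # clean flush and shutdown
        sudo service cassandra stop
        # switch the node to the 4.0 install, then:
        sudo service cassandra start
        nodetool status              # wait until the node is back UN

        # step 4, only after all 5 GCP nodes are running 4.0:
        nodetool upgradesstables     # per node; rewrites sstables to
                                     # the 4.0 format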

    Please guide on a few things:
    1. Is the above-mentioned approach right?

    When you’re upgrading or rebuilding you want all copies on the
    same version with proper sstables. So either add GCP then upgrade
    to 4.0, or upgrade to 4.0 and then expand to GCP. Don’t do them at
    the same time.


    2. OR should we upgrade to 4.0 on only one GCP node at a time and
    run upgradesstables on just that one node first?

    I usually do upgradesstables after all bounces are done

    The only exception is perhaps doing upgradesstables with exactly
    one copy via backup/restore to make sure 4.0 works with your data
    files, which it sounds like you’ve already done.

    3. OR should we migrate to GCP first and then think about
    upgrading to 4.0 later?
    4. OR is there any reason I should upgrade to 3.11.x first?

    Not 3.11, but maybe the latest 3.0 instead.



    Regards,
    Ashish

    On Fri, Sep 3, 2021, 11:11 PM Jeff Jirsa <jji...@gmail.com> wrote:



        On Fri, Sep 3, 2021 at 10:33 AM MyWorld
        <timeplus.1...@gmail.com> wrote:

            Hi all,
            We are doing a POC in a dev environment to upgrade Apache
            Cassandra 3.0.9 to 4.0.0. We currently have the below setup
            on Cassandra 3.0.9:
            DC1 - GCP(india) - 1 node
            DC2 - GCP(US) - 1 node


        3.0.9 is very old. It has an older version of the data files
        and some known correctness bugs.


            For the upgrade, we carried out the below steps on the
            DC2 - GCP(US) node:
            Step 1. Install Apache Cassandra 4.0.0
            Step 2. Apply all configuration settings
            Step 3. Stop Apache Cassandra 3.0.9
            Step 4. Start Apache Cassandra 4.0.0 and monitor logs
            Step 5. Run nodetool upgradesstables and monitor logs

            After monitoring the logs, I had the below observations:
            *1. Initially, during startup at Step 4, we received the
            below exceptions:*
            a) Exception (java.lang.IllegalArgumentException)
            encountered during startup: Invalid sstable file
            manifest.json: the name doesn't look like a supported
            sstable file name
            java.lang.IllegalArgumentException: Invalid sstable file
            manifest.json: the name doesn't look like a supported
            sstable file name
            b) ERROR [main] 2021-08-29 06:25:52,120
            CassandraDaemon.java:909 - Exception encountered during
            startup
            java.lang.IllegalArgumentException: Invalid sstable file
            schema.cql: the name doesn't look like a supported
            sstable file name

            *To resolve this, we removed the manifest.json and
            schema.cql files from each table directory, and the issue
            went away.*
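
            In case it helps anyone else, this is roughly how we
            located them first (the quarantine directory is just our
            own convention):

                # manifest.json and schema.cql are snapshot artifacts
                # and don't belong in live table directories
                find /opt1/cassandra_poc/data -maxdepth 3 -type f \
                  \( -name manifest.json -o -name schema.cql \) -print
                # after verifying the list, move them aside rather
                # than deleting:
                #   ... -exec mv {} /opt1/cassandra_poc/quarantine/ \;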


        Did you restore these from backup/snapshot?

            *2. After resolving the above issue, we received the below
            WARN messages during startup (step 4).*
            *WARN * [main] 2021-08-29 06:33:25,737
            CommitLogReplayer.java:305 - Origin of 1 sstables is
            unknown or doesn't match the local node;
            commitLogIntervals for them were ignored
            *DEBUG *[main] 2021-08-29
            06:33:25,737 CommitLogReplayer.java:306 - Ignored
            commitLogIntervals from the following sstables:
            [/opt1/cassandra_poc/data/clickstream/glcat_mcat_by_flname-af4e3ac0ace511ebaf9ec13e37d013c2/mc-1-big-Data.db]
            *WARN *[main] 2021-08-29 06:33:25,737
            CommitLogReplayer.java:305 - Origin of 2 sstables is
            unknown or doesn't match the local node;
            commitLogIntervals for them were ignored
            *DEBUG *[main] 2021-08-29 06:33:25,738
            CommitLogReplayer.java:306 - Ignored commitLogIntervals
            from the following sstables:
            [/opt1/cassandra_poc/data/clickstream/gl_city_map


        Your data files don't match the commitlog files it expects to
        see. Either you restored these from backup, or it's because
        3.0.9 is much older than the 3.0.x releases more commonly in
        use.

            *3. While upgrading sstables (step 5), we received the
            below messages:*
            *WARN*  [CompactionExecutor:3] 2021-08-29 07:47:32,828
            DuplicateRowChecker.java:96 - Detected 2 duplicate rows
            for 29621439 during Upgrade sstables.
            *WARN*  [CompactionExecutor:3] 2021-08-29 07:47:32,831
            DuplicateRowChecker.java:96 - Detected 4 duplicate rows
            for 45016570 during Upgrade sstables.
            *WARN*  [CompactionExecutor:3] 2021-08-29 07:47:32,833
            DuplicateRowChecker.java:96 - Detected 3 duplicate rows
            for 61260692 during Upgrade sstables.


        This says you have corrupt data from an old bug. Probably
        related to 2.1 -> 3.0 upgrades, if this was originally on
        2.1. If you read those keys, you would find that the data
        returns 2-4 rows where it should be exactly 1.
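
        If you want to confirm, reading one of the logged keys should
        show it, e.g. (table taken from the sstable path above; the
        partition key column is a guess, substitute your real one):

            cqlsh -e "SELECT * FROM clickstream.glcat_mcat_by_flname
              WHERE flname_id = 29621439;"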

            4. *Also, we received the below messages during the upgrade:*
            *DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,347
            InitialConnectionHandler.java:77 - OPTIONS received 5/v5
            *DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,349
            InitialConnectionHandler.java:121 - Response to STARTUP
            sent, configuring pipeline for 5/v5
            *DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,350
            InitialConnectionHandler.java:153 - Configured pipeline:
            DefaultChannelPipeline{(frameDecoder =
            org.apache.cassandra.net.FrameDecoderCrc), (frameEncoder
            = org.apache.cassandra.net.FrameEncoderCrc),
            (cqlProcessor =
            org.apache.cassandra.transport.CQLMessageHandler),
            (exceptionHandler =
            org.apache.cassandra.transport.ExceptionHandlers$PostV5ExceptionHandler)}


        Lots of debug stuff, normal. It's the netty connection
        pipelines being set up.

            *5. After the upgrade, we are regularly getting the below messages:*
            *DEBUG* [ScheduledTasks:1] 2021-09-02 00:03:20,910
            SSLFactory.java:354 - Checking whether certificates have
            been updated []
            *DEBUG* [ScheduledTasks:1] 2021-09-02 00:13:20,910
            SSLFactory.java:354 - Checking whether certificates have
            been updated []
            *DEBUG* [ScheduledTasks:1] 2021-09-02 00:23:20,911
            SSLFactory.java:354 - Checking whether certificates have
            been updated []

        Normal. It's checking to see if the SSL cert changed; if it
        did, it would reload it.
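
        If the noise bothers you, you can raise that logger at runtime
        (class name from your log lines; reverts on restart unless you
        also change logback.xml):

            nodetool setlogginglevel org.apache.cassandra.security.SSLFactory INFO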

            *Can someone please explain what the above ERROR / WARN /
            DEBUG messages refer to? Is there anything to be concerned
            about?*

            *Also, we received 2 READ_REQ dropped messages (maybe due
            to network latency):*
            *INFO*  [ScheduledTasks:1] 2021-09-03 11:40:10,009
            MessagingMetrics.java:206 - READ_REQ messages were
            dropped in last 5000 ms: 0 internal and 1 cross node.
            Mean internal dropped latency: 0 ms and Mean cross-node
            dropped latency: 12359 ms
            *INFO*  [ScheduledTasks:1] 2021-09-03 13:27:15,291
            MessagingMetrics.java:206 - READ_REQ messages were
            dropped in last 5000 ms: 0 internal and 1 cross node.
            Mean internal dropped latency: 0 ms and Mean cross-node
            dropped latency: 5960 ms


        12s and 6s cross-node latency isn't hugely surprising from US
        to India, given the geographical distance and likelihood of
        packet loss across that distance. Losing 1 read request every
        few hours seems like it's within normal expectations.
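
        If you want to keep an eye on it, the dropped-message counters
        (cumulative since the node started) are in the message-type
        table at the bottom of:

            nodetool tpstats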

            The rest of the stats are pretty much normal (tpstats,
            status, info, tablestats, etc.).

            Regards,
            Ashish
