Good point about the rack, Kyrill! That makes total sense to me. Deleting the system keyspace does not, though, since it contains all the essential information about the node. Maybe it only makes sense in conjunction with the replace_address_first_boot option. Some comments from the devs about this would be great.
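For reference, a rough sketch of what I mean by that option, using 10.10.10.223 (the node Kyrill replaced) as the example address; the config path is an assumption on my side, and on older versions the flag is -Dcassandra.replace_address rather than replace_address_first_boot:

  # On a brand-new, empty node (leave auto_bootstrap at its default of true):
  # tell Cassandra which dead node it is taking over, then start it.
  echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.10.10.223"' \
      | sudo tee -a /etc/cassandra/cassandra-env.sh
  sudo service cassandra start   # streams the old node's data and reuses its tokens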
Regards,
Jürgen

> On 03.02.2018 at 16:42, Kyrylo Lebediev <kyrylo_lebed...@epam.com> wrote:
>
> I've found a modified Carlos' article (more recent than the one I was referring to) and this one contains the same method as you described, Oleksandr:
> https://mrcalonso.com/2016/01/26/cassandra-instantaneous-in-place-node-replacement
>
> Thank you for your readiness to help!
>
> Kind Regards,
> Kyrill
>
> From: Kyrylo Lebediev <kyrylo_lebed...@epam.com>
> Sent: Saturday, February 3, 2018 12:23:15 PM
> To: User
> Subject: Re: Cassandra 2.1: replace running node without streaming
>
> Thank you Oleksandr,
> Just tested on 3.11.1 and it worked for me (you may see the logs below).
> Just realized that there is one important prerequisite for this method to work: the new node MUST be located in the same rack (in terms of C*) as the old one. Otherwise the correct replica placement order will be violated (I mean when replicas of the same token range should be placed in different racks).
>
> Anyway, even having a successful run of node replacement in the sandbox, I'm still in doubt.
> Just wondering why this procedure, which seems to be much easier than [add/remove node] or [replace a node], the documented ways for live node replacement, has never been included in the documentation.
> Does anybody in the ML know the reason for this?
>
> Also, for some reason in his article Carlos drops the files of the system keyspace (which contains the system.local table):
> "In the new node, delete all system tables except for the schema ones. This will ensure that the new Cassandra node will not have any corrupt or previous configuration assigned."
> sudo cd /var/lib/cassandra/data/system && sudo ls | grep -v schema | xargs -I {} sudo rm -rf {}
>
> http://engineering.mydrivesolutions.com/posts/cassandra_nodes_replacement/
> [Carlos, if you are here, might you please comment?]
>
> So still a mystery to me.....
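For concreteness, the in-place swap you tested would look roughly like this as commands; the data directory, service name and use of rsync are assumptions on my side, and the system keyspace is deliberately kept. The article's extra cleanup step is shown commented out, not least because its "sudo cd ..." form cannot actually run, cd being a shell builtin:

  # On the old node (10.10.10.223): flush and stop Cassandra cleanly.
  nodetool drain && sudo service cassandra stop

  # Copy the entire data directory, including the system keyspace, which holds
  # the node's saved tokens and host ID, to the new node (10.10.10.224).
  # The new node must use the same cluster_name and the same rack as the old one.
  sudo rsync -a /var/lib/cassandra/ 10.10.10.224:/var/lib/cassandra/

  # Cleanup step from Carlos' article (questionable, as discussed above):
  # sudo ls /var/lib/cassandra/data/system | grep -v schema \
  #     | xargs -I {} sudo rm -rf /var/lib/cassandra/data/system/{}

  # On the new node: set auto_bootstrap: false in cassandra.yaml and start it.
  # It should log "Using saved tokens ..." instead of allocating new ones.
  sudo service cassandra start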
>
> -----
> Logs for 3.11.1
> -----
>
> ====== Before:
> --  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.10.10.222  256.61 KiB  3       100.0%            bd504008-5ff0-4b6c-a3a6-a07049e61c31  rack1
> UN  10.10.10.223  225.65 KiB  3       100.0%            c562263f-4126-4935-b9f7-f4e7d0dc70b4  rack1  <<<<<<
> UN  10.10.10.221  187.39 KiB  3       100.0%            d312c083-8808-4c98-a3ab-72a7cd18b31f  rack1
>
> ======= After:
> --  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.10.10.222  245.84 KiB  3       100.0%            bd504008-5ff0-4b6c-a3a6-a07049e61c31  rack1
> UN  10.10.10.221  192.8 KiB   3       100.0%            d312c083-8808-4c98-a3ab-72a7cd18b31f  rack1
> UN  10.10.10.224  266.61 KiB  3       100.0%            c562263f-4126-4935-b9f7-f4e7d0dc70b4  rack1  <<<<<
>
>
> ====== Logs from another node (10.10.10.221):
> INFO [HANDSHAKE-/10.10.10.224] 2018-02-03 11:33:01,397 OutboundTcpConnection.java:560 - Handshaking version with /10.10.10.224
> INFO [GossipStage:1] 2018-02-03 11:33:01,431 Gossiper.java:1067 - Node /10.10.10.224 is now part of the cluster
> INFO [RequestResponseStage-1] 2018-02-03 11:33:02,190 Gossiper.java:1031 - InetAddress /10.10.10.224 is now UP
> INFO [RequestResponseStage-1] 2018-02-03 11:33:02,190 Gossiper.java:1031 - InetAddress /10.10.10.224 is now UP
> WARN [GossipStage:1] 2018-02-03 11:33:08,375 StorageService.java:2313 - Host ID collision for c562263f-4126-4935-b9f7-f4e7d0dc70b4 between /10.10.10.223 and /10.10.10.224; /10.10.10.224 is the new owner
> INFO [GossipTasks:1] 2018-02-03 11:33:08,806 Gossiper.java:810 - FatClient /10.10.10.223 has been silent for 30000ms, removing from gossip
>
> ====== Logs from new node:
> INFO [main] 2018-02-03 11:33:01,926 StorageService.java:1442 - JOINING: Finish joining ring
> INFO [GossipStage:1] 2018-02-03 11:33:02,659 Gossiper.java:1067 - Node /10.10.10.223 is now part of the cluster
> WARN [GossipStage:1] 2018-02-03 11:33:02,676 StorageService.java:2307 - Not updating host ID c562263f-4126-4935-b9f7-f4e7d0dc70b4 for /10.10.10.223 because it's mine
> INFO [GossipStage:1] 2018-02-03 11:33:02,683 StorageService.java:2365 - Nodes /10.10.10.223 and /10.10.10.224 have the same token -7774421781914237508. Ignoring /10.10.10.223
> INFO [GossipStage:1] 2018-02-03 11:33:02,686 StorageService.java:2365 - Nodes /10.10.10.223 and /10.10.10.224 have the same token 2257660731441815305. Ignoring /10.10.10.223
> INFO [GossipStage:1] 2018-02-03 11:33:02,692 StorageService.java:2365 - Nodes /10.10.10.223 and /10.10.10.224 have the same token 51879124242594885. Ignoring /10.10.10.223
> WARN [GossipTasks:1] 2018-02-03 11:33:03,985 Gossiper.java:789 - Gossip stage has 5 pending tasks; skipping status check (no nodes will be marked down)
> INFO [main] 2018-02-03 11:33:04,394 SecondaryIndexManager.java:509 - Executing pre-join tasks for: CFS(Keyspace='test', ColumnFamily='usr')
> WARN [GossipTasks:1] 2018-02-03 11:33:05,088 Gossiper.java:789 - Gossip stage has 7 pending tasks; skipping status check (no nodes will be marked down)
> INFO [GossipStage:1] 2018-02-03 11:33:05,718 Gossiper.java:1046 - InetAddress /10.10.10.223 is now DOWN
> INFO [main] 2018-02-03 11:33:06,872 StorageService.java:2268 - Node /10.10.10.224 state jump to NORMAL
> INFO [main] 2018-02-03 11:33:06,998 Gossiper.java:1655 - Waiting for gossip to settle...
> INFO [main] 2018-02-03 11:33:15,004 Gossiper.java:1686 - No gossip backlog; proceeding
> INFO [GossipTasks:1] 2018-02-03 11:33:20,114 Gossiper.java:1046 - InetAddress /10.10.10.222 is now DOWN   <<<<< have no idea why this appeared in logs
> INFO [main] 2018-02-03 11:33:20,566 NativeTransportService.java:70 - Netty using native Epoll event loop
> INFO [HANDSHAKE-/10.10.10.222] 2018-02-03 11:33:20,714 OutboundTcpConnection.java:560 - Handshaking version with /10.10.10.222
>
>
> Kind Regards,
> Kyrill
>
> From: Oleksandr Shulgin <oleksandr.shul...@zalando.de>
> Sent: Saturday, February 3, 2018 10:44:26 AM
> To: User
> Subject: Re: Cassandra 2.1: replace running node without streaming
>
> On 3 Feb 2018 08:49, "Jürgen Albersdorfer" <jalbersdor...@gmail.com> wrote:
> Cool, good to know. Do you know this is still true for 3.11.1?
>
> Well, I've never tried with that specific version, but this is pretty fundamental, so I would expect it to work the same way. Test in isolation if you want to be sure, though.
>
> I don't think this is documented anywhere, however, since I had the same doubts before seeing it work for the first time.
>
> --
> Alex
>
> On 03.02.2018 at 08:19, Oleksandr Shulgin <oleksandr.shul...@zalando.de> wrote:
>
>> On 3 Feb 2018 02:42, "Kyrylo Lebediev" <kyrylo_lebed...@epam.com> wrote:
>> Thanks, Oleksandr,
>> In my case I'll need to replace all nodes in the cluster (one by one), so streaming will introduce perceptible overhead.
>> My question is not about the data movement/copy itself, but more about all this token magic.
>>
>> Okay, let's say we stopped the old node and moved its data to the new node.
>> Once it's started with auto_bootstrap=false it will be added to the cluster like a usual node, just skipping the streaming stage, right?
>> For a cluster with vnodes enabled, during addition of a new node its token ranges are calculated automatically by C* on startup.
>>
>> So, how will C* know that this new node must be responsible for exactly the same token ranges as the old node was?
>> How would the rest of the nodes in the cluster ('peers') figure out that the old node should be replaced in the ring by the new one?
>> Do you know about some limitation for this process in case of C* 2.1.x with vnodes enabled?
>>
>> A node stores its tokens and host id in the system.local table. Next time it starts up, it will use the same tokens as previously, and the host id allows the rest of the cluster to see that it is the same node and ignore the IP address change. This happens regardless of the auto_bootstrap setting.
>>
>> Try "select * from system.local" to see what is recorded for the old node. When the new node starts up it should log "Using saved tokens" with the list of numbers. Other nodes should log something like "ignoring IP address change" for the affected node addresses.
>>
>> Be careful, though, to make sure that you put the data directory exactly where the new node expects to find it: otherwise it might just join as a brand new one, allocating new tokens. As a precaution it helps to ensure that the system user running the Cassandra process has no permission to create the data directory: this should stop the startup in case of misconfiguration.
>>
>> Cheers,
>> --
>> Alex
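To make Alex's checks concrete, this is roughly how I would verify it; the cqlsh connection settings and the log location are assumptions on my side:

  # On the old node, before stopping it: record its saved identity.
  cqlsh -e "SELECT host_id, tokens, rack, data_center FROM system.local;"

  # On the new node, after the first start with the copied data directory:
  # it should reuse that identity instead of allocating new tokens.
  grep -i "using saved tokens" /var/log/cassandra/system.log

  # On the other nodes: they should accept the new IP for the same host ID.
  grep -i "address change" /var/log/cassandra/system.log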