Hi Jeff, thanks for your answer.
- Regarding the drain, I will proceed as you indicate: run a flush and then shut down the node.
- Regarding the Cassandra updates: I want to upgrade the cluster because we are having problems with timeouts (all the nodes become unresponsive) when compactions are running. In another question I asked on this list to find a solution to that problem, someone recommended two things: add new nodes to the cluster (I have already ordered two new nodes) and upgrade the version of Cassandra, because 2.0.17 is very old and newer versions, especially 2.1, brought a lot of performance improvements (performance is probably the problem we are facing).
- We have a demo environment, but unfortunately the test cluster is not the same size, and we cannot replicate the data from the live cluster onto the test cluster because of the size of the live data. Anyway, in our case, surprises will not cost millions of dollars and certainly not my job. We are also a small company, so even if we upgrade the test cluster, probably nobody (I mean real users) will exercise the application that uses the cluster. This means we would probably not detect the bugs even in the test cluster.

From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: woensdag 6 april 2016 19:13
To: user@cassandra.apache.org
Subject: Re: nodetool drain running for days

Drain should not run for days – if it were me, I’d be:

* Checking for ‘DRAINED’ in the server logs
* Running ‘nodetool flush’ just to explicitly flush the commitlog/memtables (generally useful before doing drain, too, it can be somewhat race-y)
* Explicitly killing cassandra following the flush – drain should simply be a flush + shutdown everything, so it should take on the order of seconds, not days.

For your question about 3.0: historically, Cassandra has had some bugs in new major versions:

- Hints were broken from 1.0.0 to 1.0.3 - https://issues.apache.org/jira/browse/CASSANDRA-3466
- Hints were broken again from 1.1.0 to 1.1.6 - https://issues.apache.org/jira/browse/CASSANDRA-4772
- There was a corruption bug in 2.0 until 2.0.8 - https://issues.apache.org/jira/browse/CASSANDRA-6285
- There were a number of rough edges in 2.1, including a memory leak fixed in 2.1.7 - https://issues.apache.org/jira/browse/CASSANDRA-9549
- Compaction kept stopping in 2.2.0 until 2.2.2 - https://issues.apache.org/jira/browse/CASSANDRA-10270

Because of this history of “bugs in new versions", many operators choose to hold off on going to new versions until they’re “better tested”. The catch-22 is obvious here: if nobody uses it, nobody tests it in the real world to find the bugs not discovered in automated testing.

The Datastax folks did some awesome work for 3.0 to extend the unit and distributed tests – they’re MUCH better than they were in 2.2, so hopefully there are fewer surprise bugs in 3+, but there’s bound to be a few. The apache team has also changed the release cycle to release more frequently, so that there’s less new code in each release (see http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/ ).

If you’ve got a lab/demo/stage/test environment that can tolerate some outages, I definitely encourage you to upgrade there, first. If a few surprise issues will cost your company millions of dollars, or will cost you your job, let someone else upgrade and be the guinea pig, and don’t upgrade until you’re compelled to do so because of a bug fix you need, or a feature that won’t be in the version you’re running.
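Concretely, the checks and the flush-then-stop sequence described above amount to something like the following on a stuck node. This is only a rough sketch: the log path and the service name are assumptions that depend on how Cassandra was installed, so adjust them to your environment.

    # Check whether the drain actually finished (log path is an assumption; adjust to your install)
    grep -i 'DRAINED' /var/log/cassandra/system.log

    # Explicitly flush memtables and the commitlog before shutting down
    nodetool flush

    # If the drain still hangs, stop the process anyway; the flush has already persisted the data
    sudo service cassandra stop
    # or, if there is no service script:
    sudo kill $(pgrep -f CassandraDaemon)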
From: Paco Trujillo
Reply-To: "user@cassandra.apache.org"
Date: Tuesday, April 5, 2016 at 11:12 PM
To: "user@cassandra.apache.org"
Subject: nodetool drain running for days

We are having performance problems with our cluster regarding timeouts when repairs or massive deletes are running. One piece of advice I received was to update our Cassandra version from 2.0.17 to 2.2. I am draining one of the nodes to start the upgrade, and the drain has now been running for two days. In the logs I only see lines like these from time to time:

INFO [ScheduledTasks:1] 2016-04-06 08:17:10,987 ColumnFamilyStore.java (line 808) Enqueuing flush of Memtable-sstable_activity@1382334976(15653/226669 serialized/live bytes, 6023 ops)
INFO [FlushWriter:1468] 2016-04-06 08:17:10,988 Memtable.java (line 362) Writing Memtable-sstable_activity@1382334976(15653/226669 serialized/live bytes, 6023 ops)
INFO [ScheduledTasks:1] 2016-04-06 08:17:11,004 ColumnFamilyStore.java (line 808) Enqueuing flush of Memtable-compaction_history@1425848386(1599/15990 serialized/live bytes, 51 ops)
INFO [FlushWriter:1468] 2016-04-06 08:17:11,012 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/sstable_activity/system-sstable_activity-jb-4826-Data.db (6348 bytes) for commitlog position ReplayPosition(segmentId=1458540068021, position=1198022)
INFO [FlushWriter:1468] 2016-04-06 08:17:11,012 Memtable.java (line 362) Writing Memtable-compaction_history@1425848386(1599/15990 serialized/live bytes, 51 ops)
INFO [FlushWriter:1468] 2016-04-06 08:17:11,039 Memtable.java (line 402) Completed flushing /var/lib/cassandra/data/system/compaction_history/system-compaction_history-jb-3491-Data.db (730 bytes) for commitlog position ReplayPosition(segmentId=1458540068021, position=1202850)

Should I wait, or just stop the node and start the migration? Another question: I have checked the changes in 3.0 and I do not see any incompatibilities with the features we are using at the moment or with our current hardware (apart from the Java version). Probably more people have asked this, but is there some important reason not to upgrade the cluster?