I have a test cluster with three nodes in each of two datacenters. The following causes nodetool repair to go into an (apparent) infinite loop. This is with Cassandra 2.0.6.
On node 10.140.140.101:

cqlsh> CREATE KEYSPACE looptest WITH replication = {
   ...     'class': 'NetworkTopologyStrategy',
   ...     '140': '2',
   ...     '141': '2'
   ... };
cqlsh> use looptest;
cqlsh:looptest> CREATE TABLE a_table (
            ...     id uuid,
            ...     description text,
            ...     PRIMARY KEY (id)
            ... );

On node 10.140.140.102:

[default@unknown] describe cluster;
Cluster Information:
   Name: Dev Cluster
   Snitch: org.apache.cassandra.locator.RackInferringSnitch
   Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
   Schema versions:
        e7c46d59-fceb-38b5-947c-dcbd14950a4c: [10.141.140.101, 10.140.140.101, 10.140.140.102, 10.141.140.103, 10.141.140.102, 10.140.140.103]

nodetool status:

Datacenter: 141
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns   Host ID                               Rack
UN  10.141.140.101  25.09 MB  256     15.6%  3f0d60bf-dfcd-42a9-9cff-8b76146359e3  140
UN  10.141.140.102  27.83 MB  256     16.7%  bbdcc640-278e-4d3d-ac12-fcb4d837d0e1  140
UN  10.141.140.103  23.78 MB  256     16.5%  b030e290-b8da-4883-a13d-b2529fab37fe  140
Datacenter: 140
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns   Host ID                               Rack
UN  10.140.140.103  65.26 MB  256     18.1%  52a9a718-2bed-4972-ab11-bd97a8d8539c  140
UN  10.140.140.101  69.46 MB  256     17.6%  d59300db-6179-484e-9ca1-8d1eada0701a  140
UN  10.140.140.102  68.08 MB  256     15.4%  22e504c9-1cc6-4744-b302-32bb5116d409  140

Back on 10.140.140.101, "nodetool repair looptest" never returns.
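For context on the topology names above: as I understand it, RackInferringSnitch infers the datacenter from the second octet of each node's IP and the rack from the third octet, which is why the DCs come out as "140"/"141" while every node lands in rack "140". A minimal sketch of that inference (my own illustration, not Cassandra code):

```python
def infer_dc_rack(ip: str):
    """Mimic RackInferringSnitch: DC = 2nd octet, rack = 3rd octet."""
    octets = ip.split(".")
    return octets[1], octets[2]  # (datacenter, rack)

for ip in ["10.140.140.101", "10.141.140.101"]:
    dc, rack = infer_dc_rack(ip)
    print(f"{ip}: DC={dc} rack={rack}")
# 10.140.140.101: DC=140 rack=140
# 10.141.140.101: DC=141 rack=140  (same rack name in both DCs)
```

So with this addressing scheme the identical rack names are an inevitable consequence of the third octet being 140 everywhere.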
Looking in the system.log, it is continuously looping with:

INFO [AntiEntropySessions:818] 2014-04-09 13:23:31,889 RepairSession.java (line 282) [repair #24b2b1b0-bfea-11e3-85a3-911072ba5322] session completed successfully
INFO [AntiEntropySessions:816] 2014-04-09 13:23:31,916 RepairSession.java (line 244) [repair #253687b0-bfea-11e3-85a3-911072ba5322] new session: will sync /10.140.140.101, /10.141.140.103, /10.140.140.103, /10.141.140.102 on range (-4377479664111251829,-4360027703686042340] for looptest.[a_table]
INFO [AntiEntropyStage:1] 2014-04-09 13:23:31,949 RepairSession.java (line 164) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.102
INFO [RepairJobTask:3] 2014-04-09 13:23:32,002 RepairJob.java (line 134) [repair #253687b0-bfea-11e3-85a3-911072ba5322] requesting merkle trees for a_table (to [/10.141.140.103, /10.140.140.103, /10.141.140.102, /10.140.140.101])
INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,007 RepairSession.java (line 164) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.101
INFO [RepairJobTask:3] 2014-04-09 13:23:32,012 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.101 and /10.140.140.103 are consistent for a_table
INFO [RepairJobTask:2] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.101 and /10.140.140.101 are consistent for a_table
INFO [RepairJobTask:1] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.101 and /10.141.140.102 are consistent for a_table
INFO [RepairJobTask:4] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.141.140.102 are consistent for a_table
INFO [RepairJobTask:5] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.140.140.101 are consistent for a_table
INFO [RepairJobTask:6] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.102 and /10.140.140.101 are consistent for a_table
INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,018 RepairSession.java (line 221) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] a_table is fully synced
INFO [AntiEntropySessions:817] 2014-04-09 13:23:32,019 RepairSession.java (line 282) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] session completed successfully
INFO [AntiEntropySessions:818] 2014-04-09 13:23:32,043 RepairSession.java (line 244) [repair #2549c190-bfea-11e3-85a3-911072ba5322] new session: will sync /10.140.140.101, /10.141.140.103, /10.140.140.102, /10.141.140.102 on range (-3457228189350977014,-3443426249422196914] for looptest.[a_table]
INFO [RepairJobTask:3] 2014-04-09 13:23:32,169 RepairJob.java (line 134) [repair #2549c190-bfea-11e3-85a3-911072ba5322] requesting merkle trees for a_table (to [/10.141.140.103, /10.140.140.102, /10.141.140.102, /10.140.140.101])
INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,197 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.103
INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,247 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.103
INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,454 RepairSession.java (line 164) [repair #2549c190-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.103
INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,516 RepairSession.java (line 164) [repair #2549c190-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.102
INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,522 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.102
INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,581 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.101
INFO [RepairJobTask:3] 2014-04-09 13:23:32,586 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.103 and /10.140.140.103 are consistent for a_table
INFO [RepairJobTask:2] 2014-04-09 13:23:32,589 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.103 and /10.140.140.101 are consistent for a_table
INFO [RepairJobTask:1] 2014-04-09 13:23:32,589 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.103 and /10.141.140.102 are consistent for a_table
INFO [RepairJobTask:5] 2014-04-09 13:23:32,589 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.140.140.101 are consistent for a_table
INFO [RepairJobTask:4] 2014-04-09 13:23:32,590 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.141.140.102 are consistent for a_table
INFO [RepairJobTask:6] 2014-04-09 13:23:32,590 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.102 and /10.140.140.101 are consistent for a_table
INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,592 RepairSession.java (line 221) [repair #253687b0-bfea-11e3-85a3-911072ba5322] a_table is fully synced
INFO [AntiEntropySessions:816] 2014-04-09 13:23:32,592 RepairSession.java (line 282) [repair #253687b0-bfea-11e3-85a3-911072ba5322] session completed successfully

Any ideas? Could the fact that the rack name is the same in both datacenters have something to do with it?

Thanks,
--Kevin