I have a test cluster with three nodes in each of two datacenters.  The
following causes nodetool repair to go into an (apparent) infinite
loop.  This is with Cassandra 2.0.6.

On node 10.140.140.101:

cqlsh> CREATE KEYSPACE looptest WITH replication = {
   ...   'class': 'NetworkTopologyStrategy',
   ...   '140': '2',
   ...   '141': '2'
   ... };
cqlsh> use looptest;
cqlsh:looptest> CREATE TABLE a_table (
            ...   id uuid,
            ...   description text,
            ...   PRIMARY KEY (id)
            ... );
cqlsh:looptest>

On node 10.140.140.102:

[default@unknown] describe cluster;
Cluster Information:
   Name: Dev Cluster
   Snitch: org.apache.cassandra.locator.RackInferringSnitch
   Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
   Schema versions:
        e7c46d59-fceb-38b5-947c-dcbd14950a4c: [10.141.140.101, 10.140.140.101, 10.140.140.102, 10.141.140.103, 10.141.140.102, 10.140.140.103]

nodetool status:

Datacenter: 141
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  10.141.140.101  25.09 MB   256     15.6%  3f0d60bf-dfcd-42a9-9cff-8b76146359e3  140
UN  10.141.140.102  27.83 MB   256     16.7%  bbdcc640-278e-4d3d-ac12-fcb4d837d0e1  140
UN  10.141.140.103  23.78 MB   256     16.5%  b030e290-b8da-4883-a13d-b2529fab37fe  140

Datacenter: 140
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  10.140.140.103  65.26 MB   256     18.1%  52a9a718-2bed-4972-ab11-bd97a8d8539c  140
UN  10.140.140.101  69.46 MB   256     17.6%  d59300db-6179-484e-9ca1-8d1eada0701a  140
UN  10.140.140.102  68.08 MB   256     15.4%  22e504c9-1cc6-4744-b302-32bb5116d409  140
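For scale, the status output implies a lot of token ranges.  The sketch below is a rough count, resting on an assumption that this thread does not confirm: that nodetool repair without -pr runs one repair session per token range the node replicates.  The function name expected_sessions is hypothetical.  The inputs come from the output above: 3 nodes per DC, 2 DCs, 256 tokens per node, and RF 2 in each DC.

```python
# Rough count of repair sessions to expect on one node of this cluster.
# ASSUMPTION (not confirmed here): without -pr, nodetool repair runs one
# session per token range that the node replicates.

def expected_sessions(nodes_per_dc, dcs, tokens_per_node, rf_per_dc):
    """Return (ranges in the whole ring, average ranges one node replicates)."""
    # Every vnode token splits the ring, so there is one range per token.
    total_ranges = nodes_per_dc * dcs * tokens_per_node
    # Each range has rf_per_dc replicas among the nodes_per_dc nodes of a DC,
    # so on average one node holds this share (integer math, no rounding).
    local_ranges = total_ranges * rf_per_dc // nodes_per_dc
    return total_ranges, local_ranges


# 3 nodes per DC, 2 DCs, 256 tokens per node, RF 2 in each DC.
total, local = expected_sessions(3, 2, 256, 2)
print(f"ranges in ring: {total}; replicated by one node: about {local}")
```

Under that assumption, about 1024 sessions would run one after another.  The one complete session in the log below takes roughly 0.7 s, so the log would show a long stream of sessions that is still finite.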


Back on 10.140.140.101:

"nodetool repair looptest" never returns.  The system.log shows it
continuously looping with:

 INFO [AntiEntropySessions:818] 2014-04-09 13:23:31,889 RepairSession.java (line 282) [repair #24b2b1b0-bfea-11e3-85a3-911072ba5322] session completed successfully
 INFO [AntiEntropySessions:816] 2014-04-09 13:23:31,916 RepairSession.java (line 244) [repair #253687b0-bfea-11e3-85a3-911072ba5322] new session: will sync /10.140.140.101, /10.141.140.103, /10.140.140.103, /10.141.140.102 on range (-4377479664111251829,-4360027703686042340] for looptest.[a_table]
 INFO [AntiEntropyStage:1] 2014-04-09 13:23:31,949 RepairSession.java (line 164) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.102
 INFO [RepairJobTask:3] 2014-04-09 13:23:32,002 RepairJob.java (line 134) [repair #253687b0-bfea-11e3-85a3-911072ba5322] requesting merkle trees for a_table (to [/10.141.140.103, /10.140.140.103, /10.141.140.102, /10.140.140.101])
 INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,007 RepairSession.java (line 164) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.101
 INFO [RepairJobTask:3] 2014-04-09 13:23:32,012 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.101 and /10.140.140.103 are consistent for a_table
 INFO [RepairJobTask:2] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.101 and /10.140.140.101 are consistent for a_table
 INFO [RepairJobTask:1] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.101 and /10.141.140.102 are consistent for a_table
 INFO [RepairJobTask:4] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.141.140.102 are consistent for a_table
 INFO [RepairJobTask:5] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.140.140.101 are consistent for a_table
 INFO [RepairJobTask:6] 2014-04-09 13:23:32,016 Differencer.java (line 67) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.102 and /10.140.140.101 are consistent for a_table
 INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,018 RepairSession.java (line 221) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] a_table is fully synced
 INFO [AntiEntropySessions:817] 2014-04-09 13:23:32,019 RepairSession.java (line 282) [repair #24e867b0-bfea-11e3-85a3-911072ba5322] session completed successfully
 INFO [AntiEntropySessions:818] 2014-04-09 13:23:32,043 RepairSession.java (line 244) [repair #2549c190-bfea-11e3-85a3-911072ba5322] new session: will sync /10.140.140.101, /10.141.140.103, /10.140.140.102, /10.141.140.102 on range (-3457228189350977014,-3443426249422196914] for looptest.[a_table]
 INFO [RepairJobTask:3] 2014-04-09 13:23:32,169 RepairJob.java (line 134) [repair #2549c190-bfea-11e3-85a3-911072ba5322] requesting merkle trees for a_table (to [/10.141.140.103, /10.140.140.102, /10.141.140.102, /10.140.140.101])
 INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,197 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.103
 INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,247 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.103
 INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,454 RepairSession.java (line 164) [repair #2549c190-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.103
 INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,516 RepairSession.java (line 164) [repair #2549c190-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.102
 INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,522 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.141.140.102
 INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,581 RepairSession.java (line 164) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Received merkle tree for a_table from /10.140.140.101
 INFO [RepairJobTask:3] 2014-04-09 13:23:32,586 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.103 and /10.140.140.103 are consistent for a_table
 INFO [RepairJobTask:2] 2014-04-09 13:23:32,589 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.103 and /10.140.140.101 are consistent for a_table
 INFO [RepairJobTask:1] 2014-04-09 13:23:32,589 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.103 and /10.141.140.102 are consistent for a_table
 INFO [RepairJobTask:5] 2014-04-09 13:23:32,589 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.140.140.101 are consistent for a_table
 INFO [RepairJobTask:4] 2014-04-09 13:23:32,590 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.140.140.103 and /10.141.140.102 are consistent for a_table
 INFO [RepairJobTask:6] 2014-04-09 13:23:32,590 Differencer.java (line 67) [repair #253687b0-bfea-11e3-85a3-911072ba5322] Endpoints /10.141.140.102 and /10.140.140.101 are consistent for a_table
 INFO [AntiEntropyStage:1] 2014-04-09 13:23:32,592 RepairSession.java (line 221) [repair #253687b0-bfea-11e3-85a3-911072ba5322] a_table is fully synced
 INFO [AntiEntropySessions:816] 2014-04-09 13:23:32,592 RepairSession.java (line 282) [repair #253687b0-bfea-11e3-85a3-911072ba5322] session completed successfully
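One way to tell a true loop from a long but finite run is to check whether any token range gets a "new session" more than once.  The sketch below does this.  The regex is an assumption modelled only on the "new session: will sync ... on range (a,b]" lines quoted above, and the names NEW_SESSION, summarize and SAMPLE are hypothetical.

```python
import re
from collections import Counter

# Matches RepairSession "new session" entries like the ones quoted above.
# ASSUMPTION: the format is taken from this log excerpt only. \s+ is used so
# entries wrapped across lines (as in an email) still match.
NEW_SESSION = re.compile(
    r"\[repair\s+#([0-9a-f-]+)\]\s+new session: will sync\s+"
    r"[^\]]*?\s+on\s+range\s+\((-?\d+),(-?\d+)\]"
)


def summarize(log_text):
    """Return (sessions started, distinct ranges, ranges started more than once)."""
    sessions = set()
    ranges = Counter()
    for session_id, left, right in NEW_SESSION.findall(log_text):
        sessions.add(session_id)
        ranges[(int(left), int(right))] += 1
    repeated = {r: n for r, n in ranges.items() if n > 1}
    return len(sessions), len(ranges), repeated


# The two "new session" entries from the excerpt above.
SAMPLE = """\
 INFO [AntiEntropySessions:816] 2014-04-09 13:23:31,916 RepairSession.java (line 244) [repair #253687b0-bfea-11e3-85a3-911072ba5322] new session: will sync /10.140.140.101, /10.141.140.103, /10.140.140.103, /10.141.140.102 on range (-4377479664111251829,-4360027703686042340] for looptest.[a_table]
 INFO [AntiEntropySessions:818] 2014-04-09 13:23:32,043 RepairSession.java (line 244) [repair #2549c190-bfea-11e3-85a3-911072ba5322] new session: will sync /10.140.140.101, /10.141.140.103, /10.140.140.102, /10.141.140.102 on range (-3457228189350977014,-3443426249422196914] for looptest.[a_table]
"""

print(summarize(SAMPLE))  # (2, 2, {})
```

Run over the full system.log (e.g. summarize(open(path).read()), with path pointing at your log), this would give a useful signal.  If the distinct-range count keeps climbing and nothing shows up as repeated, repair is probably stepping through many vnode ranges rather than looping.  If ranges do repeat, that points to an actual loop.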

Any ideas?  Could the fact that the rack name (140) is the same in both
datacenters have something to do with it?

Thanks,

--Kevin
