Hi, I have a test setup where clients randomly make a controlled number of cas() requests (among other requests) against a cluster of Cassandra 2.0 servers. After a certain point, all requests are pending and my clients' throughput has dropped to 0.0 for all kinds of requests. In this specific case I had 10 clients, each making around 30 cas() requests per second, against a cluster of 72 Cassandra instances.
Clients are set up to register a request as a success after the cas() call returns with CASResult.success = true; otherwise an exception is thrown. No client requests were actually registered and no exceptions were thrown, which suggests that the cas() call itself is hung.

On the server side, I see Paxos logs as follows. They go on for 50 log files for each of the servers involved, and they span at least an hour. I have marked a particular instance where the prepare response is true but the propose response is false from all the involved servers:

*At the Paxos initiator:*

None of the files among the 50 system logs contain the phrase 'Propose response true'; these logs just go on and on.

DEBUG [RequestResponseStage:110] 2013-07-25 15:09:05,332 PrepareCallback.java (line 58) Prepare response PrepareResponse(true, Commit(d145fe46f5d02a54b5ea95852f94c402, 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])), Commit(d145fe46f5d02a54b5ea95852f94c402, d2093120-f576-11e2-a57e-a154d605509d, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /17.163.7.195
DEBUG [RequestResponseStage:92] 2013-07-25 15:09:05,346 PrepareCallback.java (line 58) Prepare response PrepareResponse(true, Commit(d145fe46f5d02a54b5ea95852f94c402, 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])), Commit(d145fe46f5d02a54b5ea95852f94c402, d2093120-f576-11e2-a57e-a154d605509d, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /17.163.7.184
DEBUG [RequestResponseStage:98] 2013-07-25 15:09:05,347 PrepareCallback.java (line 58) Prepare response PrepareResponse(true, Commit(d145fe46f5d02a54b5ea95852f94c402, 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])), Commit(d145fe46f5d02a54b5ea95852f94c402, d2093120-f576-11e2-a57e-a154d605509d, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /17.163.7.20
DEBUG [RequestResponseStage:93] 2013-07-25 15:09:05,350 ProposeCallback.java (line 44) Propose response false from /17.163.7.20
DEBUG [RequestResponseStage:100] 2013-07-25 15:09:05,350 ProposeCallback.java (line 44) Propose response false from /17.163.7.184
DEBUG [RequestResponseStage:111] 2013-07-25 15:09:05,350 ProposeCallback.java (line 44) Propose response false from /17.163.7.195
DEBUG [RequestResponseStage:102] 2013-07-25 15:09:05,351 PrepareCallback.java (line 58) Prepare response PrepareResponse(true, Commit(d145fe46f5d02a54b5ea95852f94c402, 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])), Commit(d145fe46f5d02a54b5ea95852f94c402, d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /17.163.7.195
DEBUG [RequestResponseStage:107] 2013-07-25 15:09:05,352 PrepareCallback.java (line 58) Prepare response PrepareResponse(true, Commit(d145fe46f5d02a54b5ea95852f94c402, 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])), Commit(d145fe46f5d02a54b5ea95852f94c402, d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /17.163.7.20
DEBUG [RequestResponseStage:108] 2013-07-25 15:09:05,352 PrepareCallback.java (line 58) Prepare response PrepareResponse(true, Commit(d145fe46f5d02a54b5ea95852f94c402, 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])), Commit(d145fe46f5d02a54b5ea95852f94c402, d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /17.163.7.184
DEBUG [RequestResponseStage:104] 2013-07-25 15:09:05,352 ProposeCallback.java (line 44) Propose response false from /17.163.7.20
DEBUG [RequestResponseStage:99] 2013-07-25 15:09:05,353 ProposeCallback.java (line 44) Propose response false from /17.163.7.195
DEBUG [RequestResponseStage:105] 2013-07-25 15:09:05,353 ProposeCallback.java (line 44) Propose response false from /17.163.7.184

*At 17.163.7.20:*

DEBUG [MutationStage:58] 2013-07-25 15:09:05,347 PaxosState.java (line 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402, d20b05e0-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402, d20b7b10-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
DEBUG [MutationStage:40] 2013-07-25 15:09:05,349 PaxosState.java (line 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402, d20b7b10-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402, d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
DEBUG [MutationStage:42] 2013-07-25 15:09:05,351 PaxosState.java (line 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402, d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402, d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
*DEBUG [MutationStage:43] 2013-07-25 15:09:05,352 PaxosState.java (line 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402, d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402, d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))*

*At 17.163.7.195:*

DEBUG [MutationStage:33] 2013-07-25 15:09:05,352 PaxosState.java (line 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402, d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402, d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
DEBUG [RequestResponseStage:38] 2013-07-25 15:09:05,352 PrepareCallback.java (line 58) Prepare response PrepareResponse(true, Commit(658db3eababc5629b8bcb77dc84db81a, 0193c330-f561-11e2-ad78-7bbb6a42087c, ColumnFamily(P [62caa268bd7e90050000000000000001:false:21@1374780776163000,])), Commit(658db3eababc5629b8bcb77dc84db81a, d20c3e60-f576-11e2-b3a3-edb9f71dce8f, ColumnFamily(P [62caa268bd7e90050000000000000001:false:21@1374780776163000,]))) from /17.163.7.162
DEBUG [RequestResponseStage:48] 2013-07-25 15:09:05,353 ProposeCallback.java (line 44) Propose response false from /17.163.7.194
DEBUG [RequestResponseStage:54] 2013-07-25 15:09:05,353 ProposeCallback.java (line 44) Propose response false from /17.163.7.162
*DEBUG [MutationStage:35] 2013-07-25 15:09:05,353 PaxosState.java (line 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402, d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402, d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))*

*At 17.163.7.184:*

DEBUG [MutationStage:53] 2013-07-25 15:09:05,347 PaxosState.java (line 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402, d20b05e0-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402, d20b7b10-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
DEBUG [MutationStage:38] 2013-07-25 15:09:05,348 PaxosState.java (line 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402, d20b7b10-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402, d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
DEBUG [MutationStage:64] 2013-07-25 15:09:05,351 PaxosState.java (line 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402, d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402, d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
*DEBUG [MutationStage:33] 2013-07-25 15:09:05,351 PaxosState.java (line 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402, d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402, d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))*

Thanks,
Soumava
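
P.S. For anyone who wants to repeat the check that no log contains 'Propose response true', here is a minimal sketch of how it could be scripted. The function name count_propose_responses and the regular expression are illustrative assumptions based only on the ProposeCallback line format quoted above; this is not necessarily how the check was originally done.

```python
import re
from collections import Counter

# Matches the ProposeCallback lines quoted in this post, e.g.
# "... ProposeCallback.java (line 44) Propose response false from /17.163.7.20"
# The optional whitespace after "/" tolerates line-wrapped addresses.
PROPOSE_RE = re.compile(r"Propose response (true|false) from /\s*(\S+)")


def count_propose_responses(lines):
    """Count propose responses per (replica address, outcome).

    `lines` is any iterable of log lines (hypothetical input: an open
    system.log file or a list of strings).
    """
    counts = Counter()
    for line in lines:
        for outcome, replica in PROPOSE_RE.findall(line):
            counts[(replica, outcome)] += 1
    return counts
```

Passing each open log file to count_propose_responses and summing the resulting counters gives a per-replica tally; if no key with outcome "true" appears, no proposal was ever accepted in those logs.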