Re: Node goes AWOL briefly; failed replication does not report error to client, though consistency=ALL

Jonathan Ellis Tue, 07 Dec 2010 13:11:23 -0800

I'm inclined to think there's a bug in your client, then.  DEBUG-level
logging could confirm or refute this: for each insert it records how many
replicas are being blocked for, which nodes responded, and whether a
TimedOutException was returned to the client because not all replicas
replied.
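As a rough illustration of what those DEBUG lines let you check, here is a minimal Python sketch of the coordinator-side rule for a write. It works out how many replicas to block for from the consistency level and replication factor, then fails with a timeout if fewer acknowledgements arrive. This is a simplified model, not Cassandra's code: `block_for`, `coordinate_write` and the `TimedOutException` class below are hypothetical stand-ins (the real TimedOutException is a Thrift exception), and `acked` is assumed to be the set of replicas that replied before the RPC timeout.

```python
class TimedOutException(Exception):
    """Hypothetical stand-in for the Thrift TimedOutException seen by clients."""


def block_for(consistency_level: str, replication_factor: int) -> int:
    """Number of replica acks a coordinator waits for (simplified model)."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def coordinate_write(replicas, acked, consistency_level, replication_factor):
    """Return the replicas that acked, or raise if too few replied in time.

    `acked` models the replicas whose response arrived before the timeout.
    """
    needed = block_for(consistency_level, replication_factor)
    responders = [r for r in replicas if r in acked]
    # Roughly the facts DEBUG-level logging lets you verify for each insert.
    print(f"blocking for {needed}; responses from {responders}")
    if len(responders) < needed:
        raise TimedOutException(f"{len(responders)} of {needed} replicas replied")
    return responders
```

With replication_factor 3, a write at ALL blocks for all 3 replicas, so if X.19 drops the mutation, `coordinate_write(["X.17", "X.18", "X.19"], {"X.17", "X.18"}, "ALL", 3)` raises, whereas the same call at QUORUM (2 of 3) succeeds. X.17 and X.18 are hypothetical host names used only for illustration.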

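A common way for a client to report success anyway is to catch and discard exceptions around the insert call. The sketch below is hypothetical: `insert` stands for whatever write call your client library exposes, and the point is only that every failed write is recorded and surfaced rather than swallowed.

```python
from typing import Callable, Iterable, List, Tuple


def write_all(insert: Callable[[str, dict], None],
              rows: Iterable[Tuple[str, dict]]) -> List[Tuple[str, Exception]]:
    """Attempt every write and return (key, error) for each one that failed.

    `insert` is a hypothetical client call that raises on failure
    (e.g. a timeout when not all replicas acknowledged a write at ALL).
    """
    failures = []
    for key, columns in rows:
        try:
            insert(key, columns)
        except Exception as exc:  # record the failure; never `pass` silently
            failures.append((key, exc))
    return failures
```

The caller should then treat a non-empty result as an error (log it, retry, or abort) instead of assuming every mutation landed.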

On Tue, Dec 7, 2010 at 2:37 PM, Reverend Chip <rev.c...@gmail.com> wrote:
> No, I'm afraid that's not it:
>  replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
>  replication_factor: 3
>
> On 12/7/2010 6:37 AM, Jonathan Ellis wrote:
>> If you are using NetworkTopologyStrategy you are probably hitting
>> https://issues.apache.org/jira/browse/CASSANDRA-1804 which is fixed in
>> rc2.
>>
>> On Mon, Dec 6, 2010 at 6:58 PM, Reverend Chip <rev.c...@gmail.com> wrote:
>>> I'm running a big test: ten nodes with 3 TB of disk each, on
>>> 0.7.0rc1.  After some tuning help (thanks Tyler), much of this is working
>>> as it should.  However, a serious event also occurred: the server
>>> froze up, and although mutations were dropped, no error was reported to
>>> the client.  Here's what the log said on host X.19:
>>>
>>>  WARN [ScheduledTasks:1] 2010-12-06 14:04:11,125 MessagingService.java (line 527) Dropped 76 MUTATION messages in the last 5000ms
>>>
>>> Meanwhile, on the OTHER nodes, gossip decided the node was not available
>>> for a while:
>>>
>>>  INFO [ScheduledTasks:1] 2010-12-06 14:04:02,396 Gossiper.java (line 195) InetAddress /X.19 is now dead.
>>>  INFO [GossipStage:1] 2010-12-06 14:04:06,127 Gossiper.java (line 569) InetAddress /X.19 is now UP
>>>
>>> And despite the fact that I was writing with consistency=ALL, none of my
>>> clients reported any errors on their mutations.
>>>
>>> Tyler has this information, but I would like to know whether anyone has
>>> seen this before and/or has a diagnosis.
>>>
>>>
>>
>>
>
>
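To line up the dropped-mutation warning with the gossip view of the outage, it can help to pull the numbers out of the log lines quoted above. A minimal Python sketch, assuming the exact 0.7 log format shown in the thread, one entry per line (the regexes are written against those lines only and may need adjusting for other versions). The timestamps come from different hosts, so clock skew limits how precisely the windows can be compared.

```python
import re
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
DROPPED = re.compile(TS + r".*Dropped (\d+) MUTATION messages in the last (\d+)ms")
GOSSIP = re.compile(TS + r".*InetAddress (\S+) is now (dead|UP)")


def parse_ts(text):
    """Parse a log timestamp such as '2010-12-06 14:04:11,125'."""
    return datetime.strptime(text, TS_FORMAT)


def dropped_mutations(lines):
    """Yield (timestamp, count, window_ms) for each dropped-MUTATION warning."""
    for line in lines:
        m = DROPPED.search(line)
        if m:
            yield parse_ts(m.group(1)), int(m.group(2)), int(m.group(3))


def dead_windows(lines, host):
    """Yield how long gossip considered `host` dead, per dead -> UP pair."""
    down_since = None
    for line in lines:
        m = GOSSIP.search(line)
        if not m or m.group(2) != host:
            continue
        if m.group(3) == "dead":
            down_since = parse_ts(m.group(1))
        elif down_since is not None:
            yield parse_ts(m.group(1)) - down_since
            down_since = None
```

On the lines quoted above, this reports 76 dropped mutations in the 5000 ms ending at 14:04:11,125 on X.19, and a dead window for /X.19 of about 3.7 seconds (14:04:02,396 to 14:04:06,127) as seen by the node that logged the gossip lines.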



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
