Hi Jean,

"I had to reboot a node. I killed the cassandra process on that node". You
should drain the node before killing java (or using any service stop
command). This is not what causes the issue yet it will help you to keep
consistence if you use counters, and make the reboot faster in any cases.
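For example, something along these lines (just a sketch; I'm assuming a
tarball install where you stop Cassandra by killing the JVM, so adapt it to
your service manager):

    # flush memtables and stop accepting new writes on this node
    nodetool drain
    # then stop the JVM (or: sudo service cassandra stop)
    pkill -f CassandraDaemon
    # ... reboot / do your maintenance ...
    # start Cassandra again (bin/cassandra for a tarball install,
    # or your service start command)
    cassandra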

What is going on highly depends on what you did before.

Did you change the RF?
Did you change the topology?
Are you sure this node had data before you restarted it?
What does "nodetool status mykeyspace" output?
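For instance, run this on any node (replace "mykeyspace" with one of your
keyspaces; when you pass a keyspace, the Owns column shows real ownership
percentages instead of "?"):

    nodetool status mykeyspace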

To make the join faster you could try to bootstrap the node again. I just
hope you have an RF > 1 (by the way, with one replica effectively down,
make sure your consistency level is low enough if you want reads and writes
to keep working).
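One way to do that is the usual "replace a dead node" procedure: stop the
node, wipe its local state, and restart it with the replace_address option
so it streams its ranges back from the other replicas. A rough sketch only,
assuming default data paths (use the ones from your cassandra.yaml) and
Cassandra 2.1; double-check the replace_address documentation before doing
this on a production cluster:

    nodetool drain                           # on 192.168.2.100
    # stop cassandra, then clear its local state
    rm -rf /var/lib/cassandra/data/* \
           /var/lib/cassandra/commitlog/* \
           /var/lib/cassandra/saved_caches/*
    # add this to cassandra-env.sh (remove it once the node is back UN):
    # JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=192.168.2.100"
    cassandra

This only works with RF > 1, since the data has to come from the remaining
replicas.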

"It’s like the whole cluster is paralysed" --> what does it mean, be more
accurate on this please.

You should tell us what actions were taken before this occurred and what
exactly is not working now, since a C* cluster in this state could run
perfectly well. No SPOF.

C*heers

2015-06-23 16:10 GMT+02:00 Jean Tremblay <jean.tremb...@zen-innovations.com>:

>  Does anyone know what to do when such an event occurs?
> Does anyone know how this could happen?
>
>  I would have tried repairing the node with nodetool repair but that
> takes much too long… I need my cluster to work for our online system. At
> this point nothing is working. It’s like the whole cluster is paralysed.
> The only solution I see is to remove temporarily that node.
>
>  Thanks for your comments.
>
>  On 23 Jun 2015, at 12:45 , Jean Tremblay <
> jean.tremb...@zen-innovations.com> wrote:
>
>  Hi,
>
>  I have a cluster with 5 nodes running Cassandra 2.1.6.
>
>  I had to reboot a node. I killed the cassandra process on that node.
> Rebooted the machine, and restarted Cassandra.
>
>  ~/apache-cassandra-DATA/data:321>nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns    Host ID                               Rack
> UN  192.168.2.104  158.27 GB  256     ?       6479205e-6a19-49a8-b1a1-7e788ec29caa  rack1
> UN  192.168.2.100  4.75 GB    256     ?       e821da50-23c6-4ea0-b3a1-275ded63bc1f  rack1
> UN  192.168.2.101  157.43 GB  256     ?       01525665-bacc-4207-a8c3-eb4fd9532401  rack1
> UN  192.168.2.102  159.27 GB  256     ?       596a33d7-5089-4c7e-a9ad-e1f22111b160  rack1
> UN  192.168.2.103  167 GB     256     ?       0ce1d48e-57a9-4615-8e12-d7ef3d621c7d  rack1
>
>
>  After restarting node 192.168.2.100 I noticed that its load had
> diminished to 2%. And now when I query the cluster my queries are failing,
> and that node times out with an error:
>
>  WARN  [MessagingService-Incoming-/192.168.2.102] 2015-06-23 12:26:00,056 IncomingTcpConnection.java:97 - UnknownColumnFamilyException reading from socket; closing
> org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=ddc346b0-1372-11e5-9ba1-195596ed1fd9
>     at org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:164) ~[apache-cassandra-2.1.6.jar:2.1.6]
>     at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:97) ~[apache-cassandra-2.1.6.jar:2.1.6]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserializeOneCf(Mutation.java:322) ~[apache-cassandra-2.1.6.jar:2.1.6]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:302) ~[apache-cassandra-2.1.6.jar:2.1.6]
>     at org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:330) ~[apache-cassandra-2.1.6.jar:2.1.6]
>
>  What is going on? Has anyone experienced something like this?
>
>
>
