On Mon, Sep 20, 2010 at 09:51, shimi <shim...@gmail.com> wrote:
> I have a cluster with 6 nodes in 2 datacenters (3 in each datacenter).
> I replaced all of the servers in the cluster (0.6.4) with new ones (0.6.5).
> My old cluster was unbalanced because I was using RandomPartitioner and had
> bootstrapped all the nodes without specifying their tokens.
>
> Since I wanted the cluster to be balanced, I first added all the new
> nodes one after the other (with the right tokens this time) and then ran
> decommission on all the old ones, one after the other.
> One of the nodes being decommissioned began throwing "too many open files"
> errors while it was decommissioning, taking other nodes down with it. After
> the second try I decided to stop it and run removetoken on its token from one
> of the other nodes. After that everything went well, except that in the end
> one of the nodes looked unbalanced.
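For reference, "the right tokens" with RandomPartitioner usually means spacing the nodes evenly over its token space of 0 to 2^127, i.e. token_i = i * 2^127 / N. The sketch below assumes that formula and N = 6 (all nodes on one ring). How to order the six tokens between the two datacenters depends on the replica placement strategy, which the thread does not state, so treat the output only as evenly spaced candidates for each node's InitialToken setting.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class TokenCalc {
    // RandomPartitioner tokens live in [0, 2^127).
    static final BigInteger RANGE = BigInteger.ONE.shiftLeft(127);

    /** Evenly spaced tokens: token_i = i * 2^127 / n, for i = 0 .. n-1. */
    static List<BigInteger> tokens(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<BigInteger> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(RANGE.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)));
        }
        return out;
    }

    public static void main(String[] args) {
        // ASSUMPTION: one ring of 6 nodes; pass a different count as the first argument.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 6;
        List<BigInteger> ts = tokens(n);
        for (int i = 0; i < ts.size(); i++) {
            System.out.println("node " + i + ": InitialToken = " + ts.get(i));
        }
    }
}
```

Computing with BigInteger avoids overflow, since 2^127 does not fit in a long.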
>
> I decided to run repair on the cluster. What I got was totally unbalanced
> nodes with far more data than they were supposed to have: each node had
> 2x-4x more data.
> I ran cleanup, and all of them except the one that was unbalanced to begin
> with got back to the size they were supposed to be.
> Now whenever I try to run cleanup on this node I get:
>
>  INFO [COMPACTION-POOL:1] 2010-09-20 12:04:23,069 CompactionManager.java (line 339) AntiCompacting ...
>  INFO [GC inspection] 2010-09-20 12:05:37,600 GCInspector.java (line 129) GC for ConcurrentMarkSweep: 1525 ms, 13641032 reclaimed leaving 767863520 used; max is 6552551424
>  INFO [GC inspection] 2010-09-20 12:05:37,601 GCInspector.java (line 150) Pool Name                    Active   Pending
>  INFO [GC inspection] 2010-09-20 12:05:37,605 GCInspector.java (line 156) STREAM-STAGE                      0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,605 GCInspector.java (line 156) RESPONSE-STAGE                    0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,606 GCInspector.java (line 156) ROW-READ-STAGE                    8       717
>  INFO [GC inspection] 2010-09-20 12:05:37,607 GCInspector.java (line 156) LB-OPERATIONS                     0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,607 GCInspector.java (line 156) MISCELLANEOUS-POOL                0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,607 GCInspector.java (line 156) GMFD                              0         2
>  INFO [GC inspection] 2010-09-20 12:05:37,608 GCInspector.java (line 156) CONSISTENCY-MANAGER               0         1
>  INFO [GC inspection] 2010-09-20 12:05:37,608 GCInspector.java (line 156) LB-TARGET                         0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,609 GCInspector.java (line 156) ROW-MUTATION-STAGE                0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,610 GCInspector.java (line 156) MESSAGE-STREAMING-POOL            0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,610 GCInspector.java (line 156) LOAD-BALANCER-STAGE               0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,611 GCInspector.java (line 156) FLUSH-SORTER-POOL                 0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,612 GCInspector.java (line 156) MEMTABLE-POST-FLUSHER             0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,612 GCInspector.java (line 156) AE-SERVICE-STAGE                  0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,613 GCInspector.java (line 156) FLUSH-WRITER-POOL                 0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,613 GCInspector.java (line 156) HINTED-HANDOFF-POOL               0         0
>  INFO [GC inspection] 2010-09-20 12:05:37,616 GCInspector.java (line 161) CompactionManager               n/a         0
>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-20 12:05:40,402 SSTableDeletingReference.java (line 104) Deleted ...
>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-20 12:05:40,727 SSTableDeletingReference.java (line 104) Deleted ...
>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-20 12:05:40,730 SSTableDeletingReference.java (line 104) Deleted ...
>  INFO [SSTABLE-CLEANUP-TIMER] 2010-09-20 12:05:40,735 SSTableDeletingReference.java (line 104) Deleted ...
>
> After that I saw an increase in the node's response time and in the number
> of ROW-READ-STAGE pending tasks. Since there was no indication that anything
> was wrong or that the node was doing anything (in the logs, nodetool, or JMX),
> the only thing I could do was restart the server.
>
> I don't know if this is related, but every hour I see this error (I think
> X.X.X.X is the IP of the machine that I couldn't decommission properly):
>
>  INFO [Timer-0] 2010-09-20 13:56:11,406 Gossiper.java (line 402) FatClient /X.X.X.X has been silent for 3600000ms, removing from gossip
> ERROR [Timer-0] 2010-09-20 13:56:11,421 Gossiper.java (line 99) Gossip error
> java.util.ConcurrentModificationException
>     at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
>     at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:383)
>     at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:93)
>     at java.util.TimerThread.mainLoop(Timer.java:512)
>     at java.util.TimerThread.run(Timer.java:462)
>  INFO [GMFD:1] 2010-09-20 13:56:43,251 Gossiper.java (line 586) Node /X.X.X.X is now part of the cluster
>
> Does anyone have any idea how I can clean up the problematic node?

You may just need to be patient. Have you tried monitoring the
CompactionManager in JMX to see whether it is doing anything?
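A minimal polling client along those lines might look like the sketch below. It assumes the CompactionManager MBean is registered as org.apache.cassandra.db:type=CompactionManager, that it exposes the attributes ColumnFamilyInProgress, BytesCompacted, BytesTotalInProgress and PendingTasks, and that JMX listens on port 8080; all three are assumptions to confirm in jconsole against your own node first. If BytesCompacted keeps advancing between samples, the anticompaction is still making progress.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class CompactionWatch {
    // ASSUMPTION: MBean and attribute names for Cassandra 0.6.x; confirm in jconsole.
    // getAttribute throws AttributeNotFoundException if a name does not exist.
    static final String MBEAN = "org.apache.cassandra.db:type=CompactionManager";

    /** Converts a JMX attribute value to a long, or -1 if it is null or non-numeric. */
    static long asLong(Object value) {
        return (value instanceof Number) ? ((Number) value).longValue() : -1L;
    }

    /** Percentage of the current compaction already processed, or -1 if unknown. */
    static double progressPercent(long bytesCompacted, long bytesTotal) {
        if (bytesCompacted < 0 || bytesTotal <= 0) {
            return -1.0;
        }
        return 100.0 * bytesCompacted / bytesTotal;
    }

    public static void main(String[] args) throws Exception {
        // ASSUMPTION: host and port from the command line; 8080 is only a default guess.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName(MBEAN);
            for (int i = 0; i < 20; i++) {  // 20 samples, 30 seconds apart
                Object cf = mbs.getAttribute(name, "ColumnFamilyInProgress");
                long done = asLong(mbs.getAttribute(name, "BytesCompacted"));
                long total = asLong(mbs.getAttribute(name, "BytesTotalInProgress"));
                long pending = asLong(mbs.getAttribute(name, "PendingTasks"));
                System.out.printf("pending=%d cf=%s progress=%.1f%%%n",
                        pending, cf, progressPercent(done, total));
                Thread.sleep(30_000L);
            }
        }
    }
}
```

Run it as "java CompactionWatch <host> <jmx-port>" while cleanup is running; a progress figure that stays flat across many samples would point to a stuck compaction rather than a slow one.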

> Does anyone have any idea how I can get rid of the Gossip error?

This is CASSANDRA-1494. You can ignore it.
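The trace above is the classic fail-fast iterator failure: iterators over java.util.Hashtable throw ConcurrentModificationException from next() once the table has been structurally modified by anything other than the iterator itself. The generic sketch below reproduces that with a hypothetical lastSeen map of endpoint addresses. It is only an illustration of the failure mode, not the actual Gossiper code and not the CASSANDRA-1494 patch, and it shows the usual single-threaded fix, Iterator.remove().

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Map;

class StaleSweep {
    /** Removes stale entries with map.remove() inside a for-each; returns false on CME. */
    static boolean unsafeSweep(Map<String, Long> lastSeen, long now, long maxSilentMs) {
        try {
            for (String endpoint : lastSeen.keySet()) {
                if (now - lastSeen.get(endpoint) > maxSilentMs) {
                    lastSeen.remove(endpoint);  // modifies the map behind the iterator's back
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    /** Same sweep, but removes through the iterator so it stays consistent. */
    static void safeSweep(Map<String, Long> lastSeen, long now, long maxSilentMs) {
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            if (now - it.next().getValue() > maxSilentMs) {
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;  // same threshold as the FatClient log line
        Map<String, Long> a = new Hashtable<>();
        Map<String, Long> b = new Hashtable<>();
        // Hypothetical endpoints, all last seen at time 0.
        for (String ep : new String[] {"10.0.0.1", "10.0.0.2", "10.0.0.3"}) {
            a.put(ep, 0L);
            b.put(ep, 0L);
        }
        System.out.println("for-each + remove() finished cleanly: " + unsafeSweep(a, hour + 1, hour));
        safeSweep(b, hour + 1, hour);
        System.out.println("entries left after Iterator.remove() sweep: " + b.size());
    }
}
```

If the modification comes from another thread rather than the loop itself, Iterator.remove() is not enough; a ConcurrentHashMap, whose iterators are weakly consistent and never throw this exception, is the usual alternative.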

Gary.
