What’s your output of `nodetool compactionstats`?

> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vickrum....@idioplatform.com> wrote:
> 
> Hi,
> 
> We recently added a new node to our cluster to replace a node that died
> (a hardware failure, we believe). For the next two weeks it had high disk
> and network activity. We replaced the server, but it's happened again. We've
> looked into memory allowances, disk performance, number of connections, and
> all the nodetool stats, but can't find the cause of the issue.
> 
> `nodetool tpstats`[0] shows a lot of active and pending threads, in 
> comparison to the rest of the cluster, but that's likely a symptom, not a 
> cause.
> 
> `nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D) 
> has less data.
> 
> Disk activity[2] and network activity[3] on this node are far higher than on
> the rest.
> 
> The only other difference between this node and the rest of the cluster is
> that it's on the ext4 filesystem, whereas the rest are on ext3, but we've done
> plenty of testing there and can't see how that would affect performance on
> this node so much.
> 
> Nothing of note in system.log.
> 
> What should our next step be in trying to diagnose this issue?
> 
> Best wishes,
> Vic
> 
> [0] `nodetool tpstats` output:
> 
> Good node:
>     Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>     ReadStage                         0         0       46311521         0                 0
>     RequestResponseStage              0         0       23817366         0                 0
>     MutationStage                     0         0       47389269         0                 0
>     ReadRepairStage                   0         0          11108         0                 0
>     ReplicateOnWriteStage             0         0              0         0                 0
>     GossipStage                       0         0        5259908         0                 0
>     CacheCleanupExecutor              0         0              0         0                 0
>     MigrationStage                    0         0             30         0                 0
>     MemoryMeter                       0         0          16563         0                 0
>     FlushWriter                       0         0          39637         0                26
>     ValidationExecutor                0         0          19013         0                 0
>     InternalResponseStage             0         0              9         0                 0
>     AntiEntropyStage                  0         0          38026         0                 0
>     MemtablePostFlusher               0         0          81740         0                 0
>     MiscStage                         0         0          19196         0                 0
>     PendingRangeCalculator            0         0             23         0                 0
>     CompactionExecutor                0         0          61629         0                 0
>     commitlog_archiver                0         0              0         0                 0
>     HintedHandoff                     0         0             63         0                 0
> 
>     Message type           Dropped
>     RANGE_SLICE                  0
>     READ_REPAIR                  0
>     PAGED_RANGE                  0
>     BINARY                       0
>     READ                       640
>     MUTATION                     0
>     _TRACE                       0
>     REQUEST_RESPONSE             0
>     COUNTER_MUTATION             0
> 
> Bad node:
>     Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>     ReadStage                        32       113          52216         0                 0
>     RequestResponseStage              0         0           4167         0                 0
>     MutationStage                     0         0         127559         0                 0
>     ReadRepairStage                   0         0            125         0                 0
>     ReplicateOnWriteStage             0         0              0         0                 0
>     GossipStage                       0         0           9965         0                 0
>     CacheCleanupExecutor              0         0              0         0                 0
>     MigrationStage                    0         0              0         0                 0
>     MemoryMeter                       0         0             24         0                 0
>     FlushWriter                       0         0             27         0                 1
>     ValidationExecutor                0         0              0         0                 0
>     InternalResponseStage             0         0              0         0                 0
>     AntiEntropyStage                  0         0              0         0                 0
>     MemtablePostFlusher               0         0             96         0                 0
>     MiscStage                         0         0              0         0                 0
>     PendingRangeCalculator            0         0             10         0                 0
>     CompactionExecutor                1         1             73         0                 0
>     commitlog_archiver                0         0              0         0                 0
>     HintedHandoff                     0         0             15         0                 0
> 
>     Message type           Dropped
>     RANGE_SLICE                130
>     READ_REPAIR                  1
>     PAGED_RANGE                  0
>     BINARY                       0
>     READ                     31032
>     MUTATION                   865
>     _TRACE                       0
>     REQUEST_RESPONSE             7
>     COUNTER_MUTATION             0
> 
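A rough way to put numbers on the tpstats difference: the Python sketch below copies a subset of the pools above by hand (the `Pool` type and the `backlogged` and `drop_ratio` helpers are hypothetical, not part of nodetool). It lists pools with active or pending tasks and compares dropped READ messages with completed ReadStage tasks on each node. This assumes both counters have been accumulating since that node last started, so the ratio is only a coarse comparison.

```python
"""Compare a few `nodetool tpstats` pools between the good and bad node.

Sketch only: values are copied by hand from the quoted output, and the
helper names are hypothetical (not part of nodetool or Cassandra).
"""
from typing import Dict, List, NamedTuple


class Pool(NamedTuple):
    active: int
    pending: int
    completed: int


# Subset of pools copied from the quoted tpstats output.
GOOD: Dict[str, Pool] = {
    "ReadStage": Pool(0, 0, 46311521),
    "MutationStage": Pool(0, 0, 47389269),
    "CompactionExecutor": Pool(0, 0, 61629),
}
BAD: Dict[str, Pool] = {
    "ReadStage": Pool(32, 113, 52216),
    "MutationStage": Pool(0, 0, 127559),
    "CompactionExecutor": Pool(1, 1, 73),
}
# Dropped READ message counts from the same output.
DROPPED_READS: Dict[str, int] = {"good": 640, "bad": 31032}


def backlogged(pools: Dict[str, Pool]) -> List[str]:
    """Return the sorted names of pools with active or pending tasks."""
    return sorted(name for name, p in pools.items() if p.active or p.pending)


def drop_ratio(dropped: int, completed: int) -> float:
    """Dropped READ messages relative to completed ReadStage tasks.

    Assumes both counters cover the same period (since node start).
    """
    return dropped / completed


if __name__ == "__main__":
    print("good backlog:", backlogged(GOOD))
    print("bad backlog:", backlogged(BAD))
    for node, pools in (("good", GOOD), ("bad", BAD)):
        ratio = drop_ratio(DROPPED_READS[node], pools["ReadStage"].completed)
        print(f"{node}: dropped/completed reads = {ratio:.4%}")
```

On these figures the good node has no backlog and has dropped a tiny fraction of its reads, while the bad node has ReadStage and CompactionExecutor work queued and has dropped reads amounting to more than half of the ReadStage tasks it has completed.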
> 
> [1] `nodetool status` output:
> 
>     Status=Up/Down
>     |/ State=Normal/Leaving/Joining/Moving
>     --  Address         Load       Tokens  Owns   Host ID                               Rack
>     UN  A (Good)        252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>     UN  B (Good)        245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>     UN  C (Good)        254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>     UN  D (Bad)         163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
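The imbalance is sharper once each node's load is set against its ownership. A small Python sketch using the figures from the status output above (the `gb_per_owned_percent` helper is hypothetical) divides each node's Load by its Owns percentage:

```python
"""Relate each node's Load to its Owns share from `nodetool status`.

Sketch only: values are copied by hand from the quoted output; the
helper name is hypothetical.
"""
from typing import Dict, Tuple

# node -> (Load in GB, Owns percentage)
STATUS: Dict[str, Tuple[float, float]] = {
    "A": (252.37, 23.0),
    "B": (245.91, 24.4),
    "C": (254.79, 23.7),
    "D": (163.85, 28.8),
}


def gb_per_owned_percent(
    status: Dict[str, Tuple[float, float]]
) -> Dict[str, float]:
    """Load divided by ownership percentage, rounded to two decimals."""
    return {node: round(load / owns, 2) for node, (load, owns) in status.items()}


if __name__ == "__main__":
    for node, ratio in gb_per_owned_percent(STATUS).items():
        print(f"{node}: {ratio} GB per 1% owned")
```

With these numbers A, B and C come out at roughly 10 to 11 GB per 1% owned, while D is under 6: D owns the largest share of the ring but holds the least data.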
> 
> [2] Disk read/write ops:
> 
>     https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>     https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
> 
> [3] Network in/out:
> 
>     https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>     https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
