What’s your output of `nodetool compactionstats`?
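For what it's worth, when comparing nodes it can help to capture the "Message type / Dropped" section of `nodetool tpstats` from each node and filter out the non-zero counts. A minimal sketch (the sample data below is copied from the bad node's output in this thread; the awk filter is just one illustrative approach, not part of nodetool itself):

```shell
# Abridged "Message type / Dropped" section captured from the bad node
tpstats_output='Message type           Dropped
RANGE_SLICE                130
READ_REPAIR                  1
READ                     31032
MUTATION                   865'

# Skip the header row, then print only message types with a non-zero dropped count
echo "$tpstats_output" | awk 'NR > 1 && $2 > 0 { print $1, $2 }'
```

Run against every node in turn, this makes it easy to spot that only node D is dropping READs in volume.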
> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vickrum....@idioplatform.com> wrote:
>
> Hi,
>
> We recently added a new node to our cluster in order to replace a node that
> died (we believe due to a hardware failure). For the next two weeks the new
> node had high disk and network activity. We replaced the server, but the
> problem has recurred. We've looked into memory allowances, disk performance,
> number of connections, and all the nodetool stats, but can't find the cause
> of the issue.
>
> `nodetool tpstats`[0] shows a lot of active and pending threads, in
> comparison to the rest of the cluster, but that's likely a symptom, not a
> cause.
>
> `nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D)
> has less data.
>
> Disk activity[2] and network activity[3] on this node are far higher than on
> the rest.
>
> The only other difference between this node and the rest of the cluster is
> that it's on the ext4 filesystem, whereas the rest are on ext3, but we've
> done plenty of testing there and can't see how that would affect performance
> on this node so much.
>
> Nothing of note in system.log.
>
> What should our next step be in trying to diagnose this issue?
>
> Best wishes,
> Vic
>
> [0] `nodetool tpstats` output:
>
> Good node:
> Pool Name                Active  Pending  Completed  Blocked  All time blocked
> ReadStage                     0        0   46311521        0                 0
> RequestResponseStage          0        0   23817366        0                 0
> MutationStage                 0        0   47389269        0                 0
> ReadRepairStage               0        0      11108        0                 0
> ReplicateOnWriteStage         0        0          0        0                 0
> GossipStage                   0        0    5259908        0                 0
> CacheCleanupExecutor          0        0          0        0                 0
> MigrationStage                0        0         30        0                 0
> MemoryMeter                   0        0      16563        0                 0
> FlushWriter                   0        0      39637        0                26
> ValidationExecutor            0        0      19013        0                 0
> InternalResponseStage         0        0          9        0                 0
> AntiEntropyStage              0        0      38026        0                 0
> MemtablePostFlusher           0        0      81740        0                 0
> MiscStage                     0        0      19196        0                 0
> PendingRangeCalculator        0        0         23        0                 0
> CompactionExecutor            0        0      61629        0                 0
> commitlog_archiver            0        0          0        0                 0
> HintedHandoff                 0        0         63        0                 0
>
> Message type Dropped
> RANGE_SLICE 0
> READ_REPAIR 0
> PAGED_RANGE 0
> BINARY 0
> READ 640
> MUTATION 0
> _TRACE 0
> REQUEST_RESPONSE 0
> COUNTER_MUTATION 0
>
> Bad node:
> Pool Name                Active  Pending  Completed  Blocked  All time blocked
> ReadStage                    32      113      52216        0                 0
> RequestResponseStage          0        0       4167        0                 0
> MutationStage                 0        0     127559        0                 0
> ReadRepairStage               0        0        125        0                 0
> ReplicateOnWriteStage         0        0          0        0                 0
> GossipStage                   0        0       9965        0                 0
> CacheCleanupExecutor          0        0          0        0                 0
> MigrationStage                0        0          0        0                 0
> MemoryMeter                   0        0         24        0                 0
> FlushWriter                   0        0         27        0                 1
> ValidationExecutor            0        0          0        0                 0
> InternalResponseStage         0        0          0        0                 0
> AntiEntropyStage              0        0          0        0                 0
> MemtablePostFlusher           0        0         96        0                 0
> MiscStage                     0        0          0        0                 0
> PendingRangeCalculator        0        0         10        0                 0
> CompactionExecutor            1        1         73        0                 0
> commitlog_archiver            0        0          0        0                 0
> HintedHandoff                 0        0         15        0                 0
>
> Message type Dropped
> RANGE_SLICE 130
> READ_REPAIR 1
> PAGED_RANGE 0
> BINARY 0
> READ 31032
> MUTATION 865
> _TRACE 0
> REQUEST_RESPONSE 7
> COUNTER_MUTATION 0
>
>
> [1] `nodetool status` output:
>
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address   Load       Tokens  Owns   Host ID                               Rack
> UN A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
> UN B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
> UN C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
> UN D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>
> [2] Disk read/write ops:
>
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>
> [3] Network in/out:
>
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png