I should probably have mentioned that we're on Cassandra 2.0.10.

On 6 January 2016 at 15:26, Vickrum Loi <vickrum....@idioplatform.com> wrote:
> Hi,
>
> We recently added a new node to our cluster in order to replace a node
> that died (hardware failure, we believe). For the next two weeks it had
> high disk and network activity. We replaced the server, but it's happened
> again. We've looked into memory allowances, disk performance, number of
> connections, and all the nodetool stats, but can't find the cause of the
> issue.
>
> `nodetool tpstats`[0] shows a lot of active and pending threads in
> comparison to the rest of the cluster, but that's likely a symptom, not a
> cause.
>
> `nodetool status`[1] shows the cluster isn't quite balanced. The bad node
> (D) has less data.
>
> Disk activity[2] and network activity[3] on this node are far higher than
> on the rest.
>
> The only other difference between this node and the rest of the cluster is
> that it's on the ext4 filesystem, whereas the rest are on ext3, but we've
> done plenty of testing there and can't see how that would affect
> performance on this node so much.
>
> Nothing of note in system.log.
>
> What should our next step be in trying to diagnose this issue?
>
> Best wishes,
> Vic
>
> [0] `nodetool tpstats` output:
>
> Good node:
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0       46311521         0                 0
> RequestResponseStage              0         0       23817366         0                 0
> MutationStage                     0         0       47389269         0                 0
> ReadRepairStage                   0         0          11108         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0        5259908         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> MigrationStage                    0         0             30         0                 0
> MemoryMeter                       0         0          16563         0                 0
> FlushWriter                       0         0          39637         0                26
> ValidationExecutor                0         0          19013         0                 0
> InternalResponseStage             0         0              9         0                 0
> AntiEntropyStage                  0         0          38026         0                 0
> MemtablePostFlusher               0         0          81740         0                 0
> MiscStage                         0         0          19196         0                 0
> PendingRangeCalculator            0         0             23         0                 0
> CompactionExecutor                0         0          61629         0                 0
> commitlog_archiver                0         0              0         0                 0
> HintedHandoff                     0         0             63         0                 0
>
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> PAGED_RANGE                  0
> BINARY                       0
> READ                       640
> MUTATION                     0
> _TRACE                       0
> REQUEST_RESPONSE             0
> COUNTER_MUTATION             0
>
> Bad node:
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                        32       113          52216         0                 0
> RequestResponseStage              0         0           4167         0                 0
> MutationStage                     0         0         127559         0                 0
> ReadRepairStage                   0         0            125         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0           9965         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> MigrationStage                    0         0              0         0                 0
> MemoryMeter                       0         0             24         0                 0
> FlushWriter                       0         0             27         0                 1
> ValidationExecutor                0         0              0         0                 0
> InternalResponseStage             0         0              0         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MemtablePostFlusher               0         0             96         0                 0
> MiscStage                         0         0              0         0                 0
> PendingRangeCalculator            0         0             10         0                 0
> CompactionExecutor                1         1             73         0                 0
> commitlog_archiver                0         0              0         0                 0
> HintedHandoff                     0         0             15         0                 0
>
> Message type           Dropped
> RANGE_SLICE                130
> READ_REPAIR                  1
> PAGED_RANGE                  0
> BINARY                       0
> READ                     31032
> MUTATION                   865
> _TRACE                       0
> REQUEST_RESPONSE             7
> COUNTER_MUTATION             0
>
> [1] `nodetool status` output:
>
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load       Tokens  Owns   Host ID                               Rack
> UN  A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
> UN  B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
> UN  C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
> UN  D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>
> [2] Disk read/write ops:
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>
> [3] Network in/out:
>
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
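
In case it helps anyone reproduce the comparison above, here's a rough sketch of how the ReadStage backlog and dropped READ counts could be sampled across all four nodes. The host names are placeholders, and it assumes nodetool can reach each node's JMX port from where it runs:

    #!/bin/sh
    # Rough sketch: every 30s, print ReadStage active/pending and the dropped
    # READ count for each node, so a backlogged node stands out.
    # HOSTS is a placeholder -- substitute the real node addresses.
    HOSTS="nodeA nodeB nodeC nodeD"

    while true; do
        for h in $HOSTS; do
            printf '%s %-6s ' "$(date '+%H:%M:%S')" "$h"
            nodetool -h "$h" tpstats | awk '
                $1 == "ReadStage" { printf "ReadStage active=%s pending=%s  ", $2, $3 }
                $1 == "READ"      { printf "dropped READ=%s", $2 }
                END               { print "" }'
        done
        sleep 30
    done

(The same loop works with `nodetool compactionstats` or `nodetool netstats` if the backlog turns out to be compaction or streaming rather than reads.)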