A summary of what we've done this morning:

- Noted that there are no GCInspector lines in system.log on the bad node (there are
  GCInspector logs on the other, healthy nodes)
- Turned on GC logging; noted that the logs reported the total time for which
  application threads were stopped was high - ~10s
- Not seeing failures of any kind (promotion or concurrent mode)
- Attached VisualVM: noted that heap usage was very low (~5% and stable) and it didn't
  display the hallmarks of GC activity. PermGen also very stable
- Downloaded the GC logs and examined them in GCViewer. Noted that:
  - We had lots of pauses (again around 10s), but no full GCs
  - From a 2,300s sample, just over 2,000s were spent with threads paused
- Spotted many small GCs in the new space and realised that the Xmn value was very low
  (200M against a heap size of 3750M). Increased Xmn to 937M - no change in server
  behaviour (high load, high reads/s on disk, high CPU wait). A rough sketch of the GC
  flags involved is below.
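For reference, this is roughly the shape of the GC logging and sizing changes in
cassandra-env.sh (a sketch only - the log path and exact flag set here are illustrative
assumptions, not a copy of our config; on our version the heap values actually come from
MAX_HEAP_SIZE / HEAP_NEWSIZE, which map to -Xmx/-Xms and -Xmn):

    # Illustrative GC-related JVM options appended in cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"     # assumed log location
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"    # where the ~10s stopped-time lines show up
    JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
    # Heap and new-gen sizing discussed above
    JVM_OPTS="$JVM_OPTS -Xms3750M -Xmx3750M"
    JVM_OPTS="$JVM_OPTS -Xmn937M"                              # previously 200M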
Current output of jstat:

       S0     S1     E      O      P     YGC    YGCT  FGC   FGCT     GCT
  2  0.00  45.20  12.82  26.84  76.21   2333  63.684    2  0.039  63.724
  3 63.58   0.00  33.68   8.04  75.19     14   1.812    2  0.103   1.915

Correct me if I'm wrong, but it seems 3 is a lot more healthy GC-wise than 2 (which has
normal load statistics). Anywhere else you can recommend we look?

Griff

On 14 January 2016 at 01:25, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

> Ok. I saw dropped mutations on your cluster and full gc is a common cause
> for that.
> Can you just search for the word GCInspector in system.log and share the
> frequency of minor and full gc. Moreover, are you printing promotion
> failures in gc logs? Why is full gc getting triggered? Promotion failures
> or concurrent mode failures?
>
> If you are on CMS, you need to fine tune your heap options to address full
> gc.
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
> <james.grif...@idioplatform.com> wrote:
> I think I was incorrect in assuming GC wasn't an issue due to the lack of
> logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked
> differences, though comparing the startup flags on the two machines shows
> the GC config is identical:
>
> $ jstat -gcutil
>     S0     S1     E      O      P     YGC       YGCT  FGC    FGCT        GCT
> 2 5.08   0.00  55.72  18.24  59.90   25986    619.827   28   1.597    621.424
> 3 0.00   0.00  22.79  17.87  59.99  422600  11225.979  668  57.383  11283.361
>
> Here's typical output for iostat on nodes 2 & 3 as well:
>
> $ iostat -dmx md0
>
> Device:  rrqm/s  wrqm/s      r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> 2 md0      0.00    0.00   339.00  0.00   9.77   0.00     59.00      0.00   0.00     0.00     0.00   0.00   0.00
> 3 md0      0.00    0.00  2069.00  1.00  85.85   0.00     84.94      0.00   0.00     0.00     0.00   0.00   0.00
>
> Griff
>
> On 13 January 2016 at 18:36, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>
>> Node 2 has slightly higher data but that should be ok. Not sure how read
>> ops are so high when no IO-intensive activity such as repair and
>> compaction is running on node 3. Maybe you can try investigating the
>> logs to see what's happening.
>>
>> Others on the mailing list could also share their views on the situation.
>>
>> Thanks
>> Anuj
>>
>> Sent from Yahoo Mail on Android
>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>
>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>> <james.grif...@idioplatform.com> wrote:
>> Hi Anuj,
>>
>> Below is the output of nodetool status. The nodes were replaced following
>> the instructions in the Datastax documentation for replacing running
>> nodes, since the nodes were running fine; it was just that the servers
>> had been incorrectly initialised and thus had less disk space. The status
>> below shows 2 has significantly higher load, however, as I say, 2 is
>> operating normally and is running compactions, so I guess that's not an
>> issue?
>>
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> -- Address  Load       Tokens  Owns   Host ID                               Rack
>> UN 1        253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>> UN 2        302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>> UN 3        265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
>>
>> Griff
>>
>> On 13 January 2016 at 18:12, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>>
>>> Hi,
>>>
>>> Revisiting the thread I can see that nodetool status had both good and
>>> bad nodes at the same time. How do you replace nodes?
>>> When you say bad node, I understand that the node is no longer usable
>>> even though Cassandra is UP? Is that correct?
>>>
>>> If a node is in bad shape and not working, adding a new node may trigger
>>> streaming of huge data from the bad node too. Have you considered using
>>> the procedure for replacing a dead node?
>>>
>>> Please share the latest nodetool status.
>>>
>>> nodetool output shared earlier:
>>>
>>> `nodetool status` output:
>>>
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> -- Address   Load       Tokens  Owns   Host ID                               Rack
>>> UN A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>> UN B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>> UN C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>> UN D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>
>>> Thanks
>>> Anuj
>>>
>>> Sent from Yahoo Mail on Android
>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>
>>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>>> <james.grif...@idioplatform.com> wrote:
>>> Hi all,
>>>
>>> We’ve spent a few days running things but are in the same position. To
>>> add some more flavour:
>>>
>>>    - We have a 3-node ring, replication factor = 3. We’ve been running
>>>    in this configuration for a few years without any real issues
>>>    - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>>    brought in to replace two other nodes which had a failed RAID0
>>>    configuration and thus were lacking in disk space
>>>    - When node 2 was brought into the ring, it exhibited high CPU wait,
>>>    IO and load metrics
>>>    - We subsequently brought 3 into the ring: as soon as 3 was fully
>>>    bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>>    levels. Those same stats on 3, however, sky-rocketed
>>>    - We’ve confirmed the configuration across all three nodes is
>>>    identical and in line with the recommended production settings
>>>    - We’ve run a full repair
>>>    - Node 2 is currently running compactions; 1 & 3 aren’t and have
>>>    none pending
>>>    - There is no GC happening from what I can see. Node 1 has a GC log,
>>>    but that’s not been written to since May last year
>>>
>>> What we’re seeing at the moment is similar and normal stats on nodes 1 &
>>> 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>>
>>>    1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>>    2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>>    3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>>
>>> Can you recommend any next steps?
>>>
>>> Griff
>>>
>>> On 6 January 2016 at 17:31, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>>>
>>>> Hi Vickrum,
>>>>
>>>> I would have proceeded with the diagnosis as follows:
>>>>
>>>> 1. Analyse the sar report to check system health - CPU, memory, swap,
>>>> disk, etc. The system seems to be overloaded; this is evident from the
>>>> mutation drops.
>>>>
>>>> 2. Make sure that all recommended Cassandra production settings
>>>> available on the Datastax site are applied; disable zone reclaim and THP.
>>>>
>>>> 3. Run a full repair on the bad node and check data size. The node owns
>>>> the maximum token range but has significantly less data. I doubt that
>>>> bootstrapping happened properly.
>>>>
>>>> 4. Compactionstats shows 22 pending compactions. Try throttling
>>>> compactions by reducing concurrent compactors or compaction throughput.
>>>>
>>>> 5. Analyse the logs to make sure bootstrapping happened without errors.
>>>>
>>>> 6. Look for other common performance problems, such as GC pauses, to
>>>> make sure that the dropped mutations are not caused by GC pauses.
>>>>
>>>> Thanks
>>>> Anuj
>>>>
>>>> Sent from Yahoo Mail on Android
>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>
>>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>>> <vickrum....@idioplatform.com> wrote:
>>>> # nodetool compactionstats
>>>> pending tasks: 22
>>>>   compaction type  keyspace              table                      completed  total         unit   progress
>>>>   Compaction       production_analytics  interactions               240410213  161172668724  bytes  0.15%
>>>>   Compaction       production_decisions  decisions.decisions_q_idx  120815385  226295183     bytes  53.39%
>>>> Active compaction remaining time : 2h39m58s
>>>>
>>>> Worth mentioning that compactions haven't been running on this node
>>>> particularly often. The node's been performing badly regardless of
>>>> whether it's compacting or not.
>>>>
>>>> On 6 January 2016 at 16:35, Jeff Ferland <j...@tubularlabs.com> wrote:
>>>>
>>>>> What’s your output of `nodetool compactionstats`?
>>>>>
>>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vickrum....@idioplatform.com>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We recently added a new node to our cluster in order to replace a node
>>>>> that died (hardware failure, we believe). For the next two weeks it had
>>>>> high disk and network activity. We replaced the server, but it's
>>>>> happened again. We've looked into memory allowances, disk performance,
>>>>> number of connections, and all the nodetool stats, but can't find the
>>>>> cause of the issue.
>>>>>
>>>>> `nodetool tpstats`[0] shows a lot of active and pending threads, in
>>>>> comparison to the rest of the cluster, but that's likely a symptom, not
>>>>> a cause.
>>>>>
>>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The bad
>>>>> node (D) has less data.
>>>>>
>>>>> Disk activity[2] and network activity[3] on this node are far higher
>>>>> than on the rest.
>>>>>
>>>>> The only other difference this node has from the rest of the cluster is
>>>>> that it's on the ext4 filesystem, whereas the rest are ext3, but we've
>>>>> done plenty of testing there and can't see how that would affect
>>>>> performance on this node so much.
>>>>>
>>>>> Nothing of note in system.log.
>>>>>
>>>>> What should our next step be in trying to diagnose this issue?
>>>>>
>>>>> Best wishes,
>>>>> Vic
>>>>>
>>>>> [0] `nodetool tpstats` output:
>>>>>
>>>>> Good node:
>>>>> Pool Name               Active  Pending  Completed  Blocked  All time blocked
>>>>> ReadStage                    0        0   46311521        0                 0
>>>>> RequestResponseStage         0        0   23817366        0                 0
>>>>> MutationStage                0        0   47389269        0                 0
>>>>> ReadRepairStage              0        0      11108        0                 0
>>>>> ReplicateOnWriteStage        0        0          0        0                 0
>>>>> GossipStage                  0        0    5259908        0                 0
>>>>> CacheCleanupExecutor         0        0          0        0                 0
>>>>> MigrationStage               0        0         30        0                 0
>>>>> MemoryMeter                  0        0      16563        0                 0
>>>>> FlushWriter                  0        0      39637        0                26
>>>>> ValidationExecutor           0        0      19013        0                 0
>>>>> InternalResponseStage        0        0          9        0                 0
>>>>> AntiEntropyStage             0        0      38026        0                 0
>>>>> MemtablePostFlusher          0        0      81740        0                 0
>>>>> MiscStage                    0        0      19196        0                 0
>>>>> PendingRangeCalculator       0        0         23        0                 0
>>>>> CompactionExecutor           0        0      61629        0                 0
>>>>> commitlog_archiver           0        0          0        0                 0
>>>>> HintedHandoff                0        0         63        0                 0
>>>>>
>>>>> Message type      Dropped
>>>>> RANGE_SLICE             0
>>>>> READ_REPAIR             0
>>>>> PAGED_RANGE             0
>>>>> BINARY                  0
>>>>> READ                  640
>>>>> MUTATION                0
>>>>> _TRACE                  0
>>>>> REQUEST_RESPONSE        0
>>>>> COUNTER_MUTATION        0
>>>>>
>>>>> Bad node:
>>>>> Pool Name               Active  Pending  Completed  Blocked  All time blocked
>>>>> ReadStage                   32      113      52216        0                 0
>>>>> RequestResponseStage         0        0       4167        0                 0
>>>>> MutationStage                0        0     127559        0                 0
>>>>> ReadRepairStage              0        0        125        0                 0
>>>>> ReplicateOnWriteStage        0        0          0        0                 0
>>>>> GossipStage                  0        0       9965        0                 0
>>>>> CacheCleanupExecutor         0        0          0        0                 0
>>>>> MigrationStage               0        0          0        0                 0
>>>>> MemoryMeter                  0        0         24        0                 0
>>>>> FlushWriter                  0        0         27        0                 1
>>>>> ValidationExecutor           0        0          0        0                 0
>>>>> InternalResponseStage        0        0          0        0                 0
>>>>> AntiEntropyStage             0        0          0        0                 0
>>>>> MemtablePostFlusher          0        0         96        0                 0
>>>>> MiscStage                    0        0          0        0                 0
>>>>> PendingRangeCalculator       0        0         10        0                 0
>>>>> CompactionExecutor           1        1         73        0                 0
>>>>> commitlog_archiver           0        0          0        0                 0
>>>>> HintedHandoff                0        0         15        0                 0
>>>>>
>>>>> Message type      Dropped
>>>>> RANGE_SLICE           130
>>>>> READ_REPAIR             1
>>>>> PAGED_RANGE             0
>>>>> BINARY                  0
>>>>> READ                31032
>>>>> MUTATION              865
>>>>> _TRACE                  0
>>>>> REQUEST_RESPONSE        7
>>>>> COUNTER_MUTATION        0
>>>>>
>>>>> [1] `nodetool status` output:
>>>>>
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> -- Address   Load       Tokens  Owns   Host ID                               Rack
>>>>> UN A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>>> UN B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>> UN C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>>> UN D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>
>>>>> [2] Disk read/write ops:
>>>>>
>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>>
>>>>> [3] Network in/out:
>>>>>
>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>>
>>>>>
>>>>
>>>
>>
>