James, I may be missing something. You mentioned your cluster has RF=3, so why does "nodetool status" show each node owning ~1/3 of the data, especially after a full repair?
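A note on the "Owns" column: without a keyspace argument, nodetool status reports raw token-range ownership, which on a three-node vnode cluster comes out at roughly a third per node regardless of replication factor; given a keyspace, it reports effective ownership, which should be close to 100% per node at RF=3. A quick way to compare the two, using production_analytics (a keyspace that appears later in this thread) purely as an example:

$ nodetool status                        # "Owns" covers token ranges only (~33% per node here)
$ nodetool status production_analytics   # "Owns (effective)" accounts for RF (~100% per node at RF=3)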
On Thu, Jan 14, 2016 at 9:56 AM, James Griffin <james.grif...@idioplatform.com> wrote:

Hi Kai,

Below - nothing going on that I can see:

$ nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name    Active   Pending   Completed
Commands     n/a      0         6326
Responses    n/a      0         219356

Best wishes,

Griff

James "Griff" Griffin
CTO, idio (http://idioplatform.com)

On 14 January 2016 at 14:22, Kai Wang <dep...@gmail.com> wrote:

James,

Can you post the result of "nodetool netstats" on the bad node?

On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <james.grif...@idioplatform.com> wrote:

A summary of what we've done this morning:

- Noted that there are no GCInspector lines in system.log on the bad node (there are GCInspector logs on the other, healthy nodes).
- Turned on GC logging and noted log entries stating that the total time for which application threads were stopped was high - ~10s.
- Not seeing failures of any kind (promotion or concurrent mark).
- Attached VisualVM: noted that heap usage was very low (~5% and stable) and didn't display the hallmarks of GC activity. PermGen also very stable.
- Downloaded the GC logs and examined them in GCViewer. Noted that:
  - We had lots of pauses (again around 10s), but no full GCs.
  - From a 2,300s sample, just over 2,000s were spent with threads paused.
  - Spotted many small GCs in the new space - realised that the Xmn value was very low (200M against a heap size of 3750M). Increased Xmn to 937M - no change in server behaviour (high load, high reads/s on disk, high CPU wait).

Current output of jstat:

    S0     S1     E      O      P      YGC   YGCT    FGC  FGCT   GCT
2   0.00   45.20  12.82  26.84  76.21  2333  63.684  2    0.039  63.724
3   63.58  0.00   33.68  8.04   75.19  14    1.812   2    0.103  1.915

Correct me if I'm wrong, but it seems 3 is a lot healthier, GC-wise, than 2 (which has normal load statistics).

Anywhere else you can recommend we look?

Griff

On 14 January 2016 at 01:25, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

OK. I saw dropped mutations on your cluster, and full GC is a common cause of that.
Can you search for the word GCInspector in system.log and share the frequency of minor and full GCs? Also, are you printing promotion failures in the GC logs? Why is full GC getting triggered - promotion failures or concurrent mode failures?

If you are on CMS, you need to fine-tune your heap options to address the full GCs.
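For what it's worth, a rough sketch of how to pull the numbers being asked about here; the log locations and the cassandra-env.sh path are assumptions that vary by install, and the exact GCInspector message text differs between Cassandra versions:

$ grep -c GCInspector /var/log/cassandra/system.log
$ grep GCInspector /var/log/cassandra/system.log | grep -c ParNew               # minor GCs
$ grep GCInspector /var/log/cassandra/system.log | grep -c ConcurrentMarkSweep  # CMS / full GCs
$ grep -E "promotion failed|concurrent mode failure" /var/log/cassandra/gc.log

To make sure promotion failures and stopped-time actually appear in the GC log, standard HotSpot flags along these lines can be added in cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"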
Thanks
Anuj

On Thu, 14 Jan, 2016 at 12:57 am, James Griffin <james.grif...@idioplatform.com> wrote:

I think I was incorrect in assuming GC wasn't an issue due to the lack of logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked differences, even though comparing the startup flags on the two machines shows the GC config is identical:

$ jstat -gcutil

    S0    S1    E      O      P      YGC     YGCT       FGC  FGCT    GCT
2   5.08  0.00  55.72  18.24  59.90  25986   619.827    28   1.597   621.424
3   0.00  0.00  22.79  17.87  59.99  422600  11225.979  668  57.383  11283.361

Here's typical output for iostat on nodes 2 & 3 as well:

$ iostat -dmx md0

  Device:  rrqm/s  wrqm/s  r/s      w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
2 md0      0.00    0.00    339.00   0.00  9.77   0.00   59.00     0.00      0.00   0.00     0.00     0.00   0.00
3 md0      0.00    0.00    2069.00  1.00  85.85  0.00   84.94     0.00      0.00   0.00     0.00     0.00   0.00

Griff

On 13 January 2016 at 18:36, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Node 2 has slightly higher data, but that should be OK. Not sure how read ops are so high when no IO-intensive activity such as repair or compaction is running on node 3. Maybe you can try investigating the logs to see what's happening.

Others on the mailing list could also share their views on the situation.

Thanks
Anuj

On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin <james.grif...@idioplatform.com> wrote:

Hi Anuj,

Below is the output of nodetool status. The nodes were replaced following the instructions in the Datastax documentation for replacing running nodes, since the nodes were running fine - it was just that the servers had been incorrectly initialised and thus had less disk space. The status below shows node 2 has significantly higher load; however, as I say, 2 is operating normally and is running compactions, so I guess that's not an issue?

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns   Host ID                               Rack
UN  1        253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
UN  2        302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
UN  3        265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1

Griff

On 13 January 2016 at 18:12, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Hi,

Revisiting the thread, I can see that nodetool status had both good and bad nodes at the same time. How did you replace the nodes? When you say "bad node", I understand that the node is no longer usable even though Cassandra is UP - is that correct?

If a node is in bad shape and not working, adding a new node may trigger streaming of a huge amount of data from the bad node too. Have you considered using the procedure for replacing a dead node?

Please share the latest nodetool status.
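For reference, the dead-node replacement procedure mentioned here boils down to starting the fresh node with a replace_address flag rather than a normal bootstrap, so it takes over the dead node's tokens and streams only from the surviving replicas. A rough outline; the IP is a placeholder and the exact steps should be checked against the Datastax documentation for your Cassandra version:

# On the replacement node, before its first start, in cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<ip_of_dead_node>"

# Start Cassandra and watch it stream the dead node's ranges:
$ nodetool netstats

# Once the node shows UN in "nodetool status", remove the flag so future restarts are normal.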
nodetool output shared earlier:

`nodetool status` output:

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns   Host ID                               Rack
UN  A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
UN  B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
UN  C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
UN  D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1

Thanks
Anuj

On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin <james.grif...@idioplatform.com> wrote:

Hi all,

We've spent a few days running things but are in the same position. To add some more flavour:

- We have a 3-node ring, replication factor = 3. We've been running in this configuration for a few years without any real issues.
- Nodes 2 & 3 are much newer than node 1. These two nodes were brought in to replace two other nodes which had a failed RAID0 configuration and thus were lacking in disk space.
- When node 2 was brought into the ring, it exhibited high CPU wait, IO and load metrics.
- We subsequently brought node 3 into the ring: as soon as 3 was fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal levels. Those same stats on 3, however, sky-rocketed.
- We've confirmed the configuration across all three nodes is identical and in line with the recommended production settings.
- We've run a full repair.
- Node 2 is currently running compactions; 1 & 3 aren't, and have none pending.
- There is no GC happening from what I can see. Node 1 has a GC log, but that's not been written to since May last year.

What we're seeing at the moment is similar and normal stats on nodes 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:

1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s

Can you recommend any next steps?

Griff

On 6 January 2016 at 17:31, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Hi Vickrum,

I would have proceeded with the diagnosis as follows:

1. Analyze the sar report to check system health - CPU, memory, swap, disk etc. The system seems to be overloaded; this is evident from the mutation drops.

2. Make sure that all the recommended Cassandra production settings available on the Datastax site are applied, and disable zone reclaim and THP (see the sketch after this list).

3. Run a full repair on the bad node and check the data size. The node owns the largest token range but has significantly less data; I doubt that bootstrapping happened properly.

4. compactionstats shows 22 pending compactions. Try throttling compactions by reducing concurrent compactors or compaction throughput (also covered in the sketch below).

5. Analyze the logs to make sure bootstrapping happened without errors.

6. Look for other common performance problems such as GC pauses, to make sure that the dropped mutations are not caused by GC pauses.
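A minimal sketch of the commands behind points 2 and 4; the proc/sysfs paths are the usual Linux locations but differ on some distros (e.g. redhat_transparent_hugepage on RHEL 6), and the throughput figure is only an illustration (16 MB/s is the yaml default):

# Point 2: disable zone reclaim and transparent huge pages
$ echo 0 | sudo tee /proc/sys/vm/zone_reclaim_mode
$ echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
$ echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

# Point 4: check and tighten compaction throttling on a live node
$ nodetool getcompactionthroughput
$ nodetool setcompactionthroughput 8   # MB/s cap; concurrent_compactors itself is a cassandra.yaml setting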
Thanks
Anuj

On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi <vickrum....@idioplatform.com> wrote:

# nodetool compactionstats
pending tasks: 22
compaction type  keyspace              table                      completed    total         unit   progress
Compaction       production_analytics  interactions               240410213    161172668724  bytes  0.15%
Compaction       production_decisions  decisions.decisions_q_idx  120815385    226295183     bytes  53.39%
Active compaction remaining time : 2h39m58s

Worth mentioning that compactions haven't been running on this node particularly often. The node's been performing badly regardless of whether it's compacting or not.

On 6 January 2016 at 16:35, Jeff Ferland <j...@tubularlabs.com> wrote:

What's your output of `nodetool compactionstats`?

On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vickrum....@idioplatform.com> wrote:

Hi,

We recently added a new node to our cluster in order to replace a node that died (hardware failure, we believe). For the next two weeks it had high disk and network activity. We replaced the server, but it's happened again. We've looked into memory allowances, disk performance, number of connections, and all the nodetool stats, but can't find the cause of the issue.

`nodetool tpstats`[0] shows a lot of active and pending threads, in comparison to the rest of the cluster, but that's likely a symptom, not a cause.

`nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D) has less data.

Disk activity[2] and network activity[3] on this node are far higher than on the rest.

The only other difference this node has from the rest of the cluster is that it's on the ext4 filesystem, whereas the rest are ext3, but we've done plenty of testing there and can't see how that would affect performance on this node so much.

Nothing of note in system.log.

What should our next step be in trying to diagnose this issue?

Best wishes,
Vic
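One cheap way to put numbers on "a lot of active and pending threads" is to grab the same tpstats snapshot from every node and compare the ReadStage/MutationStage backlog and the dropped-message counters; a sketch, assuming SSH access to the nodes and nodetool on their PATH (host names are placeholders):

$ for h in nodeA nodeB nodeC nodeD; do echo "== $h =="; ssh "$h" nodetool tpstats | grep -E "ReadStage|MutationStage|READ|MUTATION"; done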
[0] `nodetool tpstats` output:

Good node:
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0       46311521         0                 0
RequestResponseStage              0         0       23817366         0                 0
MutationStage                     0         0       47389269         0                 0
ReadRepairStage                   0         0          11108         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        5259908         0                 0
CacheCleanupExecutor              0         0              0         0                 0
MigrationStage                    0         0             30         0                 0
MemoryMeter                       0         0          16563         0                 0
FlushWriter                       0         0          39637         0                26
ValidationExecutor                0         0          19013         0                 0
InternalResponseStage             0         0              9         0                 0
AntiEntropyStage                  0         0          38026         0                 0
MemtablePostFlusher               0         0          81740         0                 0
MiscStage                         0         0          19196         0                 0
PendingRangeCalculator            0         0             23         0                 0
CompactionExecutor                0         0          61629         0                 0
commitlog_archiver                0         0              0         0                 0
HintedHandoff                     0         0             63         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                       640
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0

Bad node:
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                        32       113          52216         0                 0
RequestResponseStage              0         0           4167         0                 0
MutationStage                     0         0         127559         0                 0
ReadRepairStage                   0         0            125         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0           9965         0                 0
CacheCleanupExecutor              0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
MemoryMeter                       0         0             24         0                 0
FlushWriter                       0         0             27         0                 1
ValidationExecutor                0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MemtablePostFlusher               0         0             96         0                 0
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0             10         0                 0
CompactionExecutor                1         1             73         0                 0
commitlog_archiver                0         0              0         0                 0
HintedHandoff                     0         0             15         0                 0

Message type           Dropped
RANGE_SLICE                130
READ_REPAIR                  1
PAGED_RANGE                  0
BINARY                       0
READ                     31032
MUTATION                   865
_TRACE                       0
REQUEST_RESPONSE             7
COUNTER_MUTATION             0

[1] `nodetool status` output:

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns   Host ID                               Rack
UN  A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
UN  B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
UN  C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
UN  D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1

[2] Disk read/write ops:

https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png

[3] Network in/out:
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png