James, can you post the result of "nodetool netstats" on the bad node?
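
For reference, what I have in mind is simply (run on the bad node itself):

    nodetool netstats

I'm mostly interested in whether it reports any active streams, and in the
pending counts for the command/response pools - if that node is still streaming
to or from its neighbours, that alone could explain the elevated disk and
network activity.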

On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <james.grif...@idioplatform.com> wrote:

> A summary of what we've done this morning:
>
>    - Noted that there are no GCInspector lines in system.log on the bad node
>      (there are GCInspector logs on the other, healthy nodes)
>    - Turned on GC logging; noted that we had logs stating the total time
>      for which application threads were stopped was high - ~10s.
>    - Not seeing failures of any kind (promotion or concurrent mark)
>    - Attached VisualVM: noted that heap usage was very low (~5% usage and
>      stable) and it didn't display the hallmarks of GC activity. PermGen also
>      very stable
>    - Downloaded the GC logs and examined them in GCViewer. Noted that:
>       - We had lots of pauses (again around 10s), but no full GCs.
>       - From a 2,300s sample, just over 2,000s were spent with threads
>         paused
>    - Spotted many small GCs in the new space - realised that the Xmn value
>      was very low (200M against a heap size of 3750M). Increased Xmn to
>      937M - no change in server behaviour (high load, high reads/s on disk,
>      high CPU wait)
>
> Current output of jstat:
>
>        S0     S1     E      O      P     YGC   YGCT    FGC  FGCT   GCT
>    2   0.00  45.20  12.82  26.84  76.21  2333  63.684    2  0.039  63.724
>    3  63.58   0.00  33.68   8.04  75.19    14   1.812    2  0.103   1.915
>
> Correct me if I'm wrong, but it seems 3 is a lot healthier GC-wise than 2
> (which has normal load statistics).
>
> Anywhere else you can recommend we look?
>
> Griff
>
> On 14 January 2016 at 01:25, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>
>> Ok. I saw dropped mutations on your cluster, and full GC is a common cause
>> for that.
>> Can you just search for the word GCInspector in system.log and share the
>> frequency of minor and full GCs? Moreover, are you printing promotion
>> failures in the GC logs? Why is full GC getting triggered - promotion
>> failures or concurrent mode failures?
>>
>> If you are on CMS, you need to fine-tune your heap options to address
>> full GC.
>>
>> Thanks
>> Anuj
>>
>> Sent from Yahoo Mail on Android
>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>
>> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
>> <james.grif...@idioplatform.com> wrote:
>> I think I was incorrect in assuming GC wasn't an issue due to the lack of
>> logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked
>> differences, though comparing the startup flags on the two machines shows
>> the GC config is identical:
>>
>> $ jstat -gcutil
>>        S0    S1    E      O      P      YGC     YGCT       FGC  FGCT    GCT
>>    2  5.08  0.00  55.72  18.24  59.90   25986   619.827     28   1.597    621.424
>>    3  0.00  0.00  22.79  17.87  59.99  422600  11225.979    668  57.383  11283.361
>>
>> Here's typical output for iostat on nodes 2 & 3 as well:
>>
>> $ iostat -dmx md0
>>
>>      Device:  rrqm/s  wrqm/s  r/s      w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>>   2  md0        0.00    0.00   339.00  0.00   9.77   0.00     59.00      0.00   0.00     0.00     0.00   0.00   0.00
>>   3  md0        0.00    0.00  2069.00  1.00  85.85   0.00     84.94      0.00   0.00     0.00     0.00   0.00   0.00
>>
>> Griff
>>
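(A quick aside on reading those jstat numbers: GCT is cumulative since JVM
start, so it only means anything relative to uptime - and node 3's counters
have evidently reset between your two samples, presumably when it was restarted
for the Xmn change. A rough way to turn GCT into a fraction of JVM lifetime -
assuming a procps new enough for "etimes" and that the process matches
"CassandraDaemon"; run it as the same user as the Cassandra process - would be
something like:

    # fraction of JVM lifetime spent in GC
    PID=$(pgrep -f CassandraDaemon)
    UP=$(ps -o etimes= -p "$PID")                            # JVM uptime, seconds
    GCT=$(jstat -gcutil "$PID" | awk 'NR==2 {print $NF}')    # cumulative GC time, seconds
    echo "scale=4; $GCT / $UP" | bc

On the earlier sample node 3 looks dramatically worse than node 2 GC-wise, but
GC alone wouldn't normally explain the disk read rate you're seeing in iostat.)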
>> On 13 January 2016 at 18:36, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>>
>>> Node 2 has slightly more data but that should be OK. Not sure how read
>>> ops are so high when no IO-intensive activity such as repair or compaction
>>> is running on node 3. Maybe you can try investigating the logs to see
>>> what's happening.
>>>
>>> Others on the mailing list could also share their views on the situation.
>>>
>>> Thanks
>>> Anuj
>>>
>>> Sent from Yahoo Mail on Android
>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>
>>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>>> <james.grif...@idioplatform.com> wrote:
>>> Hi Anuj,
>>>
>>> Below is the output of nodetool status. The nodes were replaced following
>>> the instructions in the Datastax documentation for replacing running
>>> nodes, since the nodes themselves were running fine - it was just that the
>>> servers had been incorrectly initialised and thus had less disk space. The
>>> status below shows that 2 has significantly higher load; however, as I
>>> say, 2 is operating normally and is running compactions, so I guess that's
>>> not an issue?
>>>
>>> Datacenter: datacenter1
>>> =======================
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address  Load       Tokens  Owns   Host ID                               Rack
>>> UN  1        253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>> UN  2        302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>> UN  3        265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
>>>
>>> Griff
>>>
>>> On 13 January 2016 at 18:12, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>>>
>>>> Hi,
>>>>
>>>> Revisiting the thread, I can see that nodetool status had both good and
>>>> bad nodes at the same time. How do you replace nodes? When you say "bad
>>>> node", I understand that the node is no longer usable even though
>>>> Cassandra is UP - is that correct?
>>>>
>>>> If a node is in bad shape and not working, adding a new node may trigger
>>>> streaming of huge amounts of data from the bad node too. Have you
>>>> considered using the procedure for replacing a dead node?
>>>>
>>>> Please share the latest nodetool status.
>>>>
>>>> nodetool output shared earlier:
>>>>
>>>> `nodetool status` output:
>>>>
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> --  Address   Load       Tokens  Owns   Host ID                               Rack
>>>> UN  A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>> UN  B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>> UN  C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>> UN  D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>
>>>> Thanks
>>>> Anuj
>>>>
>>>> Sent from Yahoo Mail on Android
>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>
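(On the dead-node replacement Anuj mentions: for what it's worth, the usual way
to do that on 2.x is to start the replacement node with the replace flag rather
than letting it bootstrap as a brand-new member - roughly, in cassandra-env.sh
(the exact file location depends on your packaging):

    # on the *replacement* node only, before its first start; remove the flag again afterwards
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<ip_of_dead_node>"

The new node then takes over the dead node's tokens and streams its data from
the surviving replicas, instead of the ring having to rebalance around both the
old and the new entries.)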
>>>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>>>> <james.grif...@idioplatform.com> wrote:
>>>> Hi all,
>>>>
>>>> We've spent a few days running things but are in the same position. To
>>>> add some more flavour:
>>>>
>>>>    - We have a 3-node ring, replication factor = 3. We've been running
>>>>      in this configuration for a few years without any real issues
>>>>    - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>>>      brought in to replace two other nodes which had a failed RAID0
>>>>      configuration and were thus lacking in disk space.
>>>>    - When node 2 was brought into the ring, it exhibited high CPU wait,
>>>>      IO and load metrics
>>>>    - We subsequently brought 3 into the ring: as soon as 3 was fully
>>>>      bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>>>      levels. Those same stats on 3, however, sky-rocketed
>>>>    - We've confirmed the configuration across all three nodes is
>>>>      identical and in line with the recommended production settings
>>>>    - We've run a full repair
>>>>    - Node 2 is currently running compactions; 1 & 3 aren't and have none
>>>>      pending
>>>>    - There is no GC happening from what I can see. Node 1 has a GC log,
>>>>      but that's not been written to since May last year
>>>>
>>>> What we're seeing at the moment is similar and normal stats on nodes 1
>>>> & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>>>
>>>>    1. Load: 3.96,  CPU wait: 30.8%, Disk Read Ops: 408/s
>>>>    2. Load: 5.88,  CPU wait: 14.6%, Disk Read Ops: 275/s
>>>>    3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>>>
>>>> Can you recommend any next steps?
>>>>
>>>> Griff
>>>>
>>>> On 6 January 2016 at 17:31, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>>>>
>>>>> Hi Vickrum,
>>>>>
>>>>> I would have proceeded with the diagnosis as follows:
>>>>>
>>>>> 1. Analyse the sar report to check system health - CPU, memory, swap,
>>>>> disk etc. The system seems to be overloaded; this is evident from the
>>>>> mutation drops.
>>>>>
>>>>> 2. Make sure that all the recommended Cassandra production settings
>>>>> available on the Datastax site are applied; disable zone reclaim and THP.
>>>>>
>>>>> 3. Run a full repair on the bad node and check the data size. The node
>>>>> owns the largest token range but has significantly less data. I doubt
>>>>> that bootstrapping happened properly.
>>>>>
>>>>> 4. Compactionstats shows 22 pending compactions. Try throttling
>>>>> compactions by reducing concurrent compactors or compaction throughput.
>>>>>
>>>>> 5. Analyse the logs to make sure bootstrapping happened without errors.
>>>>>
>>>>> 6. Look for other common performance problems such as GC pauses to make
>>>>> sure that dropped mutations are not caused by GC pauses.
>>>>>
>>>>> Thanks
>>>>> Anuj
>>>>>
>>>>> Sent from Yahoo Mail on Android
>>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>>
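(Re: points 2 and 4 in Anuj's checklist above, the relevant knobs - with
illustrative values rather than recommendations - are roughly:

    # compaction throughput can be changed on the fly (MB/s); concurrent_compactors
    # is a cassandra.yaml setting and needs a restart to change
    nodetool setcompactionthroughput 16

    # disable THP defrag and zone reclaim at the OS level (sysfs path can differ by distro/kernel)
    echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
    sudo sysctl -w vm.zone_reclaim_mode=0

It would also be worth double-checking that those OS-level settings actually
match across all three machines, given only one of them is misbehaving.)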
>>>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>>>> <vickrum....@idioplatform.com> wrote:
>>>>> # nodetool compactionstats
>>>>> pending tasks: 22
>>>>>   compaction type  keyspace              table                      completed  total         unit   progress
>>>>>        Compaction  production_analytics  interactions               240410213  161172668724  bytes  0.15%
>>>>>        Compaction  production_decisions  decisions.decisions_q_idx  120815385  226295183     bytes  53.39%
>>>>> Active compaction remaining time : 2h39m58s
>>>>>
>>>>> Worth mentioning that compactions haven't been running on this node
>>>>> particularly often. The node's been performing badly regardless of
>>>>> whether it's compacting or not.
>>>>>
>>>>> On 6 January 2016 at 16:35, Jeff Ferland <j...@tubularlabs.com> wrote:
>>>>>
>>>>>> What's your output of `nodetool compactionstats`?
>>>>>>
>>>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vickrum....@idioplatform.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We recently added a new node to our cluster in order to replace a node
>>>>>> that died (hardware failure, we believe). For the next two weeks it had
>>>>>> high disk and network activity. We replaced the server, but it's
>>>>>> happened again. We've looked into memory allowances, disk performance,
>>>>>> number of connections, and all the nodetool stats, but can't find the
>>>>>> cause of the issue.
>>>>>>
>>>>>> `nodetool tpstats`[0] shows a lot of active and pending threads, in
>>>>>> comparison to the rest of the cluster, but that's likely a symptom, not
>>>>>> a cause.
>>>>>>
>>>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The bad
>>>>>> node (D) has less data.
>>>>>>
>>>>>> Disk activity[2] and network activity[3] on this node are far higher
>>>>>> than on the rest.
>>>>>>
>>>>>> The only other difference this node has to the rest of the cluster is
>>>>>> that it's on the ext4 filesystem, whereas the rest are ext3, but we've
>>>>>> done plenty of testing there and can't see how that would affect
>>>>>> performance on this node so much.
>>>>>>
>>>>>> Nothing of note in system.log.
>>>>>>
>>>>>> What should our next step be in trying to diagnose this issue?
>>>>>>
>>>>>> Best wishes,
>>>>>> Vic
>>>>>>
>>>>>> [0] `nodetool tpstats` output:
>>>>>>
>>>>>> Good node:
>>>>>> Pool Name                  Active  Pending  Completed  Blocked  All time blocked
>>>>>> ReadStage                       0        0   46311521        0                 0
>>>>>> RequestResponseStage            0        0   23817366        0                 0
>>>>>> MutationStage                   0        0   47389269        0                 0
>>>>>> ReadRepairStage                 0        0      11108        0                 0
>>>>>> ReplicateOnWriteStage           0        0          0        0                 0
>>>>>> GossipStage                     0        0    5259908        0                 0
>>>>>> CacheCleanupExecutor            0        0          0        0                 0
>>>>>> MigrationStage                  0        0         30        0                 0
>>>>>> MemoryMeter                     0        0      16563        0                 0
>>>>>> FlushWriter                     0        0      39637        0                26
>>>>>> ValidationExecutor              0        0      19013        0                 0
>>>>>> InternalResponseStage           0        0          9        0                 0
>>>>>> AntiEntropyStage                0        0      38026        0                 0
>>>>>> MemtablePostFlusher             0        0      81740        0                 0
>>>>>> MiscStage                       0        0      19196        0                 0
>>>>>> PendingRangeCalculator          0        0         23        0                 0
>>>>>> CompactionExecutor              0        0      61629        0                 0
>>>>>> commitlog_archiver              0        0          0        0                 0
>>>>>> HintedHandoff                   0        0         63        0                 0
>>>>>>
>>>>>> Message type      Dropped
>>>>>> RANGE_SLICE             0
>>>>>> READ_REPAIR             0
>>>>>> PAGED_RANGE             0
>>>>>> BINARY                  0
>>>>>> READ                  640
>>>>>> MUTATION                0
>>>>>> _TRACE                  0
>>>>>> REQUEST_RESPONSE        0
>>>>>> COUNTER_MUTATION        0
>>>>>>
>>>>>> Bad node:
>>>>>> Pool Name                  Active  Pending  Completed  Blocked  All time blocked
>>>>>> ReadStage                      32      113      52216        0                 0
>>>>>> RequestResponseStage            0        0       4167        0                 0
>>>>>> MutationStage                   0        0     127559        0                 0
>>>>>> ReadRepairStage                 0        0        125        0                 0
>>>>>> ReplicateOnWriteStage           0        0          0        0                 0
>>>>>> GossipStage                     0        0       9965        0                 0
>>>>>> CacheCleanupExecutor            0        0          0        0                 0
>>>>>> MigrationStage                  0        0          0        0                 0
>>>>>> MemoryMeter                     0        0         24        0                 0
>>>>>> FlushWriter                     0        0         27        0                 1
>>>>>> ValidationExecutor              0        0          0        0                 0
>>>>>> InternalResponseStage           0        0          0        0                 0
>>>>>> AntiEntropyStage                0        0          0        0                 0
>>>>>> MemtablePostFlusher             0        0         96        0                 0
>>>>>> MiscStage                       0        0          0        0                 0
>>>>>> PendingRangeCalculator          0        0         10        0                 0
>>>>>> CompactionExecutor              1        1         73        0                 0
>>>>>> commitlog_archiver              0        0          0        0                 0
>>>>>> HintedHandoff                   0        0         15        0                 0
>>>>>>
>>>>>> Message type      Dropped
>>>>>> RANGE_SLICE           130
>>>>>> READ_REPAIR             1
>>>>>> PAGED_RANGE             0
>>>>>> BINARY                  0
>>>>>> READ                31032
>>>>>> MUTATION              865
>>>>>> _TRACE                  0
>>>>>> REQUEST_RESPONSE        7
>>>>>> COUNTER_MUTATION        0
>>>>>>
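(Interjecting here: the READ drops and the 32 active / 113 pending ReadStage
threads on the bad node above are the clearest signal - that node is saturated
serving reads. A cheap way to see whether the drops are still accumulating,
rather than being left over from an earlier incident, is just to poll the
counters, e.g.:

    # run on the bad node; the counters are cumulative since the last restart
    while sleep 30; do date; nodetool tpstats | grep -E '^(READ|MUTATION) '; done

If READ keeps climbing there while the healthy nodes stay flat, the node is
still shedding load right now.)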
>>>>>> [1] `nodetool status` output:
>>>>>>
>>>>>> Status=Up/Down
>>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>>> --  Address   Load       Tokens  Owns   Host ID                               Rack
>>>>>> UN  A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>>>> UN  B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>>> UN  C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>>>> UN  D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>>
>>>>>> [2] Disk read/write ops:
>>>>>>
>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>>>
>>>>>> [3] Network in/out:
>>>>>>
>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png