A summary of what we've done this morning:

- Noted that there are no GCInspector lines in system.log on the bad node (there are
  GCInspector logs on the other, healthy nodes)
- Turned on GC logging; noted that the logs reported the total time for which
  application threads were stopped was high - ~10s
- Not seeing failures of any kind (promotion or concurrent mode)
- Attached VisualVM: noted that heap usage was very low (~5% and stable) and it didn't
  display the hallmarks of GC activity. PermGen also very stable
- Downloaded the GC logs and examined them in GCViewer. Noted that:
  - We had lots of pauses (again around 10s), but no full GCs
  - From a 2,300s sample, just over 2,000s were spent with threads paused
- Spotted many small GCs in the new space and realised that the Xmn value was very low
  (200M against a heap size of 3750M). Increased Xmn to 937M - no change in server
  behaviour (high load, high reads/s on disk, high CPU wait). A rough sketch of the GC
  flags involved is below.
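For reference, this is roughly the shape of the GC logging and sizing changes in
cassandra-env.sh (a sketch only - the log path and exact flag set here are illustrative
assumptions, not a copy of our config; on our version the heap values actually come from
MAX_HEAP_SIZE / HEAP_NEWSIZE, which map to -Xmx/-Xms and -Xmn):

    # Illustrative GC-related JVM options appended in cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"     # assumed log location
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"    # where the ~10s stopped-time lines show up
    JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
    # Heap and new-gen sizing discussed above
    JVM_OPTS="$JVM_OPTS -Xms3750M -Xmx3750M"
    JVM_OPTS="$JVM_OPTS -Xmn937M"                              # previously 200M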
Current output of jstat:

       S0     S1     E      O      P     YGC    YGCT  FGC   FGCT     GCT
  2  0.00  45.20  12.82  26.84  76.21   2333  63.684    2  0.039  63.724
  3 63.58   0.00  33.68   8.04  75.19     14   1.812    2  0.103   1.915

Correct me if I'm wrong, but it seems 3 is a lot more healthy GC-wise than 2 (which has
normal load statistics). Anywhere else you can recommend we look?

Griff

On 14 January 2016 at 01:25, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

> Ok. I saw dropped mutations on your cluster and full gc is a common cause
> for that.
> Can you just search for the word GCInspector in system.log and share the
> frequency of minor and full gc. Moreover, are you printing promotion
> failures in gc logs? Why is full gc getting triggered? Promotion failures
> or concurrent mode failures?
>
> If you are on CMS, you need to fine tune your heap options to address full
> gc.
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Thu, 14 Jan, 2016 at 12:57 am, James Griffin
> <james.grif...@idioplatform.com> wrote:
> I think I was incorrect in assuming GC wasn't an issue due to the lack of
> logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked
> differences, though comparing the startup flags on the two machines shows
> the GC config is identical:
>
> $ jstat -gcutil
>     S0     S1     E      O      P     YGC       YGCT  FGC    FGCT        GCT
> 2 5.08   0.00  55.72  18.24  59.90   25986    619.827   28   1.597    621.424
> 3 0.00   0.00  22.79  17.87  59.99  422600  11225.979  668  57.383  11283.361
>
> Here's typical output for iostat on nodes 2 & 3 as well:
>
> $ iostat -dmx md0
>
> Device:  rrqm/s  wrqm/s      r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> 2 md0      0.00    0.00   339.00  0.00   9.77   0.00     59.00      0.00   0.00     0.00     0.00   0.00   0.00
> 3 md0      0.00    0.00  2069.00  1.00  85.85   0.00     84.94      0.00   0.00     0.00     0.00   0.00   0.00
>
> Griff
>
> On 13 January 2016 at 18:36, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>
>> Node 2 has slightly higher data but that should be ok. Not sure how read
>> ops are so high when no IO-intensive activity such as repair and
>> compaction is running on node 3. Maybe you can try investigating the
>> logs to see what's happening.
>>
>> Others on the mailing list could also share their views on the situation.
>>
>> Thanks
>> Anuj
>>
>> Sent from Yahoo Mail on Android
>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>
>> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin
>> <james.grif...@idioplatform.com> wrote:
>> Hi Anuj,
>>
>> Below is the output of nodetool status. The nodes were replaced following
>> the instructions in the Datastax documentation for replacing running
>> nodes, since the nodes were running fine; it was just that the servers
>> had been incorrectly initialised and thus had less disk space. The status
>> below shows 2 has significantly higher load, however, as I say, 2 is
>> operating normally and is running compactions, so I guess that's not an
>> issue?
>>
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> -- Address  Load       Tokens  Owns   Host ID                               Rack
>> UN 1        253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>> UN 2        302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>> UN 3        265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
>>
>> Griff
>>
>> On 13 January 2016 at 18:12, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>>
>>> Hi,
>>>
>>> Revisiting the thread I can see that nodetool status had both good and
>>> bad nodes at the same time. How do you replace nodes?
>>> When you say bad node, I understand that the node is no longer usable
>>> even though Cassandra is UP? Is that correct?
>>>
>>> If a node is in bad shape and not working, adding a new node may trigger
>>> streaming of huge data from the bad node too. Have you considered using
>>> the procedure for replacing a dead node?
>>>
>>> Please share the latest nodetool status.
>>>
>>> nodetool output shared earlier:
>>>
>>> `nodetool status` output:
>>>
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> -- Address   Load       Tokens  Owns   Host ID                               Rack
>>> UN A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>> UN B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>> UN C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>> UN D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>
>>> Thanks
>>> Anuj
>>>
>>> Sent from Yahoo Mail on Android
>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>
>>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin
>>> <james.grif...@idioplatform.com> wrote:
>>> Hi all,
>>>
>>> We’ve spent a few days running things but are in the same position. To
>>> add some more flavour:
>>>
>>>    - We have a 3-node ring, replication factor = 3. We’ve been running
>>>    in this configuration for a few years without any real issues
>>>    - Nodes 2 & 3 are much newer than node 1. These two nodes were
>>>    brought in to replace two other nodes which had a failed RAID0
>>>    configuration and thus were lacking in disk space
>>>    - When node 2 was brought into the ring, it exhibited high CPU wait,
>>>    IO and load metrics
>>>    - We subsequently brought 3 into the ring: as soon as 3 was fully
>>>    bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal
>>>    levels. Those same stats on 3, however, sky-rocketed
>>>    - We’ve confirmed the configuration across all three nodes is
>>>    identical and in line with the recommended production settings
>>>    - We’ve run a full repair
>>>    - Node 2 is currently running compactions; 1 & 3 aren’t and have
>>>    none pending
>>>    - There is no GC happening from what I can see. Node 1 has a GC log,
>>>    but that’s not been written to since May last year
>>>
>>> What we’re seeing at the moment is similar and normal stats on nodes 1 &
>>> 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>>
>>>    1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>>    2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>>    3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>>
>>> Can you recommend any next steps?
>>>
>>> Griff
>>>
>>> On 6 January 2016 at 17:31, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>>>
>>>> Hi Vickrum,
>>>>
>>>> I would have proceeded with the diagnosis as follows:
>>>>
>>>> 1. Analyse the sar report to check system health - CPU, memory, swap,
>>>> disk, etc. The system seems to be overloaded; this is evident from the
>>>> mutation drops.
>>>>
>>>> 2. Make sure that all recommended Cassandra production settings
>>>> available on the Datastax site are applied; disable zone reclaim and THP.
>>>>
>>>> 3. Run a full repair on the bad node and check data size. The node owns
>>>> the maximum token range but has significantly less data. I doubt that
>>>> bootstrapping happened properly.
>>>>
>>>> 4. Compactionstats shows 22 pending compactions. Try throttling
>>>> compactions by reducing concurrent compactors or compaction throughput.
>>>>
>>>> 5. Analyse the logs to make sure bootstrapping happened without errors.
>>>>
>>>> 6. Look for other common performance problems, such as GC pauses, to
>>>> make sure that the dropped mutations are not caused by GC pauses.
>>>>
>>>> Thanks
>>>> Anuj
>>>>
>>>> Sent from Yahoo Mail on Android
>>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>>
>>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi
>>>> <vickrum....@idioplatform.com> wrote:
>>>> # nodetool compactionstats
>>>> pending tasks: 22
>>>>   compaction type  keyspace              table                      completed  total         unit   progress
>>>>   Compaction       production_analytics  interactions               240410213  161172668724  bytes  0.15%
>>>>   Compaction       production_decisions  decisions.decisions_q_idx  120815385  226295183     bytes  53.39%
>>>> Active compaction remaining time : 2h39m58s
>>>>
>>>> Worth mentioning that compactions haven't been running on this node
>>>> particularly often. The node's been performing badly regardless of
>>>> whether it's compacting or not.
>>>>
>>>> On 6 January 2016 at 16:35, Jeff Ferland <j...@tubularlabs.com> wrote:
>>>>
>>>>> What’s your output of `nodetool compactionstats`?
>>>>>
>>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vickrum....@idioplatform.com>
>>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We recently added a new node to our cluster in order to replace a node
>>>>> that died (hardware failure, we believe). For the next two weeks it had
>>>>> high disk and network activity. We replaced the server, but it's
>>>>> happened again. We've looked into memory allowances, disk performance,
>>>>> number of connections, and all the nodetool stats, but can't find the
>>>>> cause of the issue.
>>>>>
>>>>> `nodetool tpstats`[0] shows a lot of active and pending threads, in
>>>>> comparison to the rest of the cluster, but that's likely a symptom, not
>>>>> a cause.
>>>>>
>>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The bad
>>>>> node (D) has less data.
>>>>>
>>>>> Disk activity[2] and network activity[3] on this node are far higher
>>>>> than on the rest.
>>>>>
>>>>> The only other difference this node has from the rest of the cluster is
>>>>> that it's on the ext4 filesystem, whereas the rest are ext3, but we've
>>>>> done plenty of testing there and can't see how that would affect
>>>>> performance on this node so much.
>>>>>
>>>>> Nothing of note in system.log.
>>>>>
>>>>> What should our next step be in trying to diagnose this issue?
>>>>>
>>>>> Best wishes,
>>>>> Vic
>>>>>
>>>>> [0] `nodetool tpstats` output:
>>>>>
>>>>> Good node:
>>>>> Pool Name               Active  Pending  Completed  Blocked  All time blocked
>>>>> ReadStage                    0        0   46311521        0                 0
>>>>> RequestResponseStage         0        0   23817366        0                 0
>>>>> MutationStage                0        0   47389269        0                 0
>>>>> ReadRepairStage              0        0      11108        0                 0
>>>>> ReplicateOnWriteStage        0        0          0        0                 0
>>>>> GossipStage                  0        0    5259908        0                 0
>>>>> CacheCleanupExecutor         0        0          0        0                 0
>>>>> MigrationStage               0        0         30        0                 0
>>>>> MemoryMeter                  0        0      16563        0                 0
>>>>> FlushWriter                  0        0      39637        0                26
>>>>> ValidationExecutor           0        0      19013        0                 0
>>>>> InternalResponseStage        0        0          9        0                 0
>>>>> AntiEntropyStage             0        0      38026        0                 0
>>>>> MemtablePostFlusher          0        0      81740        0                 0
>>>>> MiscStage                    0        0      19196        0                 0
>>>>> PendingRangeCalculator       0        0         23        0                 0
>>>>> CompactionExecutor           0        0      61629        0                 0
>>>>> commitlog_archiver           0        0          0        0                 0
>>>>> HintedHandoff                0        0         63        0                 0
>>>>>
>>>>> Message type      Dropped
>>>>> RANGE_SLICE             0
>>>>> READ_REPAIR             0
>>>>> PAGED_RANGE             0
>>>>> BINARY                  0
>>>>> READ                  640
>>>>> MUTATION                0
>>>>> _TRACE                  0
>>>>> REQUEST_RESPONSE        0
>>>>> COUNTER_MUTATION        0
>>>>>
>>>>> Bad node:
>>>>> Pool Name               Active  Pending  Completed  Blocked  All time blocked
>>>>> ReadStage                   32      113      52216        0                 0
>>>>> RequestResponseStage         0        0       4167        0                 0
>>>>> MutationStage                0        0     127559        0                 0
>>>>> ReadRepairStage              0        0        125        0                 0
>>>>> ReplicateOnWriteStage        0        0          0        0                 0
>>>>> GossipStage                  0        0       9965        0                 0
>>>>> CacheCleanupExecutor         0        0          0        0                 0
>>>>> MigrationStage               0        0          0        0                 0
>>>>> MemoryMeter                  0        0         24        0                 0
>>>>> FlushWriter                  0        0         27        0                 1
>>>>> ValidationExecutor           0        0          0        0                 0
>>>>> InternalResponseStage        0        0          0        0                 0
>>>>> AntiEntropyStage             0        0          0        0                 0
>>>>> MemtablePostFlusher          0        0         96        0                 0
>>>>> MiscStage                    0        0          0        0                 0
>>>>> PendingRangeCalculator       0        0         10        0                 0
>>>>> CompactionExecutor           1        1         73        0                 0
>>>>> commitlog_archiver           0        0          0        0                 0
>>>>> HintedHandoff                0        0         15        0                 0
>>>>>
>>>>> Message type      Dropped
>>>>> RANGE_SLICE           130
>>>>> READ_REPAIR             1
>>>>> PAGED_RANGE             0
>>>>> BINARY                  0
>>>>> READ                31032
>>>>> MUTATION              865
>>>>> _TRACE                  0
>>>>> REQUEST_RESPONSE        7
>>>>> COUNTER_MUTATION        0
>>>>>
>>>>> [1] `nodetool status` output:
>>>>>
>>>>> Status=Up/Down
>>>>> |/ State=Normal/Leaving/Joining/Moving
>>>>> -- Address   Load       Tokens  Owns   Host ID                               Rack
>>>>> UN A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>>> UN B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>>> UN C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>>> UN D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>>>
>>>>> [2] Disk read/write ops:
>>>>>
>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>>
>>>>> [3] Network in/out:
>>>>>
>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
>>>>>
>>>>>
>>>>
>>>
>>
>