I think I was incorrect in assuming GC wasn't an issue just because of the lack of logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked differences, even though the startup flags on the two machines show the GC config is identical.
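The figures below are from a single jstat invocation per node. If anyone wants to reproduce this, sampling over an interval gives a better feel for the collection rates; the pgrep line is just one way of finding the Cassandra PID, so treat this as a sketch rather than exactly what we ran:

$ CASS_PID=$(pgrep -f CassandraDaemon | head -n1)   # assumes a single Cassandra process per host
$ jstat -gcutil "$CASS_PID" 1000 10                 # ten one-second samples of the same S0/S1/E/O/P and GC counters

Here are the one-shot numbers, node number in the first column: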
$ jstat -gcutil
     S0     S1     E      O      P     YGC       YGCT   FGC    FGCT        GCT
2  5.08   0.00  55.72  18.24  59.90   25986    619.827    28   1.597    621.424
3  0.00   0.00  22.79  17.87  59.99  422600  11225.979   668  57.383  11283.361

Here's typical output for iostat on nodes 2 & 3 as well:

$ iostat -dmx md0
  Device:  rrqm/s  wrqm/s      r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
2 md0        0.00    0.00   339.00  0.00   9.77   0.00     59.00      0.00   0.00     0.00     0.00   0.00   0.00
3 md0        0.00    0.00  2069.00  1.00  85.85   0.00     84.94      0.00   0.00     0.00     0.00   0.00   0.00

Griff

On 13 January 2016 at 18:36, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

> Node 2 has slightly higher data but that should be ok. Not sure how read ops are so high when no IO-intensive activity such as repair or compaction is running on node 3. Maybe you can try investigating logs to see what's happening.
>
> Others on the mailing list could also share their views on the situation.
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>
> On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin <james.grif...@idioplatform.com> wrote:
> Hi Anuj,
>
> Below is the output of nodetool status. The nodes were replaced following the instructions in the Datastax documentation for replacing running nodes, since the nodes were running fine; the problem was that the servers had been incorrectly initialised and thus had less disk space. The status below shows 2 has significantly higher load; however, as I say, 2 is operating normally and is running compactions, so I guess that's not an issue?
>
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load       Tokens  Owns   Host ID                               Rack
> UN  1        253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
> UN  2        302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
> UN  3        265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1
>
> Griff
>
> On 13 January 2016 at 18:12, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>
>> Hi,
>>
>> Revisiting the thread I can see that nodetool status had both good and bad nodes at the same time. How do you replace nodes? When you say bad node, I understand that the node is no longer usable even though Cassandra is UP? Is that correct?
>>
>> If a node is in bad shape and not working, adding a new node may trigger streaming huge data from the bad node too. Have you considered using the procedure for replacing a dead node?
>>
>> Please share the latest nodetool status.
>>
>> nodetool output shared earlier:
>>
>> `nodetool status` output:
>>
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address   Load       Tokens  Owns   Host ID                               Rack
>> UN  A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>> UN  B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>> UN  C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>> UN  D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>
>> Thanks
>> Anuj
>>
>> Sent from Yahoo Mail on Android
>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>
>> On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin <james.grif...@idioplatform.com> wrote:
>> Hi all,
>>
>> We've spent a few days running things but are in the same position. To add some more flavour:
>>
>>   - We have a 3-node ring, replication factor = 3. We've been running in this configuration for a few years without any real issues
>>   - Nodes 2 & 3 are much newer than node 1.
>>     These two nodes were brought in to replace two other nodes which had a failed RAID0 configuration and thus were lacking in disk space
>>   - When node 2 was brought into the ring, it exhibited high CPU wait, IO and load metrics
>>   - We subsequently brought 3 into the ring: as soon as 3 was fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal levels. Those same stats on 3, however, sky-rocketed
>>   - We've confirmed the configuration across all three nodes is identical and in line with the recommended production settings
>>   - We've run a full repair
>>   - Node 2 is currently running compactions; 1 & 3 aren't and have none pending
>>   - There is no GC happening from what I can see. Node 1 has a GC log, but that's not been written to since May last year
>>
>> What we're seeing at the moment is similar and normal stats on nodes 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:
>>
>>   1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
>>   2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
>>   3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s
>>
>> Can you recommend any next steps?
>>
>> Griff
>>
>> On 6 January 2016 at 17:31, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
>>
>>> Hi Vickrum,
>>>
>>> I would have proceeded with diagnosis as follows:
>>>
>>> 1. Analysis of the sar report to check system health (CPU, memory, swap, disk, etc.). The system seems to be overloaded; this is evident from the mutation drops.
>>>
>>> 2. Make sure that all recommended Cassandra production settings available on the Datastax site are applied; disable zone reclaim and THP.
>>>
>>> 3. Run a full repair on the bad node and check data size. The node owns the largest token range but has significantly lower data. I doubt that bootstrapping happened properly.
>>>
>>> 4. Compactionstats shows 22 pending compactions. Try throttling compactions by reducing concurrent compactors or compaction throughput.
>>>
>>> 5. Analyse logs to make sure bootstrapping happened without errors.
>>>
>>> 6. Look for other common performance problems such as GC pauses to make sure that the dropped mutations are not caused by GC pauses.
>>>
>>> Thanks
>>> Anuj
>>>
>>> Sent from Yahoo Mail on Android
>>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>>
>>> On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi <vickrum....@idioplatform.com> wrote:
>>> # nodetool compactionstats
>>> pending tasks: 22
>>>  compaction type  keyspace              table                      completed   total         unit   progress
>>>  Compaction       production_analytics  interactions               240410213   161172668724  bytes  0.15%
>>>  Compaction       production_decisions  decisions.decisions_q_idx  120815385   226295183     bytes  53.39%
>>> Active compaction remaining time : 2h39m58s
>>>
>>> Worth mentioning that compactions haven't been running on this node particularly often. The node's been performing badly regardless of whether it's compacting or not.
>>>
>>> On 6 January 2016 at 16:35, Jeff Ferland <j...@tubularlabs.com> wrote:
>>>
>>>> What's your output of `nodetool compactionstats`?
>>>>
>>>> On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vickrum....@idioplatform.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We recently added a new node to our cluster in order to replace a node that died (hardware failure, we believe). For the next two weeks it had high disk and network activity. We replaced the server, but it's happened again.
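A side note on the dead-node replacement procedure Anuj mentions further up the thread: rather than bootstrapping a brand-new node alongside the old one, the replacement is normally started with the replace_address option so that it takes over the dead node's token ranges directly. A minimal sketch, assuming a 1.2.11/2.0-era build and treating the address below as a placeholder:

# In cassandra-env.sh on the replacement node, set before its very first start
# (and removed once the node has finished joining):
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<dead_node_ip>"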
>>>> We've looked into memory allowances, disk performance, number of connections, and all the nodetool stats, but can't find the cause of the issue.
>>>>
>>>> `nodetool tpstats`[0] shows a lot of active and pending threads, in comparison to the rest of the cluster, but that's likely a symptom, not a cause.
>>>>
>>>> `nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D) has less data.
>>>>
>>>> Disk activity[2] and network activity[3] on this node are far higher than on the rest.
>>>>
>>>> The only other difference between this node and the rest of the cluster is that it's on the ext4 filesystem, whereas the rest are ext3, but we've done plenty of testing there and can't see how that would affect performance on this node so much.
>>>>
>>>> Nothing of note in system.log.
>>>>
>>>> What should our next step be in trying to diagnose this issue?
>>>>
>>>> Best wishes,
>>>> Vic
>>>>
>>>> [0] `nodetool tpstats` output:
>>>>
>>>> Good node:
>>>> Pool Name                 Active   Pending      Completed   Blocked  All time blocked
>>>> ReadStage                      0         0       46311521         0                 0
>>>> RequestResponseStage           0         0       23817366         0                 0
>>>> MutationStage                  0         0       47389269         0                 0
>>>> ReadRepairStage                0         0          11108         0                 0
>>>> ReplicateOnWriteStage          0         0              0         0                 0
>>>> GossipStage                    0         0        5259908         0                 0
>>>> CacheCleanupExecutor           0         0              0         0                 0
>>>> MigrationStage                 0         0             30         0                 0
>>>> MemoryMeter                    0         0          16563         0                 0
>>>> FlushWriter                    0         0          39637         0                26
>>>> ValidationExecutor             0         0          19013         0                 0
>>>> InternalResponseStage          0         0              9         0                 0
>>>> AntiEntropyStage               0         0          38026         0                 0
>>>> MemtablePostFlusher            0         0          81740         0                 0
>>>> MiscStage                      0         0          19196         0                 0
>>>> PendingRangeCalculator         0         0             23         0                 0
>>>> CompactionExecutor             0         0          61629         0                 0
>>>> commitlog_archiver             0         0              0         0                 0
>>>> HintedHandoff                  0         0             63         0                 0
>>>>
>>>> Message type           Dropped
>>>> RANGE_SLICE                  0
>>>> READ_REPAIR                  0
>>>> PAGED_RANGE                  0
>>>> BINARY                       0
>>>> READ                       640
>>>> MUTATION                     0
>>>> _TRACE                       0
>>>> REQUEST_RESPONSE             0
>>>> COUNTER_MUTATION             0
>>>>
>>>> Bad node:
>>>> Pool Name                 Active   Pending      Completed   Blocked  All time blocked
>>>> ReadStage                     32       113          52216         0                 0
>>>> RequestResponseStage           0         0           4167         0                 0
>>>> MutationStage                  0         0         127559         0                 0
>>>> ReadRepairStage                0         0            125         0                 0
>>>> ReplicateOnWriteStage          0         0              0         0                 0
>>>> GossipStage                    0         0           9965         0                 0
>>>> CacheCleanupExecutor           0         0              0         0                 0
>>>> MigrationStage                 0         0              0         0                 0
>>>> MemoryMeter                    0         0             24         0                 0
>>>> FlushWriter                    0         0             27         0                 1
>>>> ValidationExecutor             0         0              0         0                 0
>>>> InternalResponseStage          0         0              0         0                 0
>>>> AntiEntropyStage               0         0              0         0                 0
>>>> MemtablePostFlusher            0         0             96         0                 0
>>>> MiscStage                      0         0              0         0                 0
>>>> PendingRangeCalculator         0         0             10         0                 0
>>>> CompactionExecutor             1         1             73         0                 0
>>>> commitlog_archiver             0         0              0         0                 0
>>>> HintedHandoff                  0         0             15         0                 0
>>>>
>>>> Message type           Dropped
>>>> RANGE_SLICE                130
>>>> READ_REPAIR                  1
>>>> PAGED_RANGE                  0
>>>> BINARY                       0
>>>> READ                     31032
>>>> MUTATION                   865
>>>> _TRACE                       0
>>>> REQUEST_RESPONSE             7
>>>> COUNTER_MUTATION             0
>>>>
>>>> [1] `nodetool status` output:
>>>>
>>>> Status=Up/Down
>>>> |/ State=Normal/Leaving/Joining/Moving
>>>> --  Address   Load       Tokens  Owns   Host ID                               Rack
>>>> UN  A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
>>>> UN  B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
>>>> UN  C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
>>>> UN  D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
>>>> [2] Disk read/write ops:
>>>>
>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png
>>>>
>>>> [3] Network in/out:
>>>>
>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
>>>> https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png
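P.S. For completeness, the production-settings checks Anuj lists up-thread (zone reclaim and THP), plus the block-device read-ahead, can be eyeballed on each node with something like the commands below. Treat this as a sketch: the THP path varies by distro (RHEL 6, for example, uses redhat_transparent_hugepage), and md0 is just the device from the iostat output above.

$ cat /proc/sys/vm/zone_reclaim_mode                 # 0 means zone reclaim is disabled, as recommended
$ cat /sys/kernel/mm/transparent_hugepage/enabled    # expect [never] on a tuned box
$ sudo blockdev --getra /dev/md0                     # read-ahead in 512-byte sectors; worth comparing node 2 vs node 3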