James, I may be missing something. You mentioned your cluster has RF=3, so why does "nodetool status" show each node owning ~1/3 of the data, especially after a full repair?
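A note on the "Owns" column: without a keyspace argument, nodetool status reports raw token-range ownership, which on a three-node vnode cluster comes out at roughly a third per node regardless of replication factor; given a keyspace, it reports effective ownership, which should be close to 100% per node at RF=3. A quick way to compare the two, using production_analytics (a keyspace that appears later in this thread) purely as an example:

$ nodetool status                        # "Owns" covers token ranges only (~33% per node here)
$ nodetool status production_analytics   # "Owns (effective)" accounts for RF (~100% per node at RF=3)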
On Thu, Jan 14, 2016 at 9:56 AM, James Griffin <james.grif...@idioplatform.com> wrote:

Hi Kai,

Below - nothing going on that I can see:

$ nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name    Active   Pending   Completed
Commands     n/a      0         6326
Responses    n/a      0         219356

Best wishes,

Griff

James "Griff" Griffin
CTO, idio (http://idioplatform.com)

On 14 January 2016 at 14:22, Kai Wang <dep...@gmail.com> wrote:

James,

Can you post the result of "nodetool netstats" on the bad node?

On Thu, Jan 14, 2016 at 9:09 AM, James Griffin <james.grif...@idioplatform.com> wrote:

A summary of what we've done this morning:

- Noted that there are no GCInspector lines in system.log on the bad node (there are GCInspector logs on the other, healthy nodes).
- Turned on GC logging and noted log entries stating that the total time for which application threads were stopped was high - ~10s.
- Not seeing failures of any kind (promotion or concurrent mark).
- Attached VisualVM: noted that heap usage was very low (~5% and stable) and didn't display the hallmarks of GC activity. PermGen also very stable.
- Downloaded the GC logs and examined them in GCViewer. Noted that:
  - We had lots of pauses (again around 10s), but no full GCs.
  - From a 2,300s sample, just over 2,000s were spent with threads paused.
  - Spotted many small GCs in the new space - realised that the Xmn value was very low (200M against a heap size of 3750M). Increased Xmn to 937M - no change in server behaviour (high load, high reads/s on disk, high CPU wait).

Current output of jstat:

    S0     S1     E      O      P      YGC   YGCT    FGC  FGCT   GCT
2   0.00   45.20  12.82  26.84  76.21  2333  63.684  2    0.039  63.724
3   63.58  0.00   33.68  8.04   75.19  14    1.812   2    0.103  1.915

Correct me if I'm wrong, but it seems 3 is a lot healthier, GC-wise, than 2 (which has normal load statistics).

Anywhere else you can recommend we look?

Griff

On 14 January 2016 at 01:25, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

OK. I saw dropped mutations on your cluster, and full GC is a common cause of that.
Can you search for the word GCInspector in system.log and share the frequency of minor and full GCs? Also, are you printing promotion failures in the GC logs? Why is full GC getting triggered - promotion failures or concurrent mode failures?

If you are on CMS, you need to fine-tune your heap options to address the full GCs.
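For what it's worth, a rough sketch of how to pull the numbers being asked about here; the log locations and the cassandra-env.sh path are assumptions that vary by install, and the exact GCInspector message text differs between Cassandra versions:

$ grep -c GCInspector /var/log/cassandra/system.log
$ grep GCInspector /var/log/cassandra/system.log | grep -c ParNew               # minor GCs
$ grep GCInspector /var/log/cassandra/system.log | grep -c ConcurrentMarkSweep  # CMS / full GCs
$ grep -E "promotion failed|concurrent mode failure" /var/log/cassandra/gc.log

To make sure promotion failures and stopped-time actually appear in the GC log, standard HotSpot flags along these lines can be added in cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"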
Thanks
Anuj

On Thu, 14 Jan, 2016 at 12:57 am, James Griffin <james.grif...@idioplatform.com> wrote:

I think I was incorrect in assuming GC wasn't an issue due to the lack of logs. Comparing jstat output on nodes 2 & 3 shows some fairly marked differences, even though comparing the startup flags on the two machines shows the GC config is identical:

$ jstat -gcutil

    S0    S1    E      O      P      YGC     YGCT       FGC  FGCT    GCT
2   5.08  0.00  55.72  18.24  59.90  25986   619.827    28   1.597   621.424
3   0.00  0.00  22.79  17.87  59.99  422600  11225.979  668  57.383  11283.361

Here's typical output for iostat on nodes 2 & 3 as well:

$ iostat -dmx md0

  Device:  rrqm/s  wrqm/s  r/s      w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
2 md0      0.00    0.00    339.00   0.00  9.77   0.00   59.00     0.00      0.00   0.00     0.00     0.00   0.00
3 md0      0.00    0.00    2069.00  1.00  85.85  0.00   84.94     0.00      0.00   0.00     0.00     0.00   0.00

Griff

On 13 January 2016 at 18:36, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Node 2 has slightly higher data, but that should be OK. Not sure how read ops are so high when no IO-intensive activity such as repair or compaction is running on node 3. Maybe you can try investigating the logs to see what's happening.

Others on the mailing list could also share their views on the situation.

Thanks
Anuj

On Wed, 13 Jan, 2016 at 11:46 pm, James Griffin <james.grif...@idioplatform.com> wrote:

Hi Anuj,

Below is the output of nodetool status. The nodes were replaced following the instructions in the Datastax documentation for replacing running nodes, since the nodes were running fine - it was just that the servers had been incorrectly initialised and thus had less disk space. The status below shows node 2 has significantly higher load; however, as I say, 2 is operating normally and is running compactions, so I guess that's not an issue?

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns   Host ID                               Rack
UN  1        253.59 GB  256     31.7%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
UN  2        302.23 GB  256     35.3%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1
UN  3        265.02 GB  256     33.1%  74b15507-db5c-45df-81db-6e5bcb7438a3  rack1

Griff

On 13 January 2016 at 18:12, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Hi,

Revisiting the thread, I can see that nodetool status had both good and bad nodes at the same time. How did you replace the nodes? When you say "bad node", I understand that the node is no longer usable even though Cassandra is UP - is that correct?

If a node is in bad shape and not working, adding a new node may trigger streaming of a huge amount of data from the bad node too. Have you considered using the procedure for replacing a dead node?

Please share the latest nodetool status.
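For reference, the dead-node replacement procedure mentioned here boils down to starting the fresh node with a replace_address flag rather than a normal bootstrap, so it takes over the dead node's tokens and streams only from the surviving replicas. A rough outline; the IP is a placeholder and the exact steps should be checked against the Datastax documentation for your Cassandra version:

# On the replacement node, before its first start, in cassandra-env.sh:
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<ip_of_dead_node>"

# Start Cassandra and watch it stream the dead node's ranges:
$ nodetool netstats

# Once the node shows UN in "nodetool status", remove the flag so future restarts are normal.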
nodetool output shared earlier:

`nodetool status` output:

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns   Host ID                               Rack
UN  A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
UN  B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
UN  C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
UN  D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1

Thanks
Anuj

On Wed, 13 Jan, 2016 at 10:34 pm, James Griffin <james.grif...@idioplatform.com> wrote:

Hi all,

We've spent a few days running things but are in the same position. To add some more flavour:

- We have a 3-node ring, replication factor = 3. We've been running in this configuration for a few years without any real issues.
- Nodes 2 & 3 are much newer than node 1. These two nodes were brought in to replace two other nodes which had a failed RAID0 configuration and thus were lacking in disk space.
- When node 2 was brought into the ring, it exhibited high CPU wait, IO and load metrics.
- We subsequently brought node 3 into the ring: as soon as 3 was fully bootstrapped, the load, CPU wait and IO stats on 2 dropped to normal levels. Those same stats on 3, however, sky-rocketed.
- We've confirmed the configuration across all three nodes is identical and in line with the recommended production settings.
- We've run a full repair.
- Node 2 is currently running compactions; 1 & 3 aren't, and have none pending.
- There is no GC happening from what I can see. Node 1 has a GC log, but that's not been written to since May last year.

What we're seeing at the moment is similar and normal stats on nodes 1 & 2, but high CPU wait, IO and load stats on 3. As a snapshot:

1. Load: 3.96, CPU wait: 30.8%, Disk Read Ops: 408/s
2. Load: 5.88, CPU wait: 14.6%, Disk Read Ops: 275/s
3. Load: 58.15, CPU wait: 87.0%, Disk Read Ops: 2,408/s

Can you recommend any next steps?

Griff

On 6 January 2016 at 17:31, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Hi Vickrum,

I would have proceeded with the diagnosis as follows:

1. Analyze the sar report to check system health - CPU, memory, swap, disk etc. The system seems to be overloaded; this is evident from the mutation drops.

2. Make sure that all the recommended Cassandra production settings available on the Datastax site are applied, and disable zone reclaim and THP (see the sketch after this list).

3. Run a full repair on the bad node and check the data size. The node owns the largest token range but has significantly less data; I doubt that bootstrapping happened properly.

4. compactionstats shows 22 pending compactions. Try throttling compactions by reducing concurrent compactors or compaction throughput (also covered in the sketch below).

5. Analyze the logs to make sure bootstrapping happened without errors.

6. Look for other common performance problems such as GC pauses, to make sure that the dropped mutations are not caused by GC pauses.
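A minimal sketch of the commands behind points 2 and 4; the proc/sysfs paths are the usual Linux locations but differ on some distros (e.g. redhat_transparent_hugepage on RHEL 6), and the throughput figure is only an illustration (16 MB/s is the yaml default):

# Point 2: disable zone reclaim and transparent huge pages
$ echo 0 | sudo tee /proc/sys/vm/zone_reclaim_mode
$ echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
$ echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

# Point 4: check and tighten compaction throttling on a live node
$ nodetool getcompactionthroughput
$ nodetool setcompactionthroughput 8   # MB/s cap; concurrent_compactors itself is a cassandra.yaml setting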
Thanks
Anuj

On Wed, 6 Jan, 2016 at 10:12 pm, Vickrum Loi <vickrum....@idioplatform.com> wrote:

# nodetool compactionstats
pending tasks: 22
compaction type  keyspace              table                      completed    total         unit   progress
Compaction       production_analytics  interactions               240410213    161172668724  bytes  0.15%
Compaction       production_decisions  decisions.decisions_q_idx  120815385    226295183     bytes  53.39%
Active compaction remaining time : 2h39m58s

Worth mentioning that compactions haven't been running on this node particularly often. The node's been performing badly regardless of whether it's compacting or not.

On 6 January 2016 at 16:35, Jeff Ferland <j...@tubularlabs.com> wrote:

What's your output of `nodetool compactionstats`?

On Jan 6, 2016, at 7:26 AM, Vickrum Loi <vickrum....@idioplatform.com> wrote:

Hi,

We recently added a new node to our cluster in order to replace a node that died (hardware failure, we believe). For the next two weeks it had high disk and network activity. We replaced the server, but it's happened again. We've looked into memory allowances, disk performance, number of connections, and all the nodetool stats, but can't find the cause of the issue.

`nodetool tpstats`[0] shows a lot of active and pending threads, in comparison to the rest of the cluster, but that's likely a symptom, not a cause.

`nodetool status`[1] shows the cluster isn't quite balanced. The bad node (D) has less data.

Disk activity[2] and network activity[3] on this node are far higher than on the rest.

The only other difference this node has from the rest of the cluster is that it's on the ext4 filesystem, whereas the rest are ext3, but we've done plenty of testing there and can't see how that would affect performance on this node so much.

Nothing of note in system.log.

What should our next step be in trying to diagnose this issue?

Best wishes,
Vic
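One cheap way to put numbers on "a lot of active and pending threads" is to grab the same tpstats snapshot from every node and compare the ReadStage/MutationStage backlog and the dropped-message counters; a sketch, assuming SSH access to the nodes and nodetool on their PATH (host names are placeholders):

$ for h in nodeA nodeB nodeC nodeD; do echo "== $h =="; ssh "$h" nodetool tpstats | grep -E "ReadStage|MutationStage|READ|MUTATION"; done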
[0] `nodetool tpstats` output:

Good node:
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0       46311521         0                 0
RequestResponseStage              0         0       23817366         0                 0
MutationStage                     0         0       47389269         0                 0
ReadRepairStage                   0         0          11108         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        5259908         0                 0
CacheCleanupExecutor              0         0              0         0                 0
MigrationStage                    0         0             30         0                 0
MemoryMeter                       0         0          16563         0                 0
FlushWriter                       0         0          39637         0                26
ValidationExecutor                0         0          19013         0                 0
InternalResponseStage             0         0              9         0                 0
AntiEntropyStage                  0         0          38026         0                 0
MemtablePostFlusher               0         0          81740         0                 0
MiscStage                         0         0          19196         0                 0
PendingRangeCalculator            0         0             23         0                 0
CompactionExecutor                0         0          61629         0                 0
commitlog_archiver                0         0              0         0                 0
HintedHandoff                     0         0             63         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
PAGED_RANGE                  0
BINARY                       0
READ                       640
MUTATION                     0
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0

Bad node:
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                        32       113          52216         0                 0
RequestResponseStage              0         0           4167         0                 0
MutationStage                     0         0         127559         0                 0
ReadRepairStage                   0         0            125         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0           9965         0                 0
CacheCleanupExecutor              0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
MemoryMeter                       0         0             24         0                 0
FlushWriter                       0         0             27         0                 1
ValidationExecutor                0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MemtablePostFlusher               0         0             96         0                 0
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0             10         0                 0
CompactionExecutor                1         1             73         0                 0
commitlog_archiver                0         0              0         0                 0
HintedHandoff                     0         0             15         0                 0

Message type           Dropped
RANGE_SLICE                130
READ_REPAIR                  1
PAGED_RANGE                  0
BINARY                       0
READ                     31032
MUTATION                   865
_TRACE                       0
REQUEST_RESPONSE             7
COUNTER_MUTATION             0

[1] `nodetool status` output:

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns   Host ID                               Rack
UN  A (Good)  252.37 GB  256     23.0%  9cd2e58c-a062-48a4-8d3f-b7bd9ee0576f  rack1
UN  B (Good)  245.91 GB  256     24.4%  6f0cfff2-babe-4de2-a1e3-6201228dee44  rack1
UN  C (Good)  254.79 GB  256     23.7%  f4891729-9179-4f19-ab2c-50d387da7ac6  rack1
UN  D (Bad)   163.85 GB  256     28.8%  faa5b073-6af4-4c80-b280-e7fdd61924d3  rack1

[2] Disk read/write ops:

https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/dRs4jV1ukMeFHGE/cass-disk-read-ops.png
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/gbE58N2WosiOomF/cass-disk-write-ops.png

[3] Network in/out:
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/RwOVdUBxu6fPLgF/cass-network-in.png
https://s3-eu-west-1.amazonaws.com/uploads-eu.hipchat.com/28299/178477/OpZM6ypNVN0O30q/cass-network-out.png