Thanks a lot, Hitesh! I'll try to re-tune the heap to a lower level.

Shalom Sagges
DBA
T: +972-74-700-4035

On Thu, Apr 19, 2018 at 12:42 AM, hitesh dua <hiteshd...@gmail.com> wrote:

> Hi,
>
> I'd recommend tuning your heap size further (preferably lower), as a large
> heap can lead to long garbage collection pauses, also known as
> stop-the-world events. A pause occurs when a region of memory is full and
> the JVM needs to make space to continue. During a pause all operations are
> suspended. Because a pause affects networking, the node can appear as down
> to other nodes in the cluster. Additionally, any SELECT and INSERT
> statements will wait, which increases read and write latencies.
>
> Any pause of more than a second, or multiple pauses within a second that
> add up to a large fraction of that second, should be avoided. The basic
> cause of the problem is that data is stored in memory faster than it can
> be removed.
>
> MUTATION: If a write message is processed after its timeout
> (write_request_timeout_in_ms), it either sent a failure to the client, or
> it met its requested consistency level and will rely on hinted handoff and
> read repairs to do the mutation if it succeeded.
>
> Another possible cause of the issue could be your HDDs, as they too could
> be a bottleneck.
>
> *MAX_HEAP_SIZE*
> The recommended maximum heap size depends on which GC is used:
>
> Hardware setup                                            Recommended MAX_HEAP_SIZE
> Older computers                                           Typically 8 GB
> CMS for newer computers (8+ cores) with up to 256 GB RAM  No more than 14 GB
>
> Thanks,
> Hitesh dua
> hiteshd...@gmail.com
>
> On Wed, Apr 18, 2018 at 10:07 PM, shalom sagges <shalomsag...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I have a 44-node cluster (22 nodes in each DC).
>> Each node has 24 cores, 130 GB RAM, and 3 TB HDDs.
>> Version 2.0.14 (soon to be upgraded)
>> ~10K writes per second per node.
>> Heap size: 8 GB max, 2.4 GB new gen
>>
>> I deployed Reaper and GC activity started to increase rapidly. I'm not
>> sure if it's because there was a lot of inconsistency in the data, but I
>> decided to increase the heap to 16 GB and the new gen to 6 GB. I also
>> increased the max tenuring threshold from 1 to 5.
>>
>> I tested on a canary node and everything was fine, but when I changed the
>> entire DC, I suddenly saw a lot of dropped mutations in the logs on most
>> of the nodes. (Reaper was not running on the cluster yet, but a manual
>> repair was running.)
>>
>> Can the heap increase cause lots of dropped mutations?
>> When is a mutation considered dropped? Is it during flush? Is it during
>> the write to the commit log or memtable?
>>
>> Thanks!
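
For anyone wanting to check their own GC logs against the rule of thumb quoted in this thread (no single pause over a second, and no run of pauses within one second that adds up to a large fraction of that second), here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name find_problem_pauses, the input format of (start, duration) pairs in seconds, and the 0.5 cut-off for "a large fraction", which the thread does not define. It does not parse Cassandra or JVM log files; the pause records have to be extracted separately.

```python
from typing import List, Tuple

# Hypothetical input format: (pause start time, pause duration), both in seconds.
Pause = Tuple[float, float]


def find_problem_pauses(pauses: List[Pause],
                        max_single: float = 1.0,
                        window: float = 1.0,
                        max_fraction: float = 0.5) -> List[Tuple[str, float]]:
    """Flag pauses that break the rule of thumb.

    Returns ("single", start) for each pause longer than max_single, followed
    by ("clustered", window_start) whenever two or more pauses starting within
    `window` seconds of each other sum to more than max_fraction * window.
    The 0.5 default for max_fraction is an assumed threshold.
    """
    ordered = sorted(pauses)
    findings: List[Tuple[str, float]] = []

    # Rule 1: any single pause longer than max_single.
    for start, duration in ordered:
        if duration > max_single:
            findings.append(("single", start))

    # Rule 2: sliding window over pause start times.
    left = 0
    total = 0.0
    for right, (start, duration) in enumerate(ordered):
        total += duration
        # Drop pauses that started more than `window` seconds before this one.
        while ordered[left][0] <= start - window:
            total -= ordered[left][1]
            left += 1
        if right - left + 1 > 1 and total > max_fraction * window:
            findings.append(("clustered", ordered[left][0]))

    return findings


if __name__ == "__main__":
    sample = [(0.0, 0.2), (0.3, 0.2), (0.6, 0.2), (10.0, 1.5), (20.0, 0.1)]
    print(find_problem_pauses(sample))
    # prints [('single', 10.0), ('clustered', 0.0)]
```

In the sample, the 1.5 s pause is flagged on its own, and the three 0.2 s pauses starting at t=0.0 are flagged together because they total 0.6 s within one second. The window check only sums pauses by start time, so it is a rough approximation rather than an exact measure of time spent paused.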