Re: rebuild constantly fails, 3.11

kurt greaves Fri, 11 Aug 2017 08:56:05 -0700

How much memory do these machines have? Typically we've found that G1
isn't worth it until you get to around 24G heaps, and even then it's not
really better than CMS. You could try CMS with an 8G heap and a 2G new size.
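As a rough sketch, the CMS suggestion would translate into jvm.options lines along these lines. `cms_jvm_options` is a hypothetical helper, not part of Cassandra, and the CMS flags listed are the ones I recall shipping enabled in the 3.11 jvm.options, so check them against your own file. If you switch, also comment out `-XX:+UseG1GC` and the G1-specific flags.

```python
from typing import List


# Hypothetical helper (not part of Cassandra): builds the jvm.options lines
# for the CMS collector with a fixed heap and young generation ("new") size.
def cms_jvm_options(heap_gb: int = 8, new_gb: int = 2) -> List[str]:
    if not 0 < new_gb < heap_gb:
        raise ValueError("new size must be positive and smaller than the heap")
    return [
        f"-Xms{heap_gb}G",  # initial heap equal to max heap avoids resizing
        f"-Xmx{heap_gb}G",
        f"-Xmn{new_gb}G",   # young generation size
        # CMS flags, as recalled from the 3.11 jvm.options defaults (verify)
        "-XX:+UseParNewGC",
        "-XX:+UseConcMarkSweepGC",
        "-XX:+CMSParallelRemarkEnabled",
        "-XX:SurvivorRatio=8",
        "-XX:MaxTenuringThreshold=1",
        "-XX:CMSInitiatingOccupancyFraction=75",
        "-XX:+UseCMSInitiatingOccupancyOnly",
    ]


if __name__ == "__main__":
    # Print one flag per line, ready to paste in place of the G1 settings.
    print("\n".join(cms_jvm_options()))
```

Keeping -Xms equal to -Xmx is the usual choice for Cassandra so the heap is never resized at runtime.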


However, as the OOM is only happening on one node, have you ensured there
are no extra processes running on that node that could be consuming
memory? Note that the OOM killer kills the process with the highest OOM
score, which generally corresponds to the process using the most memory,
but that process is not necessarily the cause of the problem.
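To see which processes the OOM killer would favour on that node, something like the following sketch reads `/proc/<pid>/oom_score` and `/proc/<pid>/comm` (Linux only). The function names are hypothetical and this is just a quick diagnostic, not a Cassandra tool.

```python
import os
from typing import Dict, List, Tuple


def read_oom_scores(proc: str = "/proc") -> Dict[int, Tuple[str, int]]:
    """Return {pid: (command name, oom_score)} for every readable process."""
    scores: Dict[int, Tuple[str, int]] = {}
    if not os.path.isdir(proc):
        return scores  # not Linux, or /proc not mounted
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(proc, entry, "comm")) as f:
                comm = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile or is not readable
        scores[int(entry)] = (comm, score)
    return scores


def top_by_oom_score(scores: Dict[int, Tuple[str, int]],
                     n: int = 5) -> List[Tuple[int, str, int]]:
    """Highest oom_score first; the first entry is the most likely victim."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(pid, comm, score) for pid, (comm, score) in ranked[:n]]


if __name__ == "__main__":
    for pid, comm, score in top_by_oom_score(read_oom_scores()):
        print(f"{pid:>7} {score:>5} {comm}")
```

If anything other than the Cassandra JVM shows up near the top with a large score, that process is worth a closer look.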

Also, could you run nodetool info on the problem node and one other node
and put the output in a gist? It would be interesting to see whether there
is a significant difference in off-heap memory.
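For a quick comparison, here is a minimal sketch that pulls the off-heap figure out of two saved `nodetool info` outputs. It assumes the output contains a `Off Heap Memory (MB)   : <number>` line, which is how I remember 3.11 printing it; the helper names and the sample values in the usage are made up for illustration.

```python
import sys
from typing import Dict, Optional


def parse_nodetool_info(text: str) -> Dict[str, str]:
    """Split each 'Key : value' line of `nodetool info` output into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def off_heap_mb(text: str) -> Optional[float]:
    """Value of the 'Off Heap Memory (MB)' line, or None if it is missing."""
    value = parse_nodetool_info(text).get("Off Heap Memory (MB)")
    return float(value) if value is not None else None


def off_heap_difference(problem: str, other: str) -> Optional[float]:
    """Off-heap MB on the problem node minus off-heap MB on the other node."""
    a, b = off_heap_mb(problem), off_heap_mb(other)
    if a is None or b is None:
        return None
    return a - b


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python offheap.py problem_node_info.txt other_node_info.txt
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        print(off_heap_difference(f1.read(), f2.read()))
```

With made-up inputs of `Off Heap Memory (MB)   : 3000.50` and `Off Heap Memory (MB)   : 1000.25`, `off_heap_difference` returns `2000.25`.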

On 11 Aug. 2017 17:30, "Micha" <[email protected]> wrote:

> It's an OOM issue; the kernel kills the Cassandra process.
> The config was to use off-heap buffers and a 20G Java heap. I changed this
> to use heap buffers and a 16G Java heap. I added a new node yesterday
> which received streams from 4 other nodes. They all succeeded except the
> one from the node which failed before. This time the DB was again killed
> by the kernel. At the moment I don't know what the reason is, since the
> nodes are identical.
>
> It seems to me that G1GC is not able to free the memory fast enough.
> The settings were MaxGCPauseMillis=600, ParallelGCThreads=10 and
> ConcGCThreads=10, which may be too high since the node has only 8 cores.
> I changed these to ParallelGCThreads=8 and ConcGCThreads=2, as mentioned
> in the comments of jvm.options.
>
> Since the bootstrap of the fifth node did not complete, I will start it
> again and check whether the memory is still decreasing over time.
>
>
>
>  Michael
>
>
>
> On 11.08.2017 01:25, Jeff Jirsa wrote:
> >
> >
> > On 2017-08-08 01:00 (-0700), Micha <[email protected]> wrote:
> >> Hi,
> >>
> >> It seems I'm not able to add a 3-node DC to a 3-node DC. After
> >> starting the rebuild on a new node, nodetool netstats shows it will
> >> receive 1200 files from node-1 and 5000 from node-2. The stream from
> >> node-1 completes, but the stream from node-2 always fails after sending
> >> about 4000 files.
> >>
> >> After restarting the rebuild, it again starts to send the 5000 files.
> >> The whole cluster is connected via a single switch with no firewall
> >> in between, and the network shows no errors.
> >> The machines have 8 cores, 32GB RAM and two 1TB disks in RAID 0.
> >> The logs show no errors. The size of the data is about 1TB.
> >
> > Is there anything in `dmesg`? System logs? Nothing? Is node2 running?
> > Is node3 running?
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [email protected]
> > For additional commands, e-mail: [email protected]
> >
>
