On Thu, Dec 6, 2018 at 3:39 PM Riccardo Ferrari <ferra...@gmail.com> wrote:

> To be honest I've never seen the OOM in action on those instances. My Xmx
> was 8GB, just like yours, which makes me think you have some other process
> competing for memory - do you? Do you have any cron job, any backup, anything
> that could set off the OOM killer?
>

Riccardo,

As I've mentioned previously, apart from Docker running Cassandra on the JVM,
there is only a small number of housekeeping processes: cron to trigger log
rotation, a log shipping agent, a node metrics exporter (Prometheus) and a few
other small things.  None of them comes close to Cassandra in memory
requirements, and they consistently show very low memory usage in atop and
similar tools, so their overhead seems minimal.
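
For reference, a quick RSS ranking with plain ps on the host (nothing
Cassandra-specific) is enough to see that:

    ps -eo pid,rss,comm --sort=-rss | head

and the java process is always at the top by a very wide margin.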

> My unresponsiveness was seconds long. This is/was bad because the gossip
> protocol was going crazy marking nodes down, with all the consequences that
> can have in a distributed system - think about hints, the dynamic snitch,
> and whatever else depends on node availability ...
> Can you share some numbers about your `tpstats` or system load in general?
>

Here's some pretty typical tpstats output from one of the nodes:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0      319319724         0                 0
ViewMutationStage                 0         0              0         0                 0
ReadStage                         0         0       80006984         0                 0
RequestResponseStage              0         0      258548356         0                 0
ReadRepairStage                   0         0        2707455         0                 0
CounterMutationStage              0         0              0         0                 0
MiscStage                         0         0              0         0                 0
CompactionExecutor                1        55        1552918         0                 0
MemtableReclaimMemory             0         0           4042         0                 0
PendingRangeCalculator            0         0            111         0                 0
GossipStage                       0         0        6343859         0                 0
SecondaryIndexManagement          0         0              0         0                 0
HintsDispatcher                   0         0            226         0                 0
MigrationStage                    0         0              0         0                 0
MemtablePostFlush                 0         0           4046         0                 0
ValidationExecutor                1         1           1510         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0           4042         0                 0
InternalResponseStage             0         0           5890         0                 0
AntiEntropyStage                  0         0           5532         0                 0
CacheCleanupExecutor              0         0              0         0                 0
Repair#250                        1         1              1         0                 0
Native-Transport-Requests         2         0      260447405         0                18

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     1
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0
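
For what it's worth, the 55 pending tasks on CompactionExecutor are the only
non-trivial backlog in there; they can be looked at in more detail with the
standard nodetool commands, e.g.:

    nodetool compactionstats
    nodetool tpstats | grep -E 'CompactionExecutor|Native-Transport-Requests'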

Speaking of CPU utilization, it stays consistently within 30-60% on all nodes
(and is even lower at night).


> On the tuning side I just went through the following article:
> https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configRecommendedSettings.html
>
> No rollbacks, just moving forward! Right now we are upgrading the instance
> size to something more recent than m1.xlarge (for many different reasons,
> including security, ECU and network). Nevertheless, it might be a good idea
> to upgrade to the 3.X branch to take advantage of its better off-heap memory
> management.
>

One thing we have noticed very recently is that our nodes are indeed
running low on memory.  It now seems that the IO is a side effect of the
impending OOM, not the other way around, as we had initially thought.

After a fresh JVM start the memory allocation looks roughly like this:

             total       used       free     shared    buffers     cached
Mem:           14G        14G       173M       1.1M        12M       3.2G
-/+ buffers/cache:        11G       3.4G
Swap:           0B         0B         0B

Then, over the course of a few days, the page cache shrinks all the way down
to unreasonably small numbers, e.g. only ~150M.  At the same time "free" stays
at its original level and "used" grows all the way up to 14G.  Shortly after
that the node becomes unresponsive because of the IO, and after some more time
the JVM gets killed.
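
To put numbers on how fast that happens, one can simply sample /proc/meminfo
from cron; a crude sketch (the file paths are just examples):

    # /etc/cron.d/meminfo-sample -- illustrative only
    */5 * * * * root (date; grep -E 'MemFree|Buffers|^Cached' /proc/meminfo) >> /var/log/meminfo-sample.log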

Most importantly, the resident size of the JVM process stays at around 11-12G
the whole time, just as it was shortly after the start.  How can we find out
where the rest of the memory gets allocated?  Is it just some sort of malloc
fragmentation?
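
One way to try to answer that, if we can afford a restart, would be the JVM's
Native Memory Tracking compared against what the kernel reports for the
process (<cassandra-pid> below is just a placeholder):

    # add -XX:NativeMemoryTracking=summary to the JVM options and restart, then:
    jcmd <cassandra-pid> VM.native_memory summary
    # compare the NMT committed total with the OS view of the process:
    pmap -x <cassandra-pid> | tail -n 2
    grep VmRSS /proc/<cassandra-pid>/status

If the NMT total stays well below the RSS, the difference would presumably
come from allocations the JVM doesn't track, e.g. glibc malloc arenas /
fragmentation or native libraries.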

As we are running a relatively recent JDK, we've tried the option
-Djdk.nio.maxCachedBufferSize=262144 on one of the nodes, as suggested in this
issue:
https://issues.apache.org/jira/browse/CASSANDRA-13931
but we didn't see any improvement.  Also, the expectation is that if this were
the issue in the first place, the resident size of the JVM process would grow
at the same rate as the available memory shrinks - correct?

Another thing we haven't found an answer to yet is why, within the JVM,
heap.used (<= 6GB) never reaches heap.committed = 8GB.  Any ideas?
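
In case it's useful, the same committed-vs-used picture can also be sampled
directly on the node with plain jstat (<cassandra-pid> is again a placeholder):

    jstat -gc <cassandra-pid> 5000
    # the heap *C columns (S0C/S1C/EC/OC) are per-space committed sizes,
    # the *U columns what is actually used; heap.committed should roughly
    # correspond to the sum of those *C columns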

Regards,
--
Alex
