Ah, I focused too much on the literal meaning of startup. If it's happening
JUST AFTER startup, it's probably getting flooded with hints from the other
hosts when it comes online.

If that's the case, it may simply be overrunning the memtable, or it
may be a deadlock like https://issues.apache.org/jira/browse/CASSANDRA-15367
(which Benedict just updated this morning, good timing).

If it's after the host comes online and it's hint replay from the other
hosts, you probably want to throttle hint replay significantly on the rest
of the cluster. Whatever your hinted handoff throttle is, consider dropping
it by 50-90% to work around whichever of those two problems it is.
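
To put rough numbers on that 50-90% range, here is a minimal sketch. It
assumes your current throttle is 1024 KB (I believe that is the stock
hinted_handoff_throttle_in_kb value in cassandra.yaml, but check your own
config), and the helper name reduced_throttles is made up for illustration.
It only prints candidate values; applying one at runtime would be something
like "nodetool sethintedhandoffthrottlekb <value>" on each of the other
nodes, assuming your version ships that command.

```python
def reduced_throttles(current_kb, cuts_pct=(50, 75, 90)):
    """Map each percentage cut to the resulting throttle in KB.

    Integer math only, floored at 1 KB so a cut never yields 0.
    """
    if current_kb <= 0:
        raise ValueError("current_kb must be positive")
    return {pct: max(1, current_kb * (100 - pct) // 100) for pct in cuts_pct}


if __name__ == "__main__":
    current_kb = 1024  # assumption: replace with the value from your cassandra.yaml
    for pct, new_kb in reduced_throttles(current_kb).items():
        print(f"{pct}% cut -> {new_kb} KB")
```

Start at the gentler end, watch whether the restarted host still falls over,
and cut further if it does.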


On Fri, Jan 24, 2020 at 9:06 AM Jeff Jirsa <jji...@gmail.com> wrote:

> 6 GB of mutations on heap
> Startup would replay commitlog, which would re-materialize all of those
> mutations and put them into the memtable. The memtable would flush over
> time to disk, and clear the commitlog.
>
> It looks like PERHAPS the commitlog replay is faster than the memtable
> flush, so you're blowing out the memtable while you're replaying the
> commitlog.
>
> How much memory does the machine have? How much of that is allocated to
> the heap? What are your memtable settings? Do you see log lines about
> flushing memtables to free room (probably something like the slab pool
> cleaner)?
>
>
>
> On Fri, Jan 24, 2020 at 3:16 AM Behroz Sikander <bsikan...@apache.org>
> wrote:
>
>> We recently had a lot of OOMs in C*, and they generally happened during
>> startup.
>> We took some heap dumps but still cannot pinpoint the exact reason. So,
>> we need some help from experts.
>>
>> Our clients are not explicitly deleting data but they have TTL enabled.
>>
>> C* details:
>> > show version
>> [cqlsh 5.0.1 | Cassandra 2.2.9 | CQL spec 3.3.1 | Native protocol v4]
>>
>> Most of the heap was allocated to object[]:
>> - org.apache.cassandra.db.Cell
>>
>> Heap dump images:
>> Heap usage by class: https://pasteboard.co/IRrfu70.png
>> Classes using most heap: https://pasteboard.co/IRrgszZ.png
>> Overall heap usage: https://pasteboard.co/IRrg7t1.png
>>
>> What could be the reason for these OOMs? Is there something we can tune
>> to improve this?
>> Any help would be much appreciated.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: user-h...@cassandra.apache.org
>>
>>
