Re: How to make Cassandra flush CommitLog files more frequently?

vytenis silgalis Wed, 05 May 2021 19:36:58 -0700

I believe you could set your tables to flush to disk at specific intervals
(memtable_flush_period_in_ms). Note that you'd have to set this for all
tables (not just the CDC-enabled tables) to ensure that commitlog files are
flushed to the cdc_raw directory, since a commitlog segment holds writes for
every table and is only released once all of them have been flushed. Or, as
Dhanunjaya noted, you could just periodically run `nodetool flush` to flush
all the tables in one go.
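
As a rough sketch of the first option: the snippet below only builds the
ALTER TABLE statements for review and does not execute anything. The keyspace
and table names (shop.orders, shop.customers) and the 10-minute interval are
hypothetical placeholders, not values from this thread.

```python
# Sketch: build one ALTER TABLE statement per table so that every table
# (CDC-enabled or not) gets the same periodic memtable flush interval.
# Keyspace/table names and the interval below are hypothetical examples.

def build_flush_period_statements(tables, period_ms):
    """Return CQL statements setting memtable_flush_period_in_ms on each table.

    tables    -- iterable of (keyspace, table) name pairs
    period_ms -- flush interval in milliseconds (0 disables periodic flushing)
    """
    if period_ms < 0:
        raise ValueError("period_ms must be >= 0")
    return [
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH memtable_flush_period_in_ms = {period_ms};"
        for keyspace, table in tables
    ]


if __name__ == "__main__":
    example_tables = [("shop", "orders"), ("shop", "customers")]  # hypothetical
    for statement in build_flush_period_statements(example_tables, 600_000):
        print(statement)
```

The generated statements could then be applied through cqlsh. The second
option amounts to scheduling `nodetool flush` (for example from cron) on
every node.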


Vytenis





On Tue, May 4, 2021 at 11:04 PM Dhanunjaya Tokala <
dhanunjayatok...@gmail.com> wrote:

> One way to flush the commitlog is to run nodetool flush
> on the Cassandra nodes.
>
> On Tue, May 4, 2021 at 3:58 PM Bingqin Zhou <bingq...@wepay.com> wrote:
>
>> Hi Kane,
>>
>> Thank you for the insights!
>>
>>> Reducing the total space on its own will help; however, definitely test
>>> this as such a large drop could result in a massive increase in SSTables
>>> and thus compaction overhead. You'll in general want to look into any
>>> property that makes memtables flush more frequently (which is based on heap
>>> size and some tuning properties in cassandra.yaml).
>>
>>
>> If we decrease *memtable_heap_space_in_mb* and
>> *memtable_offheap_space_in_mb*, is that also likely to cause more
>> compaction activity?
>>
>>> I'm typically not a fan of using a database as a streaming/workflow
>>> service, so I have to ask have you considered managing this from your
>>> clients rather than using CDC in C*?
>>
>>
>> Actually, the design and initiation of our service is based on the fact
>> that the CDC feature in Cassandra is used for streaming data changes in
>> Cassandra with low latency. If this is not the case, may I understand
>> what's the purpose and the intended use case for the CDC feature in
>> Cassandra please?
>>
>> Thank you so much!
>> Bingqin Zhou
>>
>> On Mon, May 3, 2021 at 5:00 PM Kane Wilson <k...@raft.so> wrote:
>>
>>> (removing dev)
>>>
>>> commitlog_segment_size_in_mb isn't going to help; in fact, you probably
>>> don't want to modify this, as it'll reduce the maximum size of your
>>> mutations.
>>> Reducing the total space on its own will help; however, definitely test
>>> this as such a large drop could result in a massive increase in SSTables
>>> and thus compaction overhead. You'll in general want to look into any
>>> property that makes memtables flush more frequently (which is based on heap
>>> size and some tuning properties in cassandra.yaml).
>>>
>>> I'm typically not a fan of using a database as a streaming/workflow
>>> service, so I have to ask have you considered managing this from your
>>> clients rather than using CDC in C*?
>>>
>>> raft.so - Cassandra consulting, support, and managed services
>>>
>>>
>>> On Tue, May 4, 2021 at 4:16 AM Bingqin Zhou <bingq...@wepay.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We're working with the CDC feature to develop an agent to stream
>>>> changes in Cassandra DB into Kafka. However, the CDC feature doesn't work
>>>> well for us so far because CommitLog files are rarely flushed into the
>>>> cdc_raw directory, and the frequency can be as low as once every few months.
>>>>
>>>> Is there any suggested and feasible way to increase the frequency for
>>>> Cassandra to flush CommitLog files please?
>>>>
>>>> We're thinking about decreasing *commitlog_segment_size_in_mb* from 32
>>>> to 16, and decreasing *commitlog_total_space_in_mb* from 8192 to 160.
>>>> Does this sound like a reasonable approach? Is there any concern or
>>>> anything we need to be warned about trying this please?
>>>>
>>>> Thank you!
>>>>
>>>> Bingqin Zhou
>>>>
>>>
