I believe you could set your tables to flush to disk at specific intervals using the `memtable_flush_period_in_ms` table option. Note that you'd have to set this on all tables (not just the CDC-enabled ones) to ensure that commitlog files are flushed to the cdc_raw directory, since a commitlog segment can only be discarded once every table with writes in it has been flushed. Or, as Dhanunjaya noted, you could just run `nodetool flush` periodically to flush all the tables in one go.
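In case it helps, here is a rough sketch of the first option: a small Python helper that generates one `ALTER TABLE ... WITH memtable_flush_period_in_ms` statement per table. The keyspace and table names and the 10-minute period are made up for illustration, and the helper assumes plain lowercase identifiers that need no quoting; pick the period based on your own testing.

```python
from typing import Iterable, List, Tuple


def build_flush_period_statements(
    tables: Iterable[Tuple[str, str]], period_ms: int
) -> List[str]:
    """Return one CQL ALTER TABLE statement per (keyspace, table) pair.

    Each statement sets memtable_flush_period_in_ms, so the table's
    memtable is flushed at least every `period_ms` milliseconds.
    Assumes unquoted, lowercase identifiers.
    """
    if period_ms < 0:
        raise ValueError("period_ms must be non-negative (0 disables periodic flush)")
    return [
        f"ALTER TABLE {keyspace}.{table} WITH memtable_flush_period_in_ms = {period_ms};"
        for keyspace, table in tables
    ]


if __name__ == "__main__":
    # Hypothetical tables: include every table, not only the CDC-enabled ones.
    all_tables = [("shop", "orders"), ("shop", "order_audit")]
    # 600000 ms = 10 minutes; an arbitrary example value.
    for statement in build_flush_period_statements(all_tables, 600_000):
        print(statement)
```

The output can be saved to a file and run with `cqlsh -f`. For the second option, scheduling `nodetool flush` on each node (for example from cron) achieves a similar effect without changing any table settings.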

Vytenis

On Tue, May 4, 2021 at 11:04 PM Dhanunjaya Tokala <dhanunjayatok...@gmail.com> wrote:

> One way to flush the commitlog is to run nodetool flush on the
> Cassandra nodes.
>
> On Tue, May 4, 2021 at 3:58 PM Bingqin Zhou <bingq...@wepay.com> wrote:
>
>> Hi Kane,
>>
>> Thank you for the insights!
>>
>>> Reducing the total space on its own will help, however definitely test
>>> this as such a large drop could result in a massive increase in SSTables
>>> and thus compaction overhead. You'll in general want to look into any
>>> property that makes memtables flush more frequently (which is based on heap
>>> size and some tuning properties in cassandra.yaml).
>>
>> If we decrease *memtable_heap_space_in_mb* and *memtable_offheap_space_in_mb*,
>> is it potentially going to cause more compaction activity as well?
>>
>>> I'm typically not a fan of using a database as a streaming/workflow
>>> service, so I have to ask have you considered managing this from your
>>> clients rather than using CDC in C*?
>>
>> Actually, the design and initiation of our service is based on the fact
>> that the CDC feature in Cassandra is used for streaming data changes in
>> Cassandra with low latency. If this is not the case, may I ask what the
>> purpose and intended use case of the CDC feature in Cassandra is, please?
>>
>> Thank you so much!
>> Bingqin Zhou
>>
>> On Mon, May 3, 2021 at 5:00 PM Kane Wilson <k...@raft.so> wrote:
>>
>>> (removing dev)
>>>
>>> commitlog_segment_size_in_mb isn't going to help, in fact you probably
>>> don't want to modify this as it'll reduce the maximum size of your
>>> mutations.
>>> Reducing the total space on its own will help, however definitely test
>>> this as such a large drop could result in a massive increase in SSTables
>>> and thus compaction overhead. You'll in general want to look into any
>>> property that makes memtables flush more frequently (which is based on heap
>>> size and some tuning properties in cassandra.yaml).
>>>
>>> I'm typically not a fan of using a database as a streaming/workflow
>>> service, so I have to ask have you considered managing this from your
>>> clients rather than using CDC in C*?
>>>
>>> raft.so - Cassandra consulting, support, and managed services
>>>
>>> On Tue, May 4, 2021 at 4:16 AM Bingqin Zhou <bingq...@wepay.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We're working with the CDC feature to develop an agent to stream
>>>> changes in Cassandra DB into Kafka. However, the CDC feature doesn't work
>>>> well for us so far because CommitLog files are rarely flushed into the
>>>> cdc_raw directory, and the frequency can be as low as once every few months.
>>>>
>>>> Is there any suggested and feasible way to increase the frequency at
>>>> which Cassandra flushes CommitLog files, please?
>>>>
>>>> We're thinking about decreasing *commitlog_segment_size_in_mb* from 32
>>>> to 16, and decreasing *commitlog_total_space_in_mb* from 8192 to 160.
>>>> Does this sound like a reasonable approach? Is there any concern or
>>>> anything we need to be warned about before trying this, please?
>>>>
>>>> Thank you!
>>>>
>>>> Bingqin Zhou