[ https://issues.apache.org/jira/browse/CASSANDRA-20920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Petrov updated CASSANDRA-20920:
------------------------------------
Description:
This patch introduces SegmentStateTracker, which tracks Mutation Journal
allocations and listens to Memtable->SSTable flushes.
* SegmentStateTracker logic is largely adapted from CommitLog (i.e. tracking the
dirty allocation upper bound and the clean/flushed min/max bounds, and checking
their intersection to see whether the entire segment has been flushed).
* MutationJournal now tracks new allocations per segment (and, within a
segment, per table) and listens to Memtable flushes, marking the CommitLogPosition
bounds reported by each flush as clean.
* Memtables now distinguish between the commit log positions and the journal
positions they track. It might be a good idea to mark CommitLogPosition with a
corresponding flag to avoid accidentally passing the wrong kind of position.
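The bookkeeping described in the bullets above can be sketched roughly as follows. This is a hedged, single-threaded illustration, not the patch's SegmentStateTracker: the class SegmentStateSketch, its methods, the use of table names as keys, and the choice to keep both a lower and an upper dirty bound per table are all assumptions. Clean ranges are kept as a small set of merged intervals so that two adjacent flushes can together cover a table's dirty range.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical, single-threaded model of per-segment flush tracking.
 * Positions are offsets within one journal segment; tables are identified by name.
 */
final class SegmentStateSketch
{
    // table -> {lowest, highest} allocated position in this segment (assumption: both bounds are kept)
    private final Map<String, int[]> dirty = new HashMap<>();
    // table -> disjoint, non-adjacent clean intervals: start -> end (inclusive)
    private final Map<String, TreeMap<Integer, Integer>> clean = new HashMap<>();

    /** Records an allocation for {@code table} at {@code position}. */
    void markDirty(String table, int position)
    {
        dirty.merge(table, new int[]{ position, position },
                    (old, cur) -> new int[]{ Math.min(old[0], cur[0]), Math.max(old[1], cur[1]) });
    }

    /** Records that a flush of {@code table} covered positions [start, end]. */
    void markClean(String table, int start, int end)
    {
        TreeMap<Integer, Integer> intervals = clean.computeIfAbsent(table, t -> new TreeMap<>());
        // Merge with every existing interval that overlaps or touches [start, end].
        Map.Entry<Integer, Integer> e = intervals.floorEntry(end + 1);
        while (e != null && e.getValue() >= start - 1)
        {
            start = Math.min(start, e.getKey());
            end = Math.max(end, e.getValue());
            intervals.remove(e.getKey());
            e = intervals.floorEntry(end + 1);
        }
        intervals.put(start, end);
    }

    /** A table is flushed once a single merged clean interval covers its whole dirty range. */
    boolean isTableFlushed(String table)
    {
        int[] range = dirty.get(table);
        if (range == null)
            return true; // nothing allocated for this table in this segment
        TreeMap<Integer, Integer> intervals = clean.get(table);
        if (intervals == null)
            return false;
        Map.Entry<Integer, Integer> e = intervals.floorEntry(range[0]);
        return e != null && e.getValue() >= range[1];
    }

    /** The segment is fully flushed when every table that allocated in it is flushed. */
    boolean isFullyFlushed()
    {
        for (String table : dirty.keySet())
            if (!isTableFlushed(table))
                return false;
        return true;
    }
}
```

Tracking per table matters because a single segment interleaves writes to many tables, and a memtable flush for one table only clears that table's range; the segment as a whole becomes reclaimable only when every table's range is covered.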
{{MutationJournal#replay}} is added and serves a purpose similar to CommitLog
replay, although it currently lacks some of CommitLog replay's functionality, such as
the replay filter and DROP TABLE support (missing pieces are documented inline).
Only segments holding allocations that have not yet been flushed from memtable to
SSTable are considered for replay.
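The segment-selection rule for replay can be illustrated with a minimal sketch. Everything here is hypothetical: ReplaySketch, Segment, Entry, the fullyFlushed flag (standing in for whatever the tracker reports) and the Consumer callback are assumptions, and the sketch omits deserialization, the replay filter and DROP TABLE handling. It only shows that fully flushed segments are skipped and the rest are replayed oldest first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Consumer;

final class ReplaySketch
{
    // Hypothetical journal entry: a (serialized) mutation at a position within a segment.
    record Entry(int position, String table, String mutation) {}

    // Hypothetical segment view: its entries plus the flush state reported by a tracker.
    record Segment(long id, boolean fullyFlushed, List<Entry> entries) {}

    /** Replays entries from segments that still hold unflushed allocations, oldest segment first. */
    static int replay(List<Segment> segments, Consumer<Entry> apply)
    {
        List<Segment> ordered = new ArrayList<>(segments);
        ordered.sort(Comparator.comparingLong(Segment::id));
        int replayed = 0;
        for (Segment segment : ordered)
        {
            if (segment.fullyFlushed())
                continue; // every allocation here already reached an SSTable
            for (Entry e : segment.entries())
            {
                apply.accept(e);
                replayed++;
            }
        }
        return replayed;
    }
}
```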
Testing:
* A fuzz test for SegmentStateTracker
* A test that assumes some memtable behavior to exercise MutationJournal
integration
* A full integration/bounce test validating that data is recovered by
replay
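The fuzz test itself is not shown here. As a hedged illustration of the general approach, the sketch below runs random dirty/clean operations against an interval-based coverage check and compares every answer with a brute-force BitSet model. TrackerFuzzSketch, CleanSet, run and its parameters are hypothetical and do not describe the actual test in the patch.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

final class TrackerFuzzSketch
{
    // Interval-based "is [lo, hi] fully clean?" check, the kind of bookkeeping a tracker keeps.
    static final class CleanSet
    {
        private final TreeMap<Integer, Integer> intervals = new TreeMap<>(); // start -> end (inclusive)

        void add(int start, int end)
        {
            // Absorb every interval that overlaps or touches [start, end].
            Map.Entry<Integer, Integer> e = intervals.floorEntry(end + 1);
            while (e != null && e.getValue() >= start - 1)
            {
                start = Math.min(start, e.getKey());
                end = Math.max(end, e.getValue());
                intervals.remove(e.getKey());
                e = intervals.floorEntry(end + 1);
            }
            intervals.put(start, end);
        }

        boolean covers(int lo, int hi)
        {
            Map.Entry<Integer, Integer> e = intervals.floorEntry(lo);
            return e != null && e.getValue() >= hi;
        }
    }

    /** Runs random operations, checking CleanSet against a BitSet model; throws on mismatch. */
    static int run(long seed, int iterations, int segmentSize)
    {
        Random rnd = new Random(seed);
        CleanSet clean = new CleanSet();
        BitSet modelClean = new BitSet(segmentSize);
        int dirtyLo = Integer.MAX_VALUE, dirtyHi = Integer.MIN_VALUE;
        for (int i = 0; i < iterations; i++)
        {
            if (rnd.nextBoolean())
            {
                // Random allocation: widen the dirty range.
                int pos = rnd.nextInt(segmentSize);
                dirtyLo = Math.min(dirtyLo, pos);
                dirtyHi = Math.max(dirtyHi, pos);
            }
            else
            {
                // Random flush: mark [start, end] clean in both implementations.
                int a = rnd.nextInt(segmentSize), b = rnd.nextInt(segmentSize);
                int start = Math.min(a, b), end = Math.max(a, b);
                clean.add(start, end);
                modelClean.set(start, end + 1); // toIndex is exclusive
            }
            if (dirtyLo <= dirtyHi)
            {
                boolean expected = modelClean.nextClearBit(dirtyLo) > dirtyHi;
                if (clean.covers(dirtyLo, dirtyHi) != expected)
                    throw new AssertionError("mismatch at iteration " + i + " (seed " + seed + ")");
            }
        }
        return iterations;
    }
}
```

Printing the seed on failure keeps any mismatch reproducible, which is the main reason to drive such a test from a seeded Random.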
Important: a follow-up patch (in progress by [~aleksey]) adds compaction and
truncation of segments. The tests added here will need to be expanded to support this.
Remark: the [trunk/accord commit I reviewed
today|https://github.com/apache/cassandra/pull/4384/files#diff-a6dbe3c4eab186aaa33f04d7826c2239bfe99b3b392b026d18b8f31584652303R939]
changes StaticSegmentIterator to just SegmentIterator, which largely
aligns with this patch, but might require a rebase.
was:
This patch introduces SegmentStateTracker, which tracks Mutation Journal
allocations and listens to Memtable->SSTable flushes.
* SegmentStateTracker logic is largely adapted from CommitLog (i.e. tracking
dirty allocation upper bound, and clean/flushed min/max bounds, and checking
their intersection to see if entire segment is flushed.
* MutationJournal now tracks new allocations per segment (and, within a
segment, per table), and listens to Memtable flushes, marking CommitLogPosition
bounds reported by the flush as clean
* Memtables now distinguish between commit log positions they track vs
journal positions. It might be a good idea to mark CommitLogPosition with a
corresponding flag to avoid accidentally passing
{{MutationJournal#replay}} is added, and serves a purpose similar to CommitLog
replay, albeit lacks some of its functionality, such as replay filter, and DROP
TABLE support for now (missing pieces documented inline). Only segments holding
allocations that were not memtable->sstable flushed are considered for replay.
Testing:
* A fuzz test for SegmentStateTracker
* A test that assumes some memtable behavior to exercise MutationJournal
integration
* Full integration / bounce test validating that data is being recovered by
replay
Important: a follow-up patch (in work by [~aleksey]) adds compaction and
truncation of segments. Added tests will need to be expanded to support this.
> Mutation Tracking: Journal Replay
> ----------------------------------
>
> Key: CASSANDRA-20920
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20920
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: Consistency/Coordination
> Reporter: Alex Petrov
> Assignee: Alex Petrov
> Priority: High
> Attachments: ci_summary.html
--
This message was sent by Atlassian Jira
(v8.20.10#820010)