[ https://issues.apache.org/jira/browse/CASSANDRA-19597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842098#comment-17842098 ]
Benedict Elliott Smith edited comment on CASSANDRA-19597 at 4/29/24 8:02 PM:
-----------------------------------------------------------------------------

Yes, exactly. If I remember correctly, this "queue" was originally intended to achieve two things:
1) ensure commit log records are invalidated correctly, as the commit log used to support essentially only invalidation of a complete prefix;
2) serve as a kind of fsync, so that when awaiting the completion of a flush on a particular table you can be certain all data written prior has made it to sstables.

I'm not actually sure if any of this is necessary today though. Pretty sure we invalidate explicit ranges now, so the commit log semantics do not require this. I'm not sure off the top of my head why (except for non-durable tables/writes, or things that might want to read sstables prior to commit log replay) you would ever need to know all prior flushes had completed though, since the commit log will ensure they are re-written on restart.

But a low-risk approach would be to just make this a per-table queue.

was (Author: benedict):

Yes, exactly. If I remember correctly, this "queue" was originally intended to achieve two things:
1) ensure commit log records are invalidated correctly, as it used to only support essentially invalidations of a complete prefix;
2) serve as a kind of fsync so that when awaiting the completion of a flush on a particular table you can be certain all data written prior has made it to disk

I'm not actually sure if any of this is necessary today though. Pretty sure we invalidate explicit ranges now, so the commit log semantics do not require this. I'm not off the top of my head sure why (except for non-durable tables/writes) you would ever need to know all prior flushes had completed though, since the commit log will ensure they are re-written on restart.

But a low risk approach would be to just make this a per table queue.
> SystemKeyspace CFS flushing blocked by unrelated keyspace flushing/compaction
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19597
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19597
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>            Priority: Normal
>
> There is a single post flush thread and that thread processes tasks in order.
> One of those tasks can be a memtable flush for an unrelated keyspace/cfs,
> and that memtable flush can be blocked by slow IntervalTree building and
> racing with compactors to try and build an interval tree.
> Unless there is a requirement for ordering we probably want to loosen this to
> the actual ordering requirement so that problems in one keyspace can’t affect
> another.
> SystemKeyspace and Gossip in particular cause lots of weird problems like
> nodes marking each other down because Gossip can’t process nodes being
> removed (blocking flush each time in SystemKeyspace.removeNode).
> A very simple fix here might be to queue the post flush task at the same time
> as the flush in a per CFS queue, and then submit the task only once the flush
> is completed.
> If flushes complete out of order the queue will still ensure their
> completions are processed in order.
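
The per-CFS queue idea in the description can be illustrated with a small sketch. This is not Cassandra code: the class name PerTablePostFlushQueue, its enqueue method, and the use of CompletableFuture to represent a flush are all assumptions made for illustration. It only shows the ordering property being discussed, namely that post flush tasks for one table run in the order their flushes were started even if the flushes finish out of order, while a slow flush on one table does not hold up another table.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names exist in Cassandra.
final class PerTablePostFlushQueue {
    // One pending entry: the flush it waits on and the task to run afterwards.
    private static final class Entry {
        final CompletableFuture<Void> flush;
        final Runnable postFlush;

        Entry(CompletableFuture<Void> flush, Runnable postFlush) {
            this.flush = flush;
            this.postFlush = postFlush;
        }
    }

    // Table name -> FIFO of pending post flush tasks for that table only.
    private final Map<String, ArrayDeque<Entry>> queues = new ConcurrentHashMap<>();

    // Call when the flush is started, so queue order matches flush start order.
    void enqueue(String table, CompletableFuture<Void> flush, Runnable postFlush) {
        ArrayDeque<Entry> queue = queues.computeIfAbsent(table, t -> new ArrayDeque<>());
        synchronized (queue) {
            queue.addLast(new Entry(flush, postFlush));
        }
        // When this flush finishes, run whatever has become ready at the head.
        flush.whenComplete((result, error) -> drain(queue));
    }

    // Runs tasks from the head while the head's flush is done; stops at the
    // first entry whose flush is still in progress, which preserves order.
    // Simplification: tasks run inline and also run after a failed flush.
    private void drain(ArrayDeque<Entry> queue) {
        synchronized (queue) {
            while (!queue.isEmpty() && queue.peekFirst().flush.isDone()) {
                queue.pollFirst().postFlush.run();
            }
        }
    }
}
```

In this sketch the post flush task runs on whichever thread completes the flush; a real change would presumably hand it to an executor and decide what to do when a flush fails, neither of which is specified in the ticket.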