Thanks Tyler. Sounds like the risk of making this change is tolerable for us.
Josef: thanks for the links. We're thinking carefully about our NTP configuration, but we would also like to account for unusual failure modes (e.g., nodes with fast clocks experiencing partitions from NTP servers/peers). Making sure Cassandra's internals are arranged to minimize the damage when these things happen is part of our strategy.

Thanks,
SK

On 06/17/2015 11:17 AM, Tyler Hobbs wrote:
>> Why does Cassandra consistently prefer tombstones to other kinds of cells?
>
> It's primarily to have deterministic conflict resolution. I don't recall
> any specific conversations about preferring tombstones to expiring cells, and
> the original ticket (https://issues.apache.org/jira/browse/CASSANDRA-699)
> doesn't mention anything.
>
>> By modifying this behavior in this particular case, do we risk hitting
>> bizarre corner cases?
>
> I think this would be safe. The only problem I can think of would happen
> while your cluster has some patched nodes and some unpatched nodes. If you
> had any nodes that had both an expiring cell and a tombstone with the same
> timestamp, then patched replicas would return different results/digests
> than the unpatched nodes. However, unless you're mixing TTLs with deletes,
> that's not too likely to happen. Maybe repair combined with clock skew
> could result in that, but not much else.
>
> On Wed, Jun 17, 2015 at 10:05 AM, Josef Lindman Hörnlund <jo...@appdata.biz>
> wrote:
>
>> Hello Sam,
>>
>> This is not answering your direct question but if you worry about clock
>> skew take a look at this great two-part blogpost:
>>
>> https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-1-the-problem/
>> https://blog.logentries.com/2014/03/synchronizing-clocks-in-a-cassandra-cluster-pt-2-solutions/
>>
>> Josef Lindman Hörnlund
>> Chief Data Scientist
>> AppData
>> jo...@appdata.biz
>>
>>> On 16 Jun 2015, at 20:45, Sam Klock <skl...@akamai.com> wrote:
>>>
>>> Hi folks,
>>>
>>> I have a question about a design choice on how expiring cells are
>>> reconciled with tombstones. For two cells with the same timestamp, if
>>> one is expiring and one is a tombstone, Cassandra *always* prefers the
>>> tombstone. This matches its behavior for normal/non-expiring cells, but
>>> the folks in my organization worry about what it may imply for nodes
>>> experiencing clock skew. Specifically, we're concerned about scenarios
>>> like the following:
>>>
>>> 1) An expiring cell is committed via some node with a non-skewed clock.
>>> 2) Another replica for that cell experiences forward clock skew and
>>> decides that the cell is expired. It eventually runs a compaction that
>>> converts the cell to a tombstone.
>>> 3) The tombstone propagates to other nodes via, e.g., node repair.
>>> 4) The other nodes all eventually run their own compactions. Because of
>>> the reconciliation logic, the expiring cell is purged on all of the
>>> replicas, leaving behind only the tombstone.
>>>
>>> If the cell should have still been live at (4), the reconciliation logic
>>> will result in it being prematurely purged. We have confirmed this
>>> behavior experimentally.
>>>
>>> My organization may be more concerned about clock skew than the larger
>>> community, so I don't think we're inclined to propose a patch at this
>>> time. But to account for this kind of scenario we would like to patch
>>> our internal version of Cassandra to conditionally prefer expiring cells
>>> to tombstones if the node believes they should still be live; i.e., in
>>> reconcile() in *ExpiringCell.java, instead of:
>>>
>>>     if (cell instanceof DeletedCell)
>>>         return cell;
>>>
>>> use:
>>>
>>>     if (cell instanceof DeletedCell)
>>>         return isLive() ? this : cell;
>>>
>>> Before we do so, however, we'd like to understand the rationale for the
>>> existing behavior and the risks of making changes to it. Why does
>>> Cassandra consistently prefer tombstones to other kinds of cells? By
>>> modifying this behavior in this particular case, do we risk hitting
>>> bizarre corner cases?
>>>
>>> Thanks,
>>> SK
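
For readers following the thread, the difference between the existing tie-break and the proposed one can be sketched in a few lines of self-contained Java. This is not Cassandra's code: ReconcileSketch, Cell, reconcileCurrent, and reconcilePatched are hypothetical names made up for illustration, and the tie between two non-tombstone cells is deliberately simplified. The sketch assumes only what the thread describes (on equal timestamps the tombstone wins today; the patch keeps the expiring cell while the local node still considers it live), plus ordinary last-write-wins for unequal timestamps.

```java
// Hypothetical sketch; these types are NOT Cassandra's real classes.
final class ReconcileSketch {

    /** Minimal stand-in for a cell: a write timestamp plus an optional expiration. */
    static final class Cell {
        final long timestamp;        // write timestamp (unit is arbitrary here)
        final boolean tombstone;     // true for a deleted cell
        final long expiresAtSeconds; // local expiration time; unused for tombstones

        private Cell(long timestamp, boolean tombstone, long expiresAtSeconds) {
            this.timestamp = timestamp;
            this.tombstone = tombstone;
            this.expiresAtSeconds = expiresAtSeconds;
        }

        static Cell expiring(long timestamp, long expiresAtSeconds) {
            return new Cell(timestamp, false, expiresAtSeconds);
        }

        static Cell tombstone(long timestamp) {
            return new Cell(timestamp, true, 0L);
        }

        /** Live means: not a tombstone and not yet expired according to the given clock. */
        boolean isLive(long nowSeconds) {
            return !tombstone && nowSeconds < expiresAtSeconds;
        }
    }

    /** Existing behavior as described in the thread: on a timestamp tie, the tombstone wins. */
    static Cell reconcileCurrent(Cell expiring, Cell other) {
        if (expiring.timestamp != other.timestamp)
            return expiring.timestamp > other.timestamp ? expiring : other;
        if (other.tombstone)
            return other;
        return expiring; // simplification: real code compares more than this
    }

    /** Proposed behavior: on a tie, keep the expiring cell while this node's clock says it is live. */
    static Cell reconcilePatched(Cell expiring, Cell other, long nowSeconds) {
        if (expiring.timestamp != other.timestamp)
            return expiring.timestamp > other.timestamp ? expiring : other;
        if (other.tombstone)
            return expiring.isLive(nowSeconds) ? expiring : other;
        return expiring; // same simplification as above
    }
}
```

Note that reconcilePatched takes the local clock as an input, so two replicas with different clocks can pick different winners for the same pair of cells. In this sketch, a replica with a fast clock would still choose the tombstone, while one with a correct clock would keep the cell.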