> Is this a realistic case when Cassandra (unless I'm missing something) is limited to adding or removing a single node at a time? I'm sure this can happen under some sort of generic range movement (how does one initiate such a movement, and why), but will it happen under "normal" conditions of node bootstrap or decommission of a single node?
It's possible to make simultaneous range movements when either
{{-Dcassandra.consistent.range.movement=false}} (CASSANDRA-7069) or
{{-Dcassandra.consistent.simultaneousmoves.allow=true}} (CASSANDRA-11005) is
specified. In any case, I'm not saying it's not possible, just that we cannot
apply this optimization when there are simultaneous range movements in the
same rack.

> How/when would we have two pending nodes for a single view partition?

Actually I meant the case where multiple range movements are going on in the
same rack, not necessarily in the same partition.

> Yes, it seems it will not be trivial. But if this is the common case in
> common operations such as node addition or removal, it may significantly
> reduce (from RF*2 to RF+1) the number of view updates being sent around,
> and avoid MV update performance degradation during the streaming process.

Agreed, we should definitely look into making this optimization; it just was
never done before due to other priorities, so please open a ticket for it.
There's a similar optimization that can be done for view batchlog replays:
right now the view update is sent to all replicas during batchlog replay, but
we could simplify it and also send only to the paired view replicas.

> Is it actually possible to repair *only* a view, not its base table? If you
> repair a view table which has an inconsistency, namely one view row in one
> replica and a different view row in another replica, won't the repair just
> cause both versions to be kept, which is wrong?

It's possible to repair either the base table or the views. Normally you will
want to repair only the base tables, but sometimes you will want to repair
the views too, for instance, after a node is replaced - just like you do with
ordinary tables, since the node may have streamed from an inconsistent view
replica.
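For context, the two properties mentioned above are JVM system properties
passed at node startup. A minimal sketch of how they might be supplied (the
property names come from the tickets cited above; the exact invocation varies
by deployment, e.g. they are often set via JVM_OPTS or jvm.options instead of
on the command line):

```shell
# Disable consistent range movements, allowing multiple nodes to
# bootstrap/decommission at once (CASSANDRA-7069):
cassandra -Dcassandra.consistent.range.movement=false

# Or keep consistent range movements but explicitly allow multiple
# simultaneous moves (CASSANDRA-11005):
cassandra -Dcassandra.consistent.simultaneousmoves.allow=true
```

Either property puts the cluster into the "simultaneous range movements"
state discussed above, which is why the optimization cannot assume at most
one pending node per rack.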
In this particular case, repairing the base table alone won't help because
the base table may already be in sync, so it's necessary to repair the view
to ensure that updates missed during range movements are propagated to all
replicas. In fact, repairing only the views without repairing the base table
beforehand may propagate temporary inconsistencies if stale views were
already garbage-collected on a subset of the replicas, so I will update the
notice to state that repair must be run on the base table first (to fix
temporary inconsistencies) and then on the views.

When there are permanent inconsistencies, though (when the base is consistent
but the view has extraneous rows), it doesn't really matter whether the
inconsistency is present on a subset or on all view replicas, since the
inconsistency is already visible to clients. The only way to fix permanent
inconsistencies currently is to drop and re-create the view. CASSANDRA-10346
was created to address this.

If you have more comments about CASSANDRA-14251, would you mind adding them
to the ticket itself so the discussion is registered on the relevant JIRA?

2018-02-22 7:53 GMT-03:00 Nadav Har'El <n...@scylladb.com>:

> On Thu, Feb 22, 2018 at 12:54 AM, Paulo Motta <pauloricard...@gmail.com>
> wrote:
>
>> Good catch! This indeed seems to be a regression caused by
>> CASSANDRA-13069, so I created CASSANDRA-14251 to restore the correct
>> behavior.
>
> I have a question about your patch
> https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-14251
> In the NEWS.txt you say that users "should run repair on the views". Is it
> actually possible to repair *only* a view, not its base table? If you
> repair a view table which has an inconsistency, namely one view row in one
> replica and a different view row in another replica, won't the repair just
> cause both versions to be kept, which is wrong?
>
> Nadav.
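P.S. For anyone following along, the base-first-then-view repair order
described earlier in this thread could be run with nodetool roughly like
this (the keyspace name "ks", base table "base", and view name "base_view"
are illustrative, not from any real cluster):

```shell
# 1. Repair the base table first, to fix any temporary
#    inconsistencies before touching the view:
nodetool repair ks base

# 2. Then repair the materialized view, so that view updates missed
#    during range movements reach all view replicas:
nodetool repair ks base_view
```

Running the view repair without the base repair first risks propagating
temporary inconsistencies, as noted above.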
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org