> Is this a realistic case when Cassandra (unless I'm missing something) is
limited to adding or removing a single node at a time? I'm sure this
can happen under some sort of generic range movement
(how does one initiate such movement, and why), but will it happen
under "normal" conditions of node bootstrap or decommission of a single node?

Simultaneous range movements are possible when either
{{-Dcassandra.consistent.range.movement=false}} (CASSANDRA-7069) or
{{-Dcassandra.consistent.simultaneousmoves.allow=true}}
(CASSANDRA-11005) is specified.

In any case, I'm not saying it's not possible, just that we cannot
apply this optimization when there are simultaneous range movements in
the same rack.

> How/when would we have two pending nodes for a single view partition?

Actually I meant if there are multiple range movements going on in the
same rack, not exactly in the same partition.
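
The condition can be sketched as a small check. This is only an
illustration in Python: the function name and the input shape (one rack
name per pending endpoint) are assumptions, not anything taken from the
Cassandra codebase.

```python
from collections import Counter
from typing import Iterable


def can_apply_paired_optimization(pending_endpoint_racks: Iterable[str]) -> bool:
    """Return True when no rack has more than one pending endpoint.

    Hypothetical helper: each item is the rack of one pending
    (joining, leaving or moving) endpoint.
    """
    per_rack = Counter(pending_endpoint_racks)
    return all(count <= 1 for count in per_rack.values())
```

A single bootstrap or decommission yields one pending endpoint, so the
check passes; two movements in the same rack make it fail.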

> Yes, it seems it will not be trivial. But if this is the common case in
common operations such as node addition or removal, it may significantly reduce
(from RF*2 to RF+1) the number of view updates being sent around, and avoid
MV update performance degradation during the streaming process.

Agreed, we should definitely look into making this optimization. It
was simply never done before due to other priorities, so please open a
ticket for it. There's a similar optimization that can be done for
view batchlog replays: right now the view update is sent to all
replicas during batchlog replay, but we could simplify it and
send only to the paired view replicas.
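
To make the RF*2 versus RF+1 figures concrete, here is a minimal Python
sketch of the arithmetic. It assumes a single pending view endpoint, and
that today each base replica writes to its paired view replica and also
to the pending endpoint, while the optimized path has only one base
replica write to the pending endpoint. That reading of the current
behavior is an assumption drawn from this thread, and the function names
are hypothetical, not Cassandra APIs.

```python
# Hypothetical helpers illustrating the counts discussed in this thread;
# this is not Cassandra code.
# Assumption: exactly one pending view endpoint (one node joining or leaving).

def current_view_updates(rf: int) -> int:
    """Each of the rf base replicas sends to its paired view replica
    and to the pending endpoint."""
    return rf * 2


def optimized_view_updates(rf: int) -> int:
    """Each base replica sends to its paired view replica; only one of
    them also sends to the pending endpoint."""
    return rf + 1


if __name__ == "__main__":
    for rf in (3, 5):
        print(rf, current_view_updates(rf), optimized_view_updates(rf))
```

With RF=3 that is 6 view updates per base write today versus 4 with the
optimization.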

> Is it actually possible to repair *only* a view, not its base table? If you 
> repair a view table which has an inconsistency, namely one view row in one 
> replica and a different view row in another replica, won't the repair just 
> cause both versions to be kept, which is wrong?

It's possible to repair either the base table or the views. Normally
you will want to repair only the base tables, but sometimes you will
want to repair the views too, for instance after a node is replaced,
just as you would with ordinary tables, since the node may have
streamed from an inconsistent view replica.

In this particular case, repairing the base table alone won't help
because the base table can already be in sync, so it's necessary to
repair the view to ensure missed updates during range movements are
propagated to all replicas.

In fact, repairing only the views without repairing the base table
beforehand may propagate temporary inconsistencies if stale views were
already garbage collected on a subset of the replicas, so I will
update the notice to state that repair must be run on the base table
(to fix temporary inconsistencies) and then on the views.
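
As a rough sketch of that ordering, the snippet below builds the
`nodetool repair` invocations for a base table followed by its views.
The keyspace, table and view names are placeholders, the helper name is
hypothetical, and the commands are only constructed, not executed.

```python
from typing import List, Sequence


def repair_plan(keyspace: str, base_table: str, views: Sequence[str]) -> List[List[str]]:
    """Return nodetool argv lists: the base table first, then each view."""
    plan = [["nodetool", "repair", keyspace, base_table]]
    plan.extend(["nodetool", "repair", keyspace, view] for view in views)
    return plan


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    for argv in repair_plan("my_keyspace", "users", ["users_by_email"]):
        print(" ".join(argv))
```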

When there are permanent inconsistencies though (when the base is
consistent and the view has extraneous rows), it doesn't really matter
if the inconsistency is present on a subset or all view replicas,
since the inconsistency is already visible to clients. The only way to
fix permanent inconsistencies currently is to drop and re-create the
view. CASSANDRA-10346 was created to address this.

If you have more comments about CASSANDRA-14251, would you mind adding
them to the ticket itself so the discussion is recorded on the
relevant JIRA?

2018-02-22 7:53 GMT-03:00 Nadav Har'El <n...@scylladb.com>:
> On Thu, Feb 22, 2018 at 12:54 AM, Paulo Motta <pauloricard...@gmail.com>
> wrote:
>
>>
>> Good catch! This indeed seems to be a regression caused by
>> CASSANDRA-13069, so I created CASSANDRA-14251 to restore the correct
>> behavior.
>>
>
> I have a question about your patch
> https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-14251
> In the NEWS.txt you say that users "should run repair on the views". Is it
> actually possible to repair *only* a view,
> not its base table? If you repair a view table which has an inconsistency,
> namely one view row in one replica and a
> different view row in another replica, won't the repair just cause both
> versions to be kept, which is wrong?
>
> Nadav.
