Marcus Eriksson created CASSANDRA-18791:
-------------------------------------------
Summary: CEP-21 - Multiple TCM fixes for issues discovered by unit, integration and simulation testing
Key: CASSANDRA-18791
URL: https://issues.apache.org/jira/browse/CASSANDRA-18791
Project: Cassandra
Issue Type: Improvement
Components: Cluster/Membership
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson
Full branch: [https://github.com/krummas/cassandra/commits/marcuse/cep-21-tcm]
Tests:
[cci|https://app.circleci.com/pipelines/github/krummas/cassandra/885/workflows/7cc3e1a1-45b9-4069-bb02-2a46855b4dbe]
Current status:
- unit tests: 35/12052 failures
- jvm dtests: 23/1459 failures
- python dtests: 110/1018 failures
We will spend the next few weeks getting all test targets down to 0 failures.
Summary of changes:
[CEP-21] Python dtest fixes, including a tentative fix for the hintedhandoff test
- [https://github.com/krummas/cassandra/commit/3da91c26fb]
- [https://github.com/krummas/cassandra/commit/98e444adbb]
[CEP-21] In-JVM DTest fixes
- [https://github.com/krummas/cassandra/compare/6491a70041...c9126a8024]
[CEP-21] Unit test fixes
- [https://github.com/krummas/cassandra/compare/02be7aa71c...6491a70041]
[CEP-21] Escape infinite local log loop on replica mis-configuration
- [https://github.com/krummas/cassandra/commit/02be7aa71c]
Currently, replicas can have different configurations (guardrails, for example).
If a transformation fails to apply on a replica, that node could get stuck in an
infinite loop. For now, escape that loop until we have a better solution.
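A minimal sketch of one way to escape such a loop: give each local apply a fixed attempt budget and report failure instead of retrying forever. `BoundedApplier` and its parameters are hypothetical and this is not the code from commit 02be7aa71c.

```java
import java.util.function.Predicate;

// Hypothetical helper, not Cassandra's API: tries to apply a log entry a fixed
// number of times instead of retrying forever when the local node rejects it
// (for example because its guardrail configuration differs from the CMS's).
final class BoundedApplier {
    private final int maxAttempts;

    BoundedApplier(int maxAttempts) {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.maxAttempts = maxAttempts;
    }

    // Returns true once tryApply succeeds, or false after maxAttempts failures,
    // leaving it to the caller to log the problem and move on.
    <T> boolean apply(T entry, Predicate<T> tryApply) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryApply.test(entry))
                return true;
        }
        return false;
    }
}
```

The trade-off is that a node which gives up can diverge from what the CMS committed, which is why this kind of escape is only a stopgap.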
[CEP-21] Fix batchlog consistency errors during epoch bumps
- [https://github.com/krummas/cassandra/commit/c853bf864e]
[CEP-21] Avoid using batches in distributed metadata log keyspace
- [https://github.com/krummas/cassandra/commit/d4b4766e0b]
[CEP-21] Fix table metadata serialization
- [https://github.com/krummas/cassandra/commit/4056bab669]
[CEP-21] add more metrics
- [https://github.com/krummas/cassandra/commit/b6ccb559f5]
[CEP-21] getHostIdForEndpoint returns null for an unknown endpoint
- [https://github.com/krummas/cassandra/commit/b9243df05b]
[CEP-21] CMS handling
- [https://github.com/krummas/cassandra/commit/15eea30d43]
- [https://github.com/krummas/cassandra/commit/d64da5f5e4]
- [https://github.com/krummas/cassandra/commit/61deb52811]
[CEP-21] Upgrade fixes
- [https://github.com/krummas/cassandra/commit/33d186b4ce]
Properly set system.local host id on upgrade.
- [https://github.com/krummas/cassandra/commit/b96bdc83e1]
If a replica misses the migration message, mark the migration as successful when
it sees the first epoch bump.
- [https://github.com/krummas/cassandra/commit/712828bc82]
Handle hints on upgrade: the host ID changes when the CMS is enabled, so hints
should be delivered before that happens.
[CEP-21] Catchup/log fetching improvements
- [https://github.com/krummas/cassandra/commit/31a183e236]
When an instance sees a message from a peer with a newer epoch, try to catch up
from that peer instead of from the CMS. This reduces load on the CMS nodes and
allows the cluster to quiesce if the CMS is down.
- [https://github.com/krummas/cassandra/commit/8c6a4b35db]
We may receive a snapshot when catching up; in that case the pending log should
first apply the snapshot and skip any earlier entries.
- [https://github.com/krummas/cassandra/commit/387853487f]
When deserializing a partition update, allow it if the current epoch is >= the serialized epoch.
- [https://github.com/krummas/cassandra/commit/626d224716]
When we replay from a snapshot, we might see a node as LEFT for the first time
(it bootstrapped and left while we were down).
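To illustrate two of the catch-up rules above (apply a received snapshot first and drop any buffered entries it supersedes; accept serialized data whose epoch is not ahead of ours), here is a minimal sketch. `PendingLog`, `LogEntry` and `LogSnapshot` are hypothetical names, epochs are modelled as plain `long`s and state as a list of strings; the real TCM classes differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified types: an epoch is a long, a change is a string.
record LogEntry(long epoch, String change) {}
record LogSnapshot(long epoch, List<String> state) {}

final class PendingLog {
    private long currentEpoch;
    private final List<String> applied = new ArrayList<>();
    // Entries received ahead of a gap, ordered by epoch.
    private final TreeMap<Long, LogEntry> pending = new TreeMap<>();

    PendingLog(long startEpoch) {
        this.currentEpoch = startEpoch;
    }

    void append(LogEntry entry) {
        if (entry.epoch() > currentEpoch)
            pending.put(entry.epoch(), entry); // already-applied epochs are ignored
        drain();
    }

    // Apply the snapshot first, then discard every buffered entry it already covers.
    void applySnapshot(LogSnapshot snapshot) {
        if (snapshot.epoch() <= currentEpoch)
            return; // nothing new in this snapshot
        applied.clear();
        applied.addAll(snapshot.state());
        currentEpoch = snapshot.epoch();
        pending.headMap(snapshot.epoch(), true).clear();
        drain();
    }

    // Data serialized at an epoch we have already reached is safe to accept.
    boolean canDeserialize(long serializedEpoch) {
        return currentEpoch >= serializedEpoch;
    }

    long epoch() { return currentEpoch; }

    List<String> state() { return List.copyOf(applied); }

    // Apply buffered entries only while they are contiguous with the current epoch.
    private void drain() {
        while (!pending.isEmpty() && pending.firstKey() == currentEpoch + 1) {
            LogEntry next = pending.pollFirstEntry().getValue();
            applied.add(next.change());
            currentEpoch = next.epoch();
        }
    }
}
```

Buffering out-of-order entries in a sorted map keeps the "skip everything the snapshot covers" step a single `headMap(..., true).clear()` call.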
[CEP-21] Require Paxos V2 for cluster metadata log operations
- [https://github.com/krummas/cassandra/commit/2217f551a6]
TCM is required to use Paxos V2 because the legacy Paxos path uses a keyspace’s
RF to decide whether enough replicas are available to perform the read before a
CAS. This doesn’t work properly with the meta strategy when adding CMS members.
[CEP-21] Disaster recovery
- [https://github.com/krummas/cassandra/commit/9011233604]
Allow an instance to dump its current cluster metadata and force-boot from it.
We need a way to force an instance to become the CMS in case the original CMS
goes down.
[CEP-21] Switch nodeId from uuid to int
- [https://github.com/krummas/cassandra/commit/aea5500ae0]
[CEP-21] Make CQLSSTableWriter exclusively a client utility
- [https://github.com/krummas/cassandra/commit/0693b22297]
[CEP-21] Support nodetool assassinate
- [https://github.com/krummas/cassandra/commit/312a1c1b0e]
[CEP-21] In-progress sequence updates
- [https://github.com/krummas/cassandra/commit/7f56e0e5b3]
Protection against out-of-order and repeated execution, sequence rediscovery
and reliability improvements.
- [https://github.com/krummas/cassandra/commit/7ddb941d80]
DC- and RF-aware acks for multi-step operations. Make the progress barrier
consistency level configurable.
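A rough sketch of DC-aware ack counting for a progress barrier. It assumes hypothetical names (`ProgressBarrier`, `BarrierCL`) and a simplified model in which each DC maps to the set of integer node ids (the RF-derived replicas) that must observe the new epoch. Only three levels are modelled here; which levels the real barrier supports is not stated above.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical consistency levels for the barrier (a subset, for illustration).
enum BarrierCL { ONE, QUORUM, EACH_QUORUM }

final class ProgressBarrier {
    private ProgressBarrier() {}

    // replicasByDc: DC name -> node ids that must acknowledge the new epoch.
    // acked: node ids that have acknowledged so far.
    static boolean satisfied(Map<String, Set<Integer>> replicasByDc,
                             Set<Integer> acked, BarrierCL cl) {
        int total = 0;
        int totalAcked = 0;
        boolean everyDcHasQuorum = true;
        for (Map.Entry<String, Set<Integer>> dc : replicasByDc.entrySet()) {
            int replicas = dc.getValue().size();
            if (replicas == 0)
                continue; // a DC with no replicas places no requirement
            int ackedInDc = 0;
            for (int node : dc.getValue())
                if (acked.contains(node))
                    ackedInDc++;
            total += replicas;
            totalAcked += ackedInDc;
            if (ackedInDc < replicas / 2 + 1)
                everyDcHasQuorum = false;
        }
        switch (cl) {
            case ONE:
                return totalAcked >= 1;
            case QUORUM:
                return totalAcked >= total / 2 + 1;
            case EACH_QUORUM:
                return everyDcHasQuorum;
            default:
                throw new IllegalArgumentException("Unsupported level: " + cl);
        }
    }
}
```

With two DCs of three replicas each, four acks from one DC plus one from the other satisfy a global QUORUM but not EACH_QUORUM, which is the kind of distinction a DC-aware barrier needs to make.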
[CEP-21] Enforce data ownership checks
- [https://github.com/krummas/cassandra/commit/5c42fd098c]
Never accept operations for ranges we don't own.
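One way to picture such an ownership check, assuming hypothetical `TokenRange` and `OwnershipCheck` types, tokens modelled as `long` values on a ring, and half-open `(left, right]` ranges as Cassandra ranges conventionally are. Here the owned ranges are passed in directly; deriving them from the current cluster metadata is out of scope for the sketch.

```java
import java.util.List;

// Hypothetical range type: (left, right] on a ring of long tokens.
// If left >= right the range wraps around the ring; left == right covers the whole ring.
record TokenRange(long left, long right) {
    boolean contains(long token) {
        if (left < right)
            return token > left && token <= right;
        return token > left || token <= right;
    }
}

// Hypothetical guard: rejects requests for tokens outside the ranges this node replicates.
final class OwnershipCheck {
    private final List<TokenRange> owned;

    OwnershipCheck(List<TokenRange> owned) {
        this.owned = List.copyOf(owned);
    }

    void verify(long token) {
        for (TokenRange range : owned)
            if (range.contains(token))
                return;
        throw new IllegalStateException("Token " + token + " is not in a range owned by this node");
    }
}
```

Failing loudly rather than silently accepting the operation matches the "never accept" rule: a mis-routed write is surfaced to the coordinator instead of landing on a node that will not be read from.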
[CEP-21] Gossip fixes
- [https://github.com/krummas/cassandra/commit/8572735e28]
Fix several gossip issues found during upgrade and load testing.
- [https://github.com/krummas/cassandra/commit/ed785cb414]
Avoid gossip deadlock when merging CM nodes into gossip.
- [https://github.com/krummas/cassandra/commit/e40c3a4ea]
Replaced endpoints should be evicted from gossip like in previous versions.
[CEP-21] Re-enable startup checks on non-test initialization
- [https://github.com/krummas/cassandra/commit/63013ad366]
[CEP-21] Unify streaming: make all operations use explicit ranges for streaming
- [https://github.com/krummas/cassandra/commit/ccba2e84de]
All streaming operations now use a movement map describing what should be
streamed where.
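A minimal sketch of what a movement map could look like, assuming hypothetical `MovementMap` and `TokenRange` types with integer node ids. It records, per source node, which ranges go to which destination, and can answer which ranges a given node should fetch and from whom.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical range type; (left, right] semantics assumed.
record TokenRange(long left, long right) {}

// Hypothetical movement map: source node -> destination node -> ranges to stream.
final class MovementMap {
    private final Map<Integer, Map<Integer, List<TokenRange>>> bySource = new TreeMap<>();

    MovementMap add(int source, int destination, TokenRange range) {
        bySource.computeIfAbsent(source, s -> new TreeMap<>())
                .computeIfAbsent(destination, d -> new ArrayList<>())
                .add(range);
        return this;
    }

    // What a destination must fetch, grouped by the source it fetches from.
    Map<Integer, List<TokenRange>> incomingFor(int destination) {
        Map<Integer, List<TokenRange>> result = new TreeMap<>();
        bySource.forEach((source, byDestination) -> {
            List<TokenRange> ranges = byDestination.get(destination);
            if (ranges != null)
                result.put(source, List.copyOf(ranges));
        });
        return result;
    }
}
```

For example, a bootstrapping node 4 taking over (0, 100] from node 1 and (100, 200] from node 2 would call `incomingFor(4)` to plan one streaming session per source. Because bootstrap, decommission, move and replace would all produce the same shape, one streaming path can serve every operation.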
[CEP-21] Add vtable for metadata log
- [https://github.com/krummas/cassandra/commit/9fa4d61e5a]
[CEP-21] Add exception code to commit result if rejected
- [https://github.com/krummas/cassandra/commit/7331e0842b]
[CEP-21] Make cleanup safe to run during range movements
- [https://github.com/krummas/cassandra/commit/6f990c118f]
[CEP-21] ReplicaPlan recomputation and stillAppliesTo implementation for Paxos
- [https://github.com/krummas/cassandra/commit/0e5cc6a4fd]
[CEP-21] Update index status fixes post-rebase
- [https://github.com/krummas/cassandra/commit/06fba6bbc0]
[CEP-21] Create new auth tables, remove cidr constants for column names
- [https://github.com/krummas/cassandra/commit/fef280dda6]
[CEP-21] Schema fixes
- [https://github.com/krummas/cassandra/commit/dd9a7e9752]
Schema cleanups, remove old schema pulling
- [https://github.com/krummas/cassandra/commit/9f0538c4b3]
Don't include system_distributed in initial schema.
- [https://github.com/krummas/cassandra/commit/4d5fce6884]
Simplify check for whether DROP COMPACT STORAGE is permitted
- [https://github.com/krummas/cassandra/commit/0bb8efb8f0]
Don't invalidate the prepared statement cache on every schema change
- [https://github.com/krummas/cassandra/commit/93517d9ee4]
Allow Schema.instance to be initialized empty for client apps
- [https://github.com/krummas/cassandra/commit/f612d2cd3d]
Simplistic schema metadata diff
- [https://github.com/krummas/cassandra/commit/42bc2dd5ee]
Don't warn about new system tables in StartupCheck
- [https://github.com/krummas/cassandra/commit/b570e74bf3]
Exclude meta keyspace from TableMetrics::totalNonSystemTablesSize
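A simplistic schema metadata diff of the kind listed above might compare two snapshots of table metadata and classify each table as created, dropped or altered. This sketch assumes tables are keyed by a `keyspace.table` string and carry an opaque, non-null per-table version; `SchemaDiff` is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diff result; values in the input maps are opaque, non-null table versions.
record SchemaDiff(Set<String> created, Set<String> dropped, Set<String> altered) {
    static <V> SchemaDiff between(Map<String, V> before, Map<String, V> after) {
        Set<String> created = new TreeSet<>();
        Set<String> dropped = new TreeSet<>();
        Set<String> altered = new TreeSet<>();
        for (Map.Entry<String, V> table : after.entrySet()) {
            V previous = before.get(table.getKey());
            if (previous == null)
                created.add(table.getKey());          // only in the new schema
            else if (!previous.equals(table.getValue()))
                altered.add(table.getKey());          // present in both, version changed
        }
        for (String name : before.keySet())
            if (!after.containsKey(name))
                dropped.add(name);                    // only in the old schema
        return new SchemaDiff(created, dropped, altered);
    }
}
```

A diff like this could also help restrict prepared statement invalidation to tables that actually changed, though the change list above does not say that is how it is used.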
[CEP-21] Simulator updates
- [https://github.com/krummas/cassandra/commit/7e368cfc3e]
Simulate NTS
- [https://github.com/krummas/cassandra/commit/00a34d88c3]
Multi-CMS simulation, deadlines for the local processor, reworked retries for
the local and remote processors
- [https://github.com/krummas/cassandra/commit/e6dce927da]
Simulator Harry integration
- [https://github.com/krummas/cassandra/commit/f30bf25060]
Eclipse warn
[CEP-21] Bootstrap fixes
- [https://github.com/krummas/cassandra/commit/6eea8aad69]
ClusterMetadata::writePlacementAllSettled handles bootstrapping nodes correctly
- [https://github.com/krummas/cassandra/commit/ef9e9c6074]
Re-enable write survey mode
[CEP-21] Minor cleanups
- [https://github.com/krummas/cassandra/commit/335d10c9d6]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)