andresbeckruiz opened a new pull request, #4746:
URL: https://github.com/apache/cassandra/pull/4746
## Summary
`BTree.FastBuilder.reset()` does not clear `savedBuffer` or `savedNextKey`,
allowing stale `ColumnMetadata` objects to leak when a `FastBuilder` is reused
from the thread-local pool after an exception during message deserialization.
During a schema disagreement, a `READ_REQ` deserialization failure on a
replica leaves a `FastBuilder` in a dirty state with `savedBuffer` and
`savedNextKey` populated from the source table's `ColumnMetadata`. When the
same thread reuses that `FastBuilder` for a subsequent BTree construction, the
stale entries leak into the new `BTree`, causing:
1. **ClassCastException** (CASSANDRA-21216): `ColumnMetadata` objects from
the source table end up in a `Row` BTree, causing `ClassCastException:
ColumnMetadata cannot be cast to Row` during mutations, reads, or flushes. This
occurs on the large-message path where messages exceeding ~64KB are
deserialized on `SEPWorker` threads that also service mutation tasks.
2. **SSTable header corruption** (CASSANDRA-21260): Stale columns from the
source table's `savedBuffer` leak into a victim table's `SerializationHeader`
via deletion-only mutations, writing foreign column entries into the SSTable
metadata on disk. This can also occur on the small-message path via Netty event
loop thread reuse, lowering the trigger threshold to tables with more than 31
columns.
More context regarding these bugs can be found in this [discussion
thread](https://lists.apache.org/thread/rzbcj3row70zynjvg6xwml1qprro6fz4).
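The leak described above can be reproduced in miniature. The following is only a toy sketch, not Cassandra's `BTree.FastBuilder`: `ToyBuilder`, `borrow()`, and the two-element "leaf" are hypothetical names and sizes chosen for illustration. The one thing it shares with the real bug is a `reset()` that forgets the saved overflow state.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a pooled builder; NOT Cassandra's BTree.FastBuilder.
final class ToyBuilder {
    // One builder per thread, reused across calls (hypothetical pool).
    private static final ThreadLocal<ToyBuilder> POOL =
            ThreadLocal.withInitial(ToyBuilder::new);

    private final List<Object> buffer = new ArrayList<>();
    // Overflow state, loosely analogous to savedBuffer in the PR.
    private List<Object> savedBuffer;

    static ToyBuilder borrow() {
        ToyBuilder b = POOL.get();
        b.reset();
        return b;
    }

    // Buggy reset: clears the active buffer but leaves savedBuffer untouched.
    void reset() {
        buffer.clear();
    }

    void add(Object value) {
        buffer.add(value);
        if (buffer.size() == 2) { // pretend the leaf is full and spill it
            if (savedBuffer == null) savedBuffer = new ArrayList<>();
            savedBuffer.addAll(buffer);
            buffer.clear();
        }
    }

    List<Object> build() {
        List<Object> out = new ArrayList<>();
        if (savedBuffer != null) out.addAll(savedBuffer);
        out.addAll(buffer);
        savedBuffer = null; // a completed build() cleans up after itself
        buffer.clear();
        return out;
    }
}

final class StaleBuilderDemo {
    static List<Object> run() {
        ToyBuilder first = ToyBuilder.borrow();
        first.add("col_a");
        first.add("col_b");
        first.add("col_c");
        // Simulated deserialization failure: build() is never called.

        ToyBuilder second = ToyBuilder.borrow(); // same thread, same instance
        second.add("row_1");
        return second.build();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [col_a, col_b, row_1]
    }
}
```

The second result contains `col_a` and `col_b` even though the second caller only added `row_1`, which is the same shape of contamination as foreign `ColumnMetadata` entries appearing in a `Row` BTree.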
## Fix
Null out `savedBuffer` and `savedNextKey` in `FastBuilder.reset()` for
both leaf and branch BTree nodes. Also add `savedNextKey = null` to
`AbstractUpdater.reset()` for consistency.
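As a rough illustration of the shape of this fix, the sketch below uses a hypothetical `ResettableBuilder` whose field names mirror the ones in this PR; the buffer size, the `add()` logic, and `isClean()` are assumptions made for the example and do not reflect the actual `FastBuilder` internals.

```java
import java.util.Arrays;

// Toy stand-in; only the field names savedBuffer/savedNextKey come from the PR.
final class ResettableBuilder {
    private final Object[] buffer = new Object[4];
    private int count;
    private Object[] savedBuffer; // contents of a previously filled leaf
    private Object savedNextKey;  // key separating savedBuffer from buffer

    void add(Object key) {
        if (count == buffer.length) {
            // Leaf is full: stash it and remember the separating key.
            savedBuffer = Arrays.copyOf(buffer, count);
            savedNextKey = key;
            count = 0;
            return;
        }
        buffer[count++] = key;
    }

    // Reset clears the saved overflow state as well as the active buffer,
    // so an abandoned builder cannot leak entries into its next user.
    void reset() {
        Arrays.fill(buffer, null);
        count = 0;
        savedBuffer = null;
        savedNextKey = null;
    }

    boolean isClean() {
        return count == 0 && savedBuffer == null && savedNextKey == null;
    }
}

final class ResetFixDemo {
    static boolean abandonedBuilderIsCleanAfterReset() {
        ResettableBuilder b = new ResettableBuilder();
        for (int i = 0; i < 6; i++) b.add("key_" + i); // overflows one leaf
        boolean dirtyBefore = !b.isClean();            // abandoned without build()
        b.reset();
        return dirtyBefore && b.isClean();
    }

    public static void main(String[] args) {
        System.out.println(abandonedBuilderIsCleanAfterReset());
    }
}
```

Clearing the fields in `reset()` rather than relying on `build()` to do so means the cleanup happens on every reuse, including after an exception interrupts construction.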
## Test plan
- JVM Dtest `BTreeFastBuilderContaminationTest`:
  - `testSchemaDisagreementCorruptsPartitionViaFastBuilder`: A wide table
    (4200 columns) triggers large-message deserialization on `SEPWorker`
    threads; the test verifies that no `ClassCastException` occurs after a
    schema disagreement.
  - `testSmallMessageContaminatesSSTableHeaderViaNettyEventLoop`: A
    small-message scenario (150 columns) triggers deserialization on the Netty
    event loop; the test verifies that no foreign columns appear in the victim
    table's SSTable headers.
- Unit test `BTreeTest.testFastBuilderResetClearsSavedState`: Verifies
`FastBuilder.reset()` clears `savedBuffer`/`savedNextKey` when a builder is
abandoned without calling `build()`.
- All existing `BTreeTest` tests pass (12/12).
Patch by Andrés Beck-Ruiz, Runtian Liu, reviewed by <> for
CASSANDRA-21216, CASSANDRA-21260
Co-authored-by: Runtian Liu [email protected]
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]