Caleb Rackliffe created CASSANDRA-20639:
-------------------------------------------

             Summary: Replica filtering protection can trigger short-read 
protection too aggressively when the LIMIT is less than the number of results 
in a partition
                 Key: CASSANDRA-20639
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20639
             Project: Apache Cassandra
          Issue Type: Improvement
          Components: Consistency/Coordination, Feature/SAI
            Reporter: Caleb Rackliffe
            Assignee: Caleb Rackliffe


{{ReplicaFilteringProtection#queryProtectedPartitions()}} provides "completed" 
partitions to the {{DataResolver}} in two steps. First, it consumes the initial 
merged query results from the replicas, via a {{PartitionIterator}} which is 
short-read protected. As it does this, it consumes all matches in a partition. 
This forces the row data through RFP's merge listener, which catalogs the 
places where replicas are "silent" and marks them for completion. Second, 
{{PartitionBuilder}} uses this information to complete the partition with data from 
the replicas that provided ambiguous results.
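
To make the first step concrete, here is a toy sketch of the cataloging idea only. It is not the RFP merge listener: {{SilentReplicaCatalog}} and {{catalogSilentReplicas()}} are hypothetical names, replicas are plain list indexes, and each response is reduced to the set of clustering values that replica returned for one partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy model; none of these names exist in Cassandra.
public final class SilentReplicaCatalog
{
    /**
     * For every clustering value returned by at least one replica, lists the
     * replicas (by index) that returned nothing for it, i.e. were "silent".
     */
    public static Map<Integer, List<Integer>> catalogSilentReplicas(List<Set<Integer>> responses)
    {
        // Union of all clustering values seen across the replica responses.
        SortedSet<Integer> merged = new TreeSet<>();
        for (Set<Integer> response : responses)
            merged.addAll(response);

        // Record which replicas would have to be asked again for each row.
        Map<Integer, List<Integer>> toComplete = new TreeMap<>();
        for (int clustering : merged)
            for (int replica = 0; replica < responses.size(); replica++)
                if (!responses.get(replica).contains(clustering))
                    toComplete.computeIfAbsent(clustering, k -> new ArrayList<>()).add(replica);

        return toComplete;
    }
}
```

In this model the returned map plays the role of the information the second step would consume: each entry names a row and the replicas whose data is still needed to complete it.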

The problem here is in the first step. When the total number of matches in a 
large partition is a large multiple of the LIMIT, consuming all the matches in 
the partition triggers a flurry of short-read protection reads to any replicas 
that actually provided enough results to hit the limit. This problem is 
somewhat mitigated by CASSANDRA-20566 if we can use strict filtering and 
therefore {{SinglePartitionReadCommand}}, where digest matches bypass RFP 
altogether. (This would be especially likely with small limits and reasonably 
repaired data.)
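
As a rough illustration of the scale involved, the sketch below counts follow-up reads in a simplified model of draining one partition on one replica. The model is an assumption, not Cassandra's actual short-read protection logic: {{SrpReadModel}}, {{followUpReads()}}, and the fixed {{batchSize}} are hypothetical, and a replica is treated as exhausted only once it returns fewer rows than were requested.

```java
// Hypothetical toy model; it does not reproduce Cassandra's SRP batching.
public final class SrpReadModel
{
    /**
     * Counts the follow-up reads needed to drain every match in a partition
     * from a single replica, after an initial read capped at {@code limit}.
     */
    public static int followUpReads(int matchesInPartition, int limit, int batchSize)
    {
        if (matchesInPartition < 0 || limit <= 0 || batchSize <= 0)
            throw new IllegalArgumentException("matches must be >= 0; limit and batchSize must be > 0");

        int returned = Math.min(limit, matchesInPartition);

        // A replica that came up short of the limit is known to be exhausted.
        if (returned < limit)
            return 0;

        int remaining = matchesInPartition - returned;
        int reads = 0;
        while (true)
        {
            reads++;
            int fetched = Math.min(batchSize, remaining);
            remaining -= fetched;

            // A short response is the only signal that nothing is left.
            if (fetched < batchSize)
                return reads;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(followUpReads(100, 1, 1));
        System.out.println(followUpReads(100, 10, 10));
        System.out.println(followUpReads(5, 10, 10));
    }
}
```

Under these assumptions the number of follow-up reads grows with the ratio of matches to the limit, while a replica that returns fewer rows than the limit triggers none.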

Here's a short test that should hit all of this:

(Just put a breakpoint in {{hasNext()}} inside {{queryProtectedPartitions()}} and 
then in {{ShortReadPartitionsProtection#executeReadCommand()}} to see SRP reads 
being sent.)

{noformat}
@Test
public void testShortReadNoSRP()
{
    // Table with two SAI-indexed regular columns and read repair disabled
    CLUSTER.schemaChange(withKeyspace("CREATE TABLE %s.short_read_no_srp (k int, c int, a int, b int, PRIMARY KEY (k, c)) WITH read_repair = 'NONE'"));
    CLUSTER.schemaChange(withKeyspace("CREATE INDEX ON %s.short_read_no_srp(a) USING 'sai'"));
    CLUSTER.schemaChange(withKeyspace("CREATE INDEX ON %s.short_read_no_srp(b) USING 'sai'"));
    SAIUtil.waitForIndexQueryable(CLUSTER, KEYSPACE);

    // Write a single row directly on node 1 only
    CLUSTER.get(1).executeInternal(withKeyspace("INSERT INTO %s.short_read_no_srp(k, c, a) VALUES (0, 2, 1) USING TIMESTAMP 5"));

    // Query at ConsistencyLevel.ALL with a page size of 1
    String select = withKeyspace("SELECT * FROM %s.short_read_no_srp WHERE k = 0 AND a = 1");
    Iterator<Object[]> initialRows = CLUSTER.coordinator(1).executeWithPaging(select, ConsistencyLevel.ALL, 1);
    assertRows(initialRows, row(0, 2, 1, null));
}
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
