Hi,

We are trying to evaluate the read-performance impact of having a wide row, created by pushing part of the partition key out into a clustering column. From all the information I could gather [1] [2] [3], the key cache as well as the partition index point to the block location of the partition on disk.

[1] <https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_how_cache_works_c.html>
[2] <https://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlAboutReads.html>
[3] <https://wiki.apache.org/cassandra/ReadPathForUsers>
Suppose we have a schema like the one below, which results in wide partitions if pk is of low cardinality (say, the month in time-series data):

    CREATE TABLE ks.wide_row_table (
        pk int,
        ck1 bigint,
        ck2 text,
        v1 text,
        v2 text,
        v3 bigint,
        PRIMARY KEY (pk, ck1, ck2)
    );

Suppose that there is only one SSTable for this table at this instant and a specific partition has reached 100 MB. Will reading the first row in the partition (at offset 0) cost the same as reading the last row in the partition (at 100 MB)? In other words, once the partition key is specified, is there any heuristic that uses the clustering columns to determine the disk offset and seek to the relevant block, or in the second case will the complete 100 MB partition have to be scanned in order to find the relevant row?

For simplicity's sake, let's assume that the row cache and the OS page cache are disabled and all reads hit the disk.

Thanks & Regards,
Bhuvan
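
P.S. To make the comparison concrete, the two reads I have in mind would look roughly like the sketch below. The pk, ck1 and ck2 values are made up purely for illustration (they are not from our real data); the only assumption is that the first query hits the very first row of the partition in clustering order and the second hits the very last one.

```cql
-- Hypothetical values: 201801 stands in for a month bucket used as pk.

-- Read 1: the first row of the partition in clustering order (near offset 0).
SELECT v1, v2, v3
FROM ks.wide_row_table
WHERE pk = 201801 AND ck1 = 1 AND ck2 = 'a';

-- Read 2: the last row of the same partition (near the 100 MB mark).
SELECT v1, v2, v3
FROM ks.wide_row_table
WHERE pk = 201801 AND ck1 = 9999999 AND ck2 = 'z';
```

Both queries fully specify the primary key, so the question is only whether the second one costs more disk I/O than the first.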