[Correction of the original message, which contained typos in the code.]
Is it good for performance to put rows that are of different types but
are always queried together in the same table partition?
My question is whether doing so will result in better
memory/disk cache locality.
Suppose I need to query 2 different types of rows for a frequent
user request. I can use 2 tables or 1 table:
2 tables:
create table t1(
    partitionkey int primary key,
    col1 int, col2 int, ...
)
create table t2(
    partitionkey int primary key,
    col3 int, col4 int, ...
)
query-2table:
select col1,col2 from t1 where partitionkey = ?
select col3,col4 from t2 where partitionkey = ?
1 table:
create table t(
    partitionkey int,
    rowtype tinyint,
    col1 int, col2 int, ...
    col3 int, col4 int, ...
    primary key( partitionkey, rowtype )
)
query-1table-a:
select col1,col2 from t where partitionkey = ? and rowtype = 1
select col3,col4 from t where partitionkey = ? and rowtype = 2
or alternatively, query-1table-b:
select rowtype,col1,col2,col3,col4 from t where partitionkey = ?
// Unused columns are `null`. Switch on `rowtype` in the app code
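To make the "switch on `rowtype`" part of query-1table-b concrete, here is a minimal Python sketch of what I have in mind on the application side. It does not talk to Cassandra at all: the row dicts below are made-up stand-ins for whatever the driver returns, and the function name `split_by_rowtype` is hypothetical.

```python
def split_by_rowtype(rows):
    """Split the mixed result of query-1table-b into per-type records.

    `rows` is assumed to be an iterable of dict-like rows with the keys
    rowtype, col1, col2, col3, col4 (hypothetical shape, not a driver API).
    """
    type1, type2 = [], []
    for row in rows:
        if row["rowtype"] == 1:
            # Type-1 rows only populate col1/col2; col3/col4 are null.
            type1.append((row["col1"], row["col2"]))
        elif row["rowtype"] == 2:
            # Type-2 rows only populate col3/col4; col1/col2 are null.
            type2.append((row["col3"], row["col4"]))
        else:
            raise ValueError(f"unexpected rowtype: {row['rowtype']!r}")
    return type1, type2


# Made-up sample rows for one partition, as query-1table-b might return them.
rows = [
    {"rowtype": 1, "col1": 10, "col2": 20, "col3": None, "col4": None},
    {"rowtype": 2, "col1": None, "col2": None, "col3": 30, "col4": 40},
]
type1, type2 = split_by_rowtype(rows)
```

So with query-1table-b there is one round trip per partition, at the cost of transferring the null columns and doing this dispatch in the client.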
Is there a significant performance difference between query-2table,
query-1table-a, and query-1table-b?
Is the Cassandra client/coordinator smart enough to direct subsequent
queries for the same (table, partitionkey) to the same node so that they
can reuse a cached page?
Regards & Thanks