[Correction of the original message, which contained typos in the code.]

Is it good for performance to put rows that are of different types but are always queried together in the same table partition?

My question is whether doing so results in better memory/disk cache locality.

Suppose a frequent user request needs 2 different types of rows. I can store them in 2 tables or in 1 table:

2 tables:

  create table t1(
    partitionkey int primary key,
    col1 int, col2 int, ...
  )
  create table t2(
    partitionkey int primary key,
    col3 int, col4 int, ...
  )

query-2table:
  select col1,col2 from t1 where partitionkey = ?
  select col3,col4 from t2 where partitionkey = ?

1 table:

  create table t(
    partitionkey int,
    rowtype tinyint,
    col1 int, col2 int, ...
    col3 int, col4 int, ...
    primary key( partitionkey, rowtype )
  )

query-1table-a:
  select col1,col2 from t where partitionkey = ? and rowtype = 1
  select col3,col4 from t where partitionkey = ? and rowtype = 2

or alternatively, query-1table-b:
  select rowtype,col1,col2,col3,col4 from t where partitionkey = ?
  // Unused columns are `null`. Switch on `rowtype` in the app code
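
To make query-1table-b concrete, the app-side switch I have in mind looks roughly like the sketch below. It is only an illustration: rows are represented as plain dicts rather than any particular driver's row type, and the names `split_rows`, `ROWTYPE_A` and `ROWTYPE_B` are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical tags matching the `rowtype` values used in the queries above.
ROWTYPE_A = 1  # rows that populate col1, col2
ROWTYPE_B = 2  # rows that populate col3, col4


def split_rows(
    rows: Iterable[Dict[str, Any]],
) -> Tuple[List[Tuple[Any, Any]], List[Tuple[Any, Any]]]:
    """Split the result of query-1table-b by `rowtype`.

    Each row is assumed to be a dict with keys rowtype, col1..col4,
    where the columns not used by that row type are None.
    """
    type_a: List[Tuple[Any, Any]] = []
    type_b: List[Tuple[Any, Any]] = []
    for row in rows:
        if row["rowtype"] == ROWTYPE_A:
            type_a.append((row["col1"], row["col2"]))
        elif row["rowtype"] == ROWTYPE_B:
            type_b.append((row["col3"], row["col4"]))
        else:
            # Fail loudly on an unknown tag instead of silently dropping the row.
            raise ValueError(f"unexpected rowtype: {row['rowtype']!r}")
    return type_a, type_b
```

With primary key (partitionkey, rowtype), a partition holds at most one row per rowtype, so each returned list would contain at most one entry here.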

Is there a significant performance difference among query-2table, query-1table-a, and query-1table-b? Is the Cassandra client/coordinator smart enough to direct subsequent queries for the same (table, partitionkey) to the same node so that they can reuse a cached page?

Regards & Thanks
