I over-simplified the original example. In the real model I cannot just
merge the row types. Suppose
create table t1(
p int,
q1 int,
c1 int,
primary key( p, q1 )
)
create table t2(
p int,
q2 uuid,
c2 text,
primary key( p, q2 )
)
Merging the tables will be slightly ugly and waste some storage in the
clustering columns:
create table t(
p int,
rowtype tinyint, // t1 or t2
q1 int, q2 uuid, // depending on rowtype, either q1 or q2 is unused
c1 int, c2 text, // depending on rowtype, either c1 or c2 is null
primary key( p, rowtype, q1, q2)
)
Nevertheless, putting them into one table seems faster as we only need
one query to get both types, and have better cache locality. Am I correct?
On Saturday, January 09, 2016 06:47 AM, Jack Krupansky wrote:
A simple denormalization is probably all that is called for - just
merge the two tables into one (their union.) No need for this row type.
-- Jack Krupansky
On Fri, Jan 8, 2016 at 9:30 AM, Jeff Jirsa <jeff.ji...@crowdstrike.com
<mailto:jeff.ji...@crowdstrike.com>> wrote:
You’ll see better performance using a slice (which is effectively
what will happen if you put them into the same table and use
query-1table-b), as each node will only need to merge
cells/results once. It may not be twice as fast, but it’ll be fast
enough to make it worthwhile.
On 1/8/16, 12:13 AM, "Bamoqi" <bam...@gmail.com
<mailto:bam...@gmail.com>> wrote:
>[Correction of the original message which contains typos in code.]
>
>Is it good for performance to put rows that are of different
types but
>are always queried together in the same table partition?
>
>My consideration is that whether doing so will result in better
>memory/disk cache locality.
>
>Suppose I need to query for 2 different types of rows for a frequent
>user request, I can use 2 tables or 1 table:
>
>2 tables:
>
> create table t1(
> partitionkey int primary key,
> col1 int, col2 int, ...
> )
> create table t2(
> partitionkey int primary key,
> col3 int, col4 int, ...
> )
>
>query-2table:
> select col1,col2 from t1 where partitionkey = ?
> select col3,col4 from t2 where partitionkey = ?
>
>1 table:
>
> create table t(
> partitionkey int,
> rowtype tinyint,
> col1 int, col2 int, ...
> col3 int, col4 int, ...
> primary key( partitionkey, rowtype )
> )
>
>query-1table-a:
> select col1,col2 from t where partitionkey = ? and rowtype = 1
> select col3,col4 from t where partitionkey = ? and rowtype = 2
>
>or alternatively, query-1table-b:
> select rowtype,col1,col2,col3,col4 from t where partitionkey = ?
> // Used columns are `null`. Switch on `rowtype` in the app code
>
>Is there significant performance difference in query-2table,
>query-1table-a, query-1table-b?
>Is the cassandra client/coordinator smart enough to direct subsequent
>queries of the same (table, partitionkey) to the same node so
they can
>reuse a cached page?
>
>Regards & Thanks