Hello,

Say I have time series data for a table like this:

CREATE TABLE mytimeseries (
    pk_part1  text,
    partition bigint, << e.g. partition per day or per hour
    pk_part2  text, << this is part of the partition key so I can
split write load
    message_id  timeuuid,
    secondary_key1  text,
    secondary_key2   text,
    .
    <more columns>
    .
    PRIMARY KEY ((pk_part1, partition, pk_part2), message_id));

Most of the time I will need to do queries with
pk_part1/partition/pk_part2/message_id range. So this is what I
optimize for.

Sometimes, however, I will need to do queries with
pk_part1/partition/message_id range and some combination of
secondary_key1 (95% of the time there is a one-to-one relationship
with pk_part1) or secondary_key2 (for each secondary_key2 there will
be many pk_part2 values).

In this time series scenario, to efficiently make use of
secondary_key1/secondary_key2 as Cassandra secondary indexes for these
queries I assume that secondary_key1/secondary_key_2 would really need
to be composites combined into one column (in SQL I would create
multi-column indexes)? i.e.:

secondary_key_1 - pk_part1 + partition_key + real_secondary_key_1
secondary_key_2 - pl_part2 + partition_key + real_secondary_key_2

Would this be correct? Just making sure I understand how to best use
secondary indexes in Cassandra with time series data.

thanks in advance,
Gareth

Reply via email to