Hi Chuck,

On Thu, May 7, 2020 at 10:14 AM Check Peck <comptechge...@gmail.com> wrote:

> I have a Scylla table as shown below:
>

(Please note that this is the Apache Cassandra users mailing list. Of
course, the feature is the same, so let me answer it here.)


>
>     cqlsh:sampleks> describe table test;
>
>     CREATE TABLE test (
>         client_id int,
>         when timestamp,
>         process_ids list<int>,
>         md text,
>         PRIMARY KEY (client_id, when)
>     ) WITH CLUSTERING ORDER BY (when DESC)
>         AND bloom_filter_fp_chance = 0.01
>         AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
>         AND comment = ''
>         AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS'}
>         AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>         AND crc_check_chance = 1.0
>         AND dclocal_read_repair_chance = 0.1
>         AND default_time_to_live = 0
>         AND gc_grace_seconds = 172800
>         AND max_index_interval = 1024
>         AND memtable_flush_period_in_ms = 0
>         AND min_index_interval = 128
>         AND read_repair_chance = 0.0
>         AND speculative_retry = '99.0PERCENTILE';
>
> And I see this is how we are querying it. It's been a long time since I
> worked on Cassandra, so this "PER PARTITION LIMIT" is a new thing to me (it
> looks like it was added recently). Can someone explain what it does, with
> an example, in layman's terms? I couldn't find any good doc that explains
> it simply.
>
>
>     SELECT * FROM test WHERE client_id IN ? PER PARTITION LIMIT 1;
>

The "PER PARTITION LIMIT" option is documented here, although I do agree
it's a rather terse explanation:

https://cassandra.apache.org/doc/latest/cql/dml.html#limiting-results

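It limits the number of rows returned *per partition*, that is, per distinct
"client_id" value in your schema. Because the table is clustered by "when"
in descending order, the rows kept for each partition are the newest ones.
For example, if you have the following data: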

cqlsh:ks1> SELECT client_id, when FROM test;

 client_id | when
-----------+---------------------------------
         1 | 2020-01-01 22:00:00.000000+0000
         1 | 2019-12-31 22:00:00.000000+0000
         2 | 2020-02-12 22:00:00.000000+0000
         2 | 2020-02-11 22:00:00.000000+0000
         2 | 2020-02-10 22:00:00.000000+0000

(5 rows)

You can ask the query to limit the number of rows returned for each
"client_id". For example, with a limit of "1", you'd have:

cqlsh:ks1> SELECT client_id, when FROM test PER PARTITION LIMIT 1;

 client_id | when
-----------+---------------------------------
         1 | 2020-01-01 22:00:00.000000+0000
         2 | 2020-02-12 22:00:00.000000+0000

(2 rows)

Increasing the limit to "2" would yield:

cqlsh:ks1> SELECT client_id, when FROM test PER PARTITION LIMIT 2;

 client_id | when
-----------+---------------------------------
         1 | 2020-01-01 22:00:00.000000+0000
         1 | 2019-12-31 22:00:00.000000+0000
         2 | 2020-02-12 22:00:00.000000+0000
         2 | 2020-02-11 22:00:00.000000+0000

(4 rows)
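In your application's query, the "?" is a bind marker: the driver fills it
with a list of client IDs when the prepared statement is executed. A rough
cqlsh equivalent against the sample data above, assuming (hypothetically)
that the bound client IDs are 1 and 2, is the first query below. It returns
the most recent row for each listed client, i.e. the 2020-01-01 row for
client 1 and the 2020-02-12 row for client 2. The second query sketches
combining both limits: PER PARTITION LIMIT caps the rows per "client_id",
and the ordinary LIMIT then caps the total number of rows returned.

```cql
-- Hypothetical bound values: client IDs 1 and 2 (from the sample data above).
-- Returns only the newest row (by "when") for each listed client_id.
SELECT client_id, when FROM test WHERE client_id IN (1, 2) PER PARTITION LIMIT 1;

-- At most 2 rows per client_id, and at most 3 rows in total.
SELECT client_id, when FROM test PER PARTITION LIMIT 2 LIMIT 3;
```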

Hope this helps!

Regards,

- Pekka
