Hi Chuck,

On Thu, May 7, 2020 at 10:14 AM Check Peck <comptechge...@gmail.com> wrote:
> I have a scylla table as shown below:

(Please note that this is the Apache Cassandra users mailing list. Of course, the feature is the same, so let me answer it here.)

> > cqlsh:sampleks> describe table test;
> >
> > CREATE TABLE test (
> >     client_id int,
> >     when timestamp,
> >     process_ids list<int>,
> >     md text,
> >     PRIMARY KEY (client_id, when)
> > ) WITH CLUSTERING ORDER BY (when DESC)
> >     AND bloom_filter_fp_chance = 0.01
> >     AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
> >     AND comment = ''
> >     AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'DAYS'}
> >     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
> >     AND crc_check_chance = 1.0
> >     AND dclocal_read_repair_chance = 0.1
> >     AND default_time_to_live = 0
> >     AND gc_grace_seconds = 172800
> >     AND max_index_interval = 1024
> >     AND memtable_flush_period_in_ms = 0
> >     AND min_index_interval = 128
> >     AND read_repair_chance = 0.0
> >     AND speculative_retry = '99.0PERCENTILE';
>
> And I see this is how we are querying it. It's been a long time I worked
> on cassandra so this “PER PARTITION LIMIT” is new thing to me (looks like
> recently added). Can someone explain what does this do with some example in
> a layman language? I couldn't find any good doc on that which explains
> easily.
>
> > SELECT * FROM test WHERE client_id IN ? PER PARTITION LIMIT 1;

The "PER PARTITION LIMIT" option is documented here, although I do agree it's a rather terse explanation:

https://cassandra.apache.org/doc/latest/cql/dml.html#limiting-results

What it does is it limits the number of returned rows *per partition*.
So, for example, with your schema, if you have the following data:

cqlsh:ks1> SELECT client_id, when FROM test;

 client_id | when
-----------+---------------------------------
         1 | 2020-01-01 22:00:00.000000+0000
         1 | 2019-12-31 22:00:00.000000+0000
         2 | 2020-02-12 22:00:00.000000+0000
         2 | 2020-02-11 22:00:00.000000+0000
         2 | 2020-02-10 22:00:00.000000+0000

(5 rows)

You can ask the query to limit the number of rows returned for each "client_id". For example, with a limit of "1", you'd have:

cqlsh:ks1> SELECT client_id, when FROM test PER PARTITION LIMIT 1;

 client_id | when
-----------+---------------------------------
         1 | 2020-01-01 22:00:00.000000+0000
         2 | 2020-02-12 22:00:00.000000+0000

(2 rows)

Increasing the limit to "2" would yield:

cqlsh:ks1> SELECT client_id, when FROM test PER PARTITION LIMIT 2;

 client_id | when
-----------+---------------------------------
         1 | 2020-01-01 22:00:00.000000+0000
         1 | 2019-12-31 22:00:00.000000+0000
         2 | 2020-02-12 22:00:00.000000+0000
         2 | 2020-02-11 22:00:00.000000+0000

(4 rows)

Hope this helps!

Regards,

- Pekka
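
P.S. If it helps to see the idea as code, here is a rough Python model of what the option does. It is only a sketch of the semantics and not how Cassandra or Scylla evaluate the query internally. The names ROWS and per_partition_limit are made up for illustration, and the rows are assumed to already be in the table's clustering order (when DESC).

```python
from collections import defaultdict

# Sample rows as (client_id, when), listed in clustering order (when DESC).
# The data mirrors the cqlsh example; the structure is a made-up stand-in.
ROWS = [
    (1, "2020-01-01 22:00:00"),
    (1, "2019-12-31 22:00:00"),
    (2, "2020-02-12 22:00:00"),
    (2, "2020-02-11 22:00:00"),
    (2, "2020-02-10 22:00:00"),
]


def per_partition_limit(rows, limit, client_ids=None):
    """Keep at most `limit` rows for each client_id (the partition key).

    `client_ids` optionally models a `WHERE client_id IN ?` restriction.
    """
    kept_per_partition = defaultdict(int)
    result = []
    for client_id, when in rows:
        # Skip partitions that the IN restriction does not select.
        if client_ids is not None and client_id not in client_ids:
            continue
        # Emit the row only while this partition is below its limit.
        if kept_per_partition[client_id] < limit:
            kept_per_partition[client_id] += 1
            result.append((client_id, when))
    return result


if __name__ == "__main__":
    print(per_partition_limit(ROWS, 1))
    print(per_partition_limit(ROWS, 2))
    print(per_partition_limit(ROWS, 1, client_ids={2}))
```

Since rows inside a partition come back in clustering order, and your table is ordered by "when" descending, your query with "IN ?" and a limit of 1 should return the most recent row for each client_id in the list.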