"Is it to be expected that select count(*) from ... and select distinct partition-key-columns from ... to yield inconsistent results between executions even though the table at hand isn't written to?"
Actually, depending on the definition of your primary key, select count(*) and select distinct partition-key-column may not yield the same result and it is perfectly normal. The "select distinct partition-key-column" effectively return distinct tuples of partition key columns WHEREAS select count(*) return distinct tuples of primary-key-columns (e.g. partition keys + clustering columns) In your example table, PRIMARY KEY ((id, bucket), offset) so SELECT DISTINCT id,bucket FROM tbl will return all distinct couples of (id,bucket) SELECT COUNT(*) FROM tbl will count all distinct triplets of (id,bucket,offset) The trick is that SELECT COUNT(*) return the total number of CQL rows, not the count of partitions .... On Wed, Mar 4, 2015 at 3:54 AM, Rumph, Frens Jan <m...@frensjan.nl> wrote: > Hi, > > Is it to be expected that select count(*) from ... and select distinct > partition-key-columns from ... to yield inconsistent results between > executions even though the table at hand isn't written to? > > I have a table in a keyspace with replication_factor = 1 which is > something like: > > CREATE TABLE tbl ( > id frozen<id_type>, > bucket bigint, > offset int, > value double, > PRIMARY KEY ((id, bucket), offset) > ) > > The frozen udt is: > > CREATE TYPE id_type ( > tags map<text, text> > ); > > When I do select count(*) from tbl several times the actual count varies > with 5 to 10%. Also when performing select distinct id, bucket from tbl the > results aren't consistent over several query executions. The table is not > being written to at the time I performed the queries. > > Is this to be expected? Or is this a bug? Is there a alternative method / > workaround? > > I'm using cqlsh 5.0.1 with Cassandra 2.1.2 on 64bit fedora 21 with Oracle > Java 1.8.0_31. > > Thanks in advance, > Frens Jan >