Re: Inconsistent count(*) and distinct results from Cassandra

DuyHai Doan Wed, 04 Mar 2015 14:12:35 -0800

"Is it to be expected that select count(*) from ... and select distinct
partition-key-columns from ... to yield inconsistent results between
executions even though the table at hand isn't written to?"

Actually, depending on the definition of your primary key, select count(*)
and select distinct partition-key-column may not yield the same result and
it is perfectly normal.

The "select distinct partition-key-column" effectively return distinct
tuples of partition key columns WHEREAS select count(*) return distinct
tuples of primary-key-columns (e.g. partition keys + clustering columns)

In your example table, PRIMARY KEY ((id, bucket), offset) so

SELECT DISTINCT id,bucket FROM tbl will return all distinct couples of
(id,bucket)

SELECT COUNT(*) FROM tbl will count all distinct triplets of
(id,bucket,offset)

The trick is that SELECT COUNT(*) return the total number of CQL rows, not
the count of partitions ....

On Wed, Mar 4, 2015 at 3:54 AM, Rumph, Frens Jan <m...@frensjan.nl> wrote:

> Hi,
>
> Is it to be expected that select count(*) from ... and select distinct
> partition-key-columns from ... to yield inconsistent results between
> executions even though the table at hand isn't written to?
>
> I have a table in a keyspace with replication_factor = 1 which is
> something like:
>
> CREATE TABLE tbl (
>     id frozen<id_type>,
>     bucket bigint,
>     offset int,
>     value double,
>     PRIMARY KEY ((id, bucket), offset)
> )
>
> The frozen udt is:
>
> CREATE TYPE id_type (
>     tags map<text, text>
> );
>
> When I do select count(*) from tbl several times the actual count varies
> with 5 to 10%. Also when performing select distinct id, bucket from tbl the
> results aren't consistent over several query executions. The table is not
> being written to at the time I performed the queries.
>
> Is this to be expected? Or is this a bug? Is there a alternative method /
> workaround?
>
> I'm using cqlsh 5.0.1 with Cassandra 2.1.2 on 64bit fedora 21 with Oracle
> Java 1.8.0_31.
>
> Thanks in advance,
> Frens Jan
>

Re: Inconsistent count(*) and distinct results from Cassandra

Reply via email to