Re: Inconsistent count(*) and distinct results from Cassandra

Mikhail Strebkov Wed, 04 Mar 2015 10:11:16 -0800

We have observed the same issue in our production Cassandra cluster (5 nodes in 
one DC). We use Cassandra 2.1.3 (I joined the list too late to realize we 
shouldn’t use 2.1.x yet) on Amazon machines (created from a community AMI).

In addition to count variations of 5 to 10%, we observe variations in the 
results of the query “select * from table1 where time > '$fromDate' and 
time < '$toDate' allow filtering”. We iterated through the results multiple 
times using the official Java driver. We used that query for a huge data 
migration and were unpleasantly surprised to find that it is unreliable. In 
our case “nodetool repair” didn’t fix the issue.
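
For anyone who wants to quantify this kind of variation, here is a minimal 
sketch in Python (not what we actually ran). It runs the same read several 
times and reports which keys do not show up in every run. `run_query` is a 
hypothetical callable that stands in for whatever driver call returns the 
row keys; it is not part of any Cassandra driver API.

```python
from typing import Callable, Hashable, Iterable


def compare_runs(run_query: Callable[[], Iterable[Hashable]], runs: int = 3) -> dict:
    """Run the same read several times and summarize how the results differ.

    run_query is a hypothetical, caller-supplied function that executes the
    query once and returns the keys it saw (for example, partition keys).
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Collect the keys returned by each execution as a set.
    results = [set(run_query()) for _ in range(runs)]

    seen_in_all = set.intersection(*results)  # keys present in every run
    seen_in_any = set.union(*results)         # keys present in at least one run

    return {
        "counts": [len(r) for r in results],    # per-run number of distinct keys
        "stable": len(seen_in_all),             # keys returned consistently
        "unstable": seen_in_any - seen_in_all,  # keys missing from some run
    }
```

On a table that is not being written to, one would expect "unstable" to be 
empty and all entries in "counts" to be equal.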

So I echo Frens' questions.

Thanks,

Mikhail

On Wed, Mar 4, 2015 at 3:55 AM, Rumph, Frens Jan <m...@frensjan.nl> wrote:

> Hi,
> Is it to be expected that select count(*) from ... and select distinct
> partition-key-columns from ... yield inconsistent results between
> executions even though the table at hand isn't written to?
> I have a table in a keyspace with replication_factor = 1 which is something
> like:
> CREATE TABLE tbl (
>     id frozen<id_type>,
>     bucket bigint,
>     offset int,
>     value double,
>     PRIMARY KEY ((id, bucket), offset)
> )
> The frozen udt is:
> CREATE TYPE id_type (
>     tags map<text, text>
> );
> When I do select count(*) from tbl several times the actual count varies
> by 5 to 10%. Also when performing select distinct id, bucket from tbl the
> results aren't consistent over several query executions. The table is not
> being written to at the time I performed the queries.
> Is this to be expected? Or is this a bug? Is there an alternative method /
> workaround?
> I'm using cqlsh 5.0.1 with Cassandra 2.1.2 on 64-bit Fedora 21 with Oracle
> Java 1.8.0_31.
> Thanks in advance,
> Frens Jan
