Re: Cluster not working after upgrade from 2.1.12 to 3.5.0

Julien Anguenot Tue, 21 Jun 2016 09:28:26 -0700

I have experienced similar duplicate primary key behavior with a couple
of tables after upgrading from 2.2.x to 3.0.x.


See the comments on the Jira issue I opened at the time:
https://issues.apache.org/jira/browse/CASSANDRA-11887


On Tue, Jun 21, 2016 at 10:47 AM, Oskar Kjellin <oskar.kjel...@gmail.com> wrote:
> Hi,
>
> We've done this upgrade in both dev and stage before and we did not see
> similar issues.
> After upgrading production today we have a lot of issues, though.
>
> The main issue is that the DataStax client quite often does not get the data
> (even though it's the same query). I see similar flakiness by simply running
> cqlsh: it does return, but it returns broken data.
>
> We are running a 3 node cluster with RF 3.
>
> I have this table
>
> CREATE TABLE keyspace.table (
>     a text,
>     b text,
>     c text,
>     d list<text>,
>     e text,
>     f timestamp,
>     g list<text>,
>     h timestamp,
>     PRIMARY KEY (a, b, c)
> )
>
>
> Roughly every other time I query (at random, not strictly alternating) I get:
>
>
> SELECT * from table where a = 'xxx' and b = 'xxx'
>
>  a   | b   | c   | d    | e    | f                               | g       | h
> -----+-----+-----+------+------+---------------------------------+---------+---------------------------------
>  xxx | xxx | ccc | null | null | 2089-11-30 23:00:00.000000+0000 | ['fff'] | 2014-12-31 23:00:00.000000+0000
>  xxx | xxx | ddd | null | null | 2099-01-01 00:00:00.000000+0000 | ['fff'] | 2016-06-17 13:29:36.000000+0000
>
>
> Which is the expected output.
>
>
> But I also get:
>
>  a   | b   | c   | d    | e    | f                               | g       | h
> -----+-----+-----+------+------+---------------------------------+---------+---------------------------------
>  xxx | xxx | ccc | null | null | null                            | null    | null
>  xxx | xxx | ccc | null | null | 2089-11-30 23:00:00.000000+0000 | ['fff'] | null
>  xxx | xxx | ccc | null | null | null                            | null    | 2014-12-31 23:00:00.000000+0000
>  xxx | xxx | ddd | null | null | null                            | null    | null
>  xxx | xxx | ddd | null | null | 2099-01-01 00:00:00.000000+0000 | ['fff'] | null
>  xxx | xxx | ddd | null | null | null                            | null    | 2016-06-17 13:29:36.000000+0000
>
>
> Notice that the same PK is returned 3 times, each time with different parts of
> the data. I believe this is what's currently killing our production environment.
>
>
> I'm running upgradesstables as of this moment, but it's not finished yet. I
> started a repair earlier, but nothing happened. upgradesstables has now
> finished on 2 out of 3 nodes, but production is still down :/
>
>
> We also see these in the logs, over and over again:
>
> DEBUG [ReadRepairStage:4] 2016-06-21 15:44:01,119 ReadCallback.java:235 - Digest mismatch:
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-1566729966326640413, 336b35356c49537731797a4a5f64627a797236) (b3dcfcbeed6676eae7ff88cc1bd251fb vs 6e7e9225871374d68a7cdb54ae70726d)
>     at org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) ~[apache-cassandra-3.5.0.jar:3.5.0]
>     at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:226) ~[apache-cassandra-3.5.0.jar:3.5.0]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
>
>
> Any help is much appreciated
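
For anyone wanting to check a result set for the symptom described above (the
same primary key coming back several times, each row carrying only part of the
data), here is a minimal Python sketch. It assumes the rows are already
available as plain dicts keyed by column name; the function names and the
sample rows are hypothetical and only mirror the cqlsh output quoted above. The
merge step is just an inspection aid that keeps the first non-null value per
column. It does not reproduce Cassandra's own timestamp-based reconciliation.

```python
from collections import defaultdict

# Assumption: primary key columns as declared in the quoted CREATE TABLE.
PRIMARY_KEY = ("a", "b", "c")


def find_duplicate_keys(rows, key_columns=PRIMARY_KEY):
    """Group rows by primary key and return only keys seen more than once."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        groups[key].append(row)
    return {key: frags for key, frags in groups.items() if len(frags) > 1}


def merge_fragments(fragments):
    """Combine partial rows, keeping the first non-null value per column.

    Inspection aid only: this is NOT how Cassandra reconciles cells.
    """
    merged = {}
    for frag in fragments:
        for col, value in frag.items():
            if merged.get(col) is None:
                merged[col] = value
    return merged


if __name__ == "__main__":
    # Hypothetical sample mirroring the three 'ccc' fragments quoted above.
    base = {"a": "xxx", "b": "xxx", "c": "ccc", "d": None, "e": None}
    sample_rows = [
        {**base, "f": None, "g": None, "h": None},
        {**base, "f": "2089-11-30 23:00:00.000000+0000", "g": ["fff"], "h": None},
        {**base, "f": None, "g": None, "h": "2014-12-31 23:00:00.000000+0000"},
    ]
    for key, frags in find_duplicate_keys(sample_rows).items():
        print(key, len(frags), merge_fragments(frags))
```

If the merged fragments match what a "good" query returns, that suggests the
data is still there and is being split across rows at read time rather than
lost.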



-- 
Julien Anguenot (@anguenot)
