Hmm, no way we can do that in prod :/

Sent from my iPhone
> On 21 June 2016, at 18:50, Julien Anguenot <jul...@anguenot.org> wrote:
>
> See my comments on the issue: I had to truncate and reinsert data in
> these corrupted tables.
>
> AFAIK, there is no evidence that UDTs are responsible for this bad behavior.
>
>> On Tue, Jun 21, 2016 at 11:45 AM, Oskar Kjellin <oskar.kjel...@gmail.com> wrote:
>>
>> Yeah, I saw that one. We're not using UDTs in the affected tables, though.
>>
>> Did you resolve it?
>>
>> Sent from my iPhone
>>
>>> On 21 June 2016, at 18:27, Julien Anguenot <jul...@anguenot.org> wrote:
>>>
>>> I have experienced similar duplicate primary key behavior with a couple
>>> of tables after upgrading from 2.2.x to 3.0.x.
>>>
>>> See the comments on the Jira issue I opened at the time:
>>> https://issues.apache.org/jira/browse/CASSANDRA-11887
>>>
>>>> On Tue, Jun 21, 2016 at 10:47 AM, Oskar Kjellin <oskar.kjel...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We've done this upgrade in both dev and stage before and did not see
>>>> any similar issues. After upgrading production today we have a lot of
>>>> issues, though.
>>>>
>>>> The main issue is that the DataStax client quite often does not get the
>>>> data back (even though it's the same query). I see similar flakiness
>>>> simply by running cqlsh; it does return data, but the data it returns
>>>> is broken.
>>>>
>>>> We are running a 3-node cluster with RF 3.
>>>>
>>>> I have this table:
>>>>
>>>> CREATE TABLE keyspace.table (
>>>>     a text,
>>>>     b text,
>>>>     c text,
>>>>     d list<text>,
>>>>     e text,
>>>>     f timestamp,
>>>>     g list<text>,
>>>>     h timestamp,
>>>>     PRIMARY KEY (a, b, c)
>>>> )
>>>>
>>>> Every other time I query (not exactly every other time, but at random) I get:
>>>>
>>>> SELECT * FROM table WHERE a = 'xxx' AND b = 'xxx';
>>>>
>>>>  a   | b   | c   | d    | e    | f                               | g       | h
>>>> -----+-----+-----+------+------+---------------------------------+---------+---------------------------------
>>>>  xxx | xxx | ccc | null | null | 2089-11-30 23:00:00.000000+0000 | ['fff'] | 2014-12-31 23:00:00.000000+0000
>>>>  xxx | xxx | ddd | null | null | 2099-01-01 00:00:00.000000+0000 | ['fff'] | 2016-06-17 13:29:36.000000+0000
>>>>
>>>> Which is the expected output.
>>>>
>>>> But I also get:
>>>>
>>>>  a   | b   | c   | d    | e    | f                               | g       | h
>>>> -----+-----+-----+------+------+---------------------------------+---------+---------------------------------
>>>>  xxx | xxx | ccc | null | null | null                            | null    | null
>>>>  xxx | xxx | ccc | null | null | 2089-11-30 23:00:00.000000+0000 | ['fff'] | null
>>>>  xxx | xxx | ccc | null | null | null                            | null    | 2014-12-31 23:00:00.000000+0000
>>>>  xxx | xxx | ddd | null | null | null                            | null    | null
>>>>  xxx | xxx | ddd | null | null | 2099-01-01 00:00:00.000000+0000 | ['fff'] | null
>>>>  xxx | xxx | ddd | null | null | null                            | null    | 2016-06-17 13:29:36.000000+0000
>>>>
>>>> Notice that the same primary key is returned three times, each with
>>>> different parts of the data. I believe this is what's currently killing
>>>> our production environment.
>>>>
>>>> I'm running upgradesstables as of this moment, but it's not finished yet.
>>>> I started a repair before, but nothing happened.
>>>> The upgradesstables has now finished on 2 out of 3 nodes, but production
>>>> is still down :/
>>>>
>>>> We also see these in the logs, over and over again:
>>>>
>>>> DEBUG [ReadRepairStage:4] 2016-06-21 15:44:01,119 ReadCallback.java:235 - Digest mismatch:
>>>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-1566729966326640413, 336b35356c49537731797a4a5f64627a797236) (b3dcfcbeed6676eae7ff88cc1bd251fb vs 6e7e9225871374d68a7cdb54ae70726d)
>>>>     at org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) ~[apache-cassandra-3.5.0.jar:3.5.0]
>>>>     at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:226) ~[apache-cassandra-3.5.0.jar:3.5.0]
>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_72]
>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_72]
>>>>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72]
>>>>
>>>> Any help is much appreciated.
>>>
>>> --
>>> Julien Anguenot (@anguenot)
>
> --
> Julien Anguenot (@anguenot)
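
For reference, a minimal cqlsh sketch of the truncate-and-reinsert workaround Julien describes above. The COPY export/re-import step is an assumption (the thread does not say how the data was reloaded), and keyspace.table with columns a through h is simply the schema quoted above:

    -- Sketch only: export the rows, truncate the table, then replay the rows.
    -- Assumes the exported data is trustworthy enough to reinsert.
    COPY keyspace.table (a, b, c, d, e, f, g, h) TO 'table_backup.csv';
    TRUNCATE keyspace.table;
    COPY keyspace.table (a, b, c, d, e, f, g, h) FROM 'table_backup.csv';

Note that TRUNCATE removes every row in the table cluster-wide (taking an automatic snapshot first when auto_snapshot is enabled), which is presumably why doing this in production is a hard sell.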