Thank you for testing, Christian.

What did you set commitlog_sync to in cassandra.yaml? I set commitlog_sync to batch (window 2 ms) as below.
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2

The problem didn't occur when commitlog_sync was set to periodic (the default).

regards,
yuji

On Thu, Aug 25, 2016 at 6:11 PM, horschi <hors...@gmail.com> wrote:

> (running C* 2.2.7)
>
> On Thu, Aug 25, 2016 at 11:10 AM, horschi <hors...@gmail.com> wrote:
>
>> Hi Yuji,
>>
>> I tried your script a couple of times. I did not experience any stale
>> values. (On my Linux laptop)
>>
>> regards,
>> Ch
>>
>> On Mon, Aug 15, 2016 at 7:29 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>
>>> Hi,
>>>
>>> I can reproduce the problem with the following script.
>>> I got rows which should have been truncated.
>>> If truncating is executed only once, the problem doesn't occur.
>>>
>>> A multi-node test (replication_factor: 3, killing and restarting the C*
>>> processes on all nodes) also reproduces it.
>>>
>>> test script:
>>> ----
>>>
>>> ip=xxx.xxx.xxx.xxx
>>>
>>> echo "0. prepare a table"
>>> cqlsh $ip -e "drop keyspace testdb;"
>>> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
>>> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>>>
>>> echo "1. insert rows"
>>> for key in $(seq 1 10)
>>> do
>>>     cqlsh $ip -e "insert into testdb.testtbl (key, val) values($key, 1000) IF NOT EXISTS;" >> /dev/null 2>&1
>>> done
>>>
>>> echo "2. truncate the table twice"
>>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>>
>>> echo "3. kill C* process"
>>> ps auxww | grep "CassandraDaemon" | awk '{if ($13 ~ /cassand/) print $2}' | xargs sudo kill -9
>>>
>>> echo "4. restart C* process"
>>> sudo /etc/init.d/cassandra start
>>> sleep 20
>>>
>>> echo "5. check the table"
>>> cqlsh $ip -e "select * from testdb.testtbl;"
>>>
>>> ----
>>>
>>> test result:
>>> ----
>>>
>>> 0. prepare a table
>>> 1. insert rows
>>> 2. truncate the table twice
>>> Consistency level set to ALL.
>>> Consistency level set to ALL.
>>> 3. kill C* process
>>> 4. restart C* process
>>> Starting Cassandra: OK
>>> 5. check the table
>>>
>>>  key | val
>>> -----+------
>>>    5 | 1000
>>>   10 | 1000
>>>    1 | 1000
>>>    8 | 1000
>>>    2 | 1000
>>>    4 | 1000
>>>    7 | 1000
>>>    6 | 1000
>>>    9 | 1000
>>>    3 | 1000
>>>
>>> (10 rows)
>>>
>>> ----
>>>
>>>
>>> Thanks Christian,
>>>
>>> I tried with durable_writes=False.
>>> It failed. I guess this failure was caused by another problem.
>>> I use SimpleStrategy.
>>> A keyspace using the SimpleStrategy isn't permitted to use
>>> durable_writes=False.
>>>
>>>
>>> Regards,
>>> Yuji
>>>
>>> On Thu, Aug 11, 2016 at 12:41 AM, horschi <hors...@gmail.com> wrote:
>>>
>>>> Hi Yuji,
>>>>
>>>> ok, perhaps you are seeing a different issue than I do.
>>>>
>>>> Have you tried with durable_writes=False? If the issue is caused by the
>>>> commitlog, then it should work if you disable durable_writes.
>>>>
>>>> Cheers,
>>>> Christian
>>>>
>>>>
>>>>
>>>> On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>
>>>>> Thanks Christian
>>>>>
>>>>>> can you reproduce the behaviour with a single node?
>>>>>
>>>>> I tried my test with a single node, but I couldn't reproduce it.
>>>>>
>>>>>> This behaviour seems to be CQL only, or at least has gotten worse
>>>>>> with CQL. I did not experience this with Thrift.
>>>>>
>>>>> I truncate tables with CQL. I've never tried with Thrift.
>>>>>
>>>>> I think that my problem can happen even when truncating succeeds.
>>>>> That's because I check all records after truncating.
>>>>>
>>>>> I checked the source code.
>>>>> ReplayPosition.segment and position become -1 and 0
>>>>> (ReplayPosition.NONE) in discardSSTables() when truncating a table
>>>>> that has no SSTable.
>>>>> I guess that ReplayPosition.segment shouldn't be -1 when truncating a
>>>>> table in this case.
>>>>> replayMutation() can request unexpected replay mutations because of
>>>>> this segment's value.
>>>>>
>>>>> Is there anyone familiar with truncate and replay?
>>>>>
>>>>> Regards,
>>>>> Yuji
>>>>>
>>>>>
>>>>> On Mon, Aug 8, 2016 at 6:36 PM, horschi <hors...@gmail.com> wrote:
>>>>>
>>>>>> Hi Yuji,
>>>>>>
>>>>>> can you reproduce the behaviour with a single node?
>>>>>>
>>>>>> The reason I ask is that I probably have the same issue with my
>>>>>> automated tests (which run truncate between every test), which run on my
>>>>>> local laptop.
>>>>>>
>>>>>> Maybe around 5 tests randomly fail out of my 1800. I can see that the
>>>>>> failed tests sometimes show data from other tests, which I think must be
>>>>>> because of a failed truncate. This behaviour seems to be CQL only, or at
>>>>>> least has gotten worse with CQL. I did not experience this with Thrift.
>>>>>>
>>>>>> regards,
>>>>>> Christian
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 8, 2016 at 7:34 AM, Yuji Ito <y...@imagine-orb.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a question about clearing tables and commit log replay.
>>>>>>> After some tables were truncated consecutively, I got some stale
>>>>>>> values.
>>>>>>> This problem doesn't occur when I clear keyspaces with DROP (and
>>>>>>> CREATE).
>>>>>>>
>>>>>>> I'm running the following test with node failures.
>>>>>>> Some stale values appear in the checking phase.
>>>>>>>
>>>>>>> Test iteration:
>>>>>>> 1. initialize tables as below
>>>>>>> 2. request a lot of reads/writes concurrently
>>>>>>> 3. check all records
>>>>>>> 4. repeat from the beginning
>>>>>>>
>>>>>>> I use C* 2.2.6. There are 3 nodes (replication_factor: 3).
>>>>>>> On each node, the cassandra process is killed at random intervals and
>>>>>>> restarted immediately.
>>>>>>>
>>>>>>> My initialization:
>>>>>>> 1. clear tables with TRUNCATE
>>>>>>> 2. INSERT initial records
>>>>>>> 3. check if all values are correct
>>>>>>>
>>>>>>> If any phase fails (because of node failure), the initialization
>>>>>>> starts all over again.
>>>>>>> So, tables are sometimes truncated consecutively.
>>>>>>> Although the check during initialization passes, stale data appears
>>>>>>> when I execute "SELECT * FROM mykeyspace.mytable;" after many requests
>>>>>>> have completed.
>>>>>>>
>>>>>>> The problem is likely to occur when the ReplayPosition's value in
>>>>>>> "truncated_at" is initialized as below after an empty table is
>>>>>>> truncated.
>>>>>>>
>>>>>>> Column Family ID: truncated_at
>>>>>>> XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: 0xffffffffffffffff0000000000000156597cd4c7
>>>>>>> (this value was acquired just after phase 1 in my initialization)
>>>>>>>
>>>>>>> I guess some unexpected replays occur.
>>>>>>> Does anyone know about this behavior?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Yuji
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
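
For anyone reading the truncated_at value quoted in the first message, here is a minimal Python sketch that splits it into its parts. It assumes the blob is laid out as an 8-byte segment id, a 4-byte position, and an 8-byte truncation timestamp in milliseconds, all big-endian. That layout is an assumption inferred from the 20-byte length of the value, not something confirmed in this thread, and decode_truncated_at is a hypothetical helper name.

```python
import struct
from datetime import datetime, timezone


def decode_truncated_at(hex_value):
    """Split a truncated_at blob into (segment, position, truncated_at_ms).

    ASSUMED layout, big-endian: 8-byte signed segment id,
    4-byte signed position, 8-byte signed timestamp in milliseconds.
    """
    if hex_value.startswith("0x"):
        hex_value = hex_value[2:]
    raw = bytes.fromhex(hex_value)
    if len(raw) != 20:
        raise ValueError("expected 20 bytes, got %d" % len(raw))
    # ">qiq" = big-endian int64, int32, int64 with no padding (20 bytes total)
    return struct.unpack(">qiq", raw)


if __name__ == "__main__":
    # Value reported in the thread, with the wrapped line rejoined.
    value = "0xffffffffffffffff0000000000000156597cd4c7"
    segment, position, truncated_ms = decode_truncated_at(value)
    print(segment, position, truncated_ms)
    # Interpret the last field as milliseconds since the Unix epoch (assumption).
    print(datetime.fromtimestamp(truncated_ms / 1000, tz=timezone.utc).isoformat())
```

Under that assumption the value decodes to segment -1 and position 0, which matches the ReplayPosition.NONE state described earlier in the thread.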