Hi Christian,

C* 2.2.7 doesn't cause this problem.
I can always reproduce it with 2.2.6, on several servers as well as on my laptop.
I reviewed the source code of 2.2.7, and the ReplayPosition update I described earlier in this thread has been fixed in that version.

Thank you for your cooperation.

yuji

On Thu, Aug 25, 2016 at 11:40 PM, horschi <hors...@gmail.com> wrote:
> Nope, I still don't get stale values. (I just ran your script 3 times.)
>
> On Thu, Aug 25, 2016 at 12:36 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>
>> Thank you for testing, Christian.
>>
>> What did you set commitlog_sync to in cassandra.yaml?
>> I set commitlog_sync to batch (with a 2 ms window), as below.
>>
>> commitlog_sync: batch
>> commitlog_sync_batch_window_in_ms: 2
>>
>> The problem didn't occur with commitlog_sync set to periodic (the default).
>>
>> regards,
>> yuji
>>
>>
>> On Thu, Aug 25, 2016 at 6:11 PM, horschi <hors...@gmail.com> wrote:
>>
>>> (running C* 2.2.7)
>>>
>>> On Thu, Aug 25, 2016 at 11:10 AM, horschi <hors...@gmail.com> wrote:
>>>
>>>> Hi Yuji,
>>>>
>>>> I tried your script a couple of times and did not see any stale
>>>> values (on my Linux laptop).
>>>>
>>>> regards,
>>>> Ch
>>>>
>>>> On Mon, Aug 15, 2016 at 7:29 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I can reproduce the problem with the following script.
>>>>> I got rows which should have been truncated.
>>>>> If the table is truncated only once, the problem doesn't occur.
>>>>>
>>>>> The multi-node test (replication_factor: 3, killing and restarting the
>>>>> C* processes on all nodes) also reproduces it.
>>>>>
>>>>> test script:
>>>>> ----
>>>>>
>>>>> ip=xxx.xxx.xxx.xxx
>>>>>
>>>>> echo "0. prepare a table"
>>>>> cqlsh $ip -e "drop keyspace testdb;"
>>>>> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
>>>>> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>>>>>
>>>>> echo "1. insert rows"
>>>>> for key in $(seq 1 10)
>>>>> do
>>>>>   cqlsh $ip -e "insert into testdb.testtbl (key, val) values($key, 1000) IF NOT EXISTS;" >> /dev/null 2>&1
>>>>> done
>>>>>
>>>>> echo "2. truncate the table twice"
>>>>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>>>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>>>>
>>>>> echo "3. kill C* process"
>>>>> ps auxww | grep "CassandraDaemon" | awk '{if ($13 ~ /cassand/) print $2}' | xargs sudo kill -9
>>>>>
>>>>> echo "4. restart C* process"
>>>>> sudo /etc/init.d/cassandra start
>>>>> sleep 20
>>>>>
>>>>> echo "5. check the table"
>>>>> cqlsh $ip -e "select * from testdb.testtbl;"
>>>>>
>>>>> ----
>>>>>
>>>>> test result:
>>>>> ----
>>>>>
>>>>> 0. prepare a table
>>>>> 1. insert rows
>>>>> 2. truncate the table twice
>>>>> Consistency level set to ALL.
>>>>> Consistency level set to ALL.
>>>>> 3. kill C* process
>>>>> 4. restart C* process
>>>>> Starting Cassandra: OK
>>>>> 5. check the table
>>>>>
>>>>>  key | val
>>>>> -----+------
>>>>>    5 | 1000
>>>>>   10 | 1000
>>>>>    1 | 1000
>>>>>    8 | 1000
>>>>>    2 | 1000
>>>>>    4 | 1000
>>>>>    7 | 1000
>>>>>    6 | 1000
>>>>>    9 | 1000
>>>>>    3 | 1000
>>>>>
>>>>> (10 rows)
>>>>>
>>>>> ----
>>>>>
>>>>>
>>>>> Thanks Christian,
>>>>>
>>>>> I tried with durable_writes=False, but it failed. I guess this failure
>>>>> was caused by a different problem: I use SimpleStrategy, and a keyspace
>>>>> using SimpleStrategy isn't permitted to use durable_writes=False.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Yuji
>>>>>
>>>>> On Thu, Aug 11, 2016 at 12:41 AM, horschi <hors...@gmail.com> wrote:
>>>>>
>>>>>> Hi Yuji,
>>>>>>
>>>>>> OK, perhaps you are seeing a different issue than I am.
>>>>>>
>>>>>> Have you tried with durable_writes=False? If the issue is caused by
>>>>>> the commitlog, then it should work if you disable durable_writes.
>>>>>>
>>>>>> Cheers,
>>>>>> Christian
>>>>>>
>>>>>> On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>>>
>>>>>>> Thanks Christian,
>>>>>>>
>>>>>>>> can you reproduce the behaviour with a single node?
>>>>>>>
>>>>>>> I tried my test with a single node, but I can't reproduce it.
>>>>>>>
>>>>>>>> This behaviour seems to be CQL only, or at least has gotten worse
>>>>>>>> with CQL. I did not experience this with Thrift.
>>>>>>>
>>>>>>> I truncate tables with CQL. I've never tried with Thrift.
>>>>>>>
>>>>>>> I think my problem can happen even when truncating succeeds,
>>>>>>> because I check all records after truncating.
>>>>>>>
>>>>>>> I checked the source code.
>>>>>>> ReplayPosition.segment and position become -1 and 0
>>>>>>> (ReplayPosition.NONE) in discardSSTables() when a table that has no
>>>>>>> SSTable is truncated.
>>>>>>> I guess that ReplayPosition.segment shouldn't be -1 after truncating
>>>>>>> a table in this case.
>>>>>>> Because of this segment value, replayMutation() can replay mutations
>>>>>>> that should have been discarded.
>>>>>>>
>>>>>>> Is there anyone familiar with truncate and replay?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Yuji
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 8, 2016 at 6:36 PM, horschi <hors...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Yuji,
>>>>>>>>
>>>>>>>> can you reproduce the behaviour with a single node?
>>>>>>>>
>>>>>>>> I ask because I probably have the same issue with my automated
>>>>>>>> tests (which run truncate between every test) on my local laptop.
>>>>>>>>
>>>>>>>> Maybe around 5 of my 1800 tests fail randomly. I can see that the
>>>>>>>> failed tests sometimes show data from other tests, which I think must
>>>>>>>> be because of a failed truncate. This behaviour seems to be CQL only,
>>>>>>>> or at least has gotten worse with CQL. I did not experience this with
>>>>>>>> Thrift.
>>>>>>>>
>>>>>>>> regards,
>>>>>>>> Christian
>>>>>>>>
>>>>>>>> On Mon, Aug 8, 2016 at 7:34 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I have a question about clearing tables and commit log replay.
>>>>>>>>> After some tables were truncated consecutively, I got some stale
>>>>>>>>> values. This problem doesn't occur when I clear keyspaces with DROP
>>>>>>>>> (and CREATE).
>>>>>>>>>
>>>>>>>>> I'm running the following test with node failures.
>>>>>>>>> Some stale values appear in the checking phase.
>>>>>>>>>
>>>>>>>>> Test iteration:
>>>>>>>>> 1. initialize tables as below
>>>>>>>>> 2. issue a lot of reads/writes concurrently
>>>>>>>>> 3. check all records
>>>>>>>>> 4. repeat from the beginning
>>>>>>>>>
>>>>>>>>> I use C* 2.2.6. There are 3 nodes (replication_factor: 3).
>>>>>>>>> Each node kills its cassandra process at random intervals and
>>>>>>>>> restarts it immediately.
>>>>>>>>>
>>>>>>>>> My initialization:
>>>>>>>>> 1. clear tables with TRUNCATE
>>>>>>>>> 2. INSERT initial records
>>>>>>>>> 3. check if all values are correct
>>>>>>>>>
>>>>>>>>> If any phase fails (because of node failure), the initialization
>>>>>>>>> starts all over again, so tables are sometimes truncated
>>>>>>>>> consecutively.
>>>>>>>>> Although the check in the initialization passes, stale data appears
>>>>>>>>> when I execute "SELECT * FROM mykeyspace.mytable;" after a lot of
>>>>>>>>> requests have completed.
>>>>>>>>>
>>>>>>>>> The problem is likely to occur when the ReplayPosition value in
>>>>>>>>> "truncated_at" is reset as below after an empty table is truncated.
>>>>>>>>>
>>>>>>>>> Column Family ID: truncated_at
>>>>>>>>> XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: 0xffffffffffffffff0000000000000156597cd4c7
>>>>>>>>> (this value was acquired just after phase 1 of my initialization)
>>>>>>>>>
>>>>>>>>> I guess some unexpected replays occur.
>>>>>>>>> Is anyone familiar with this behavior?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Yuji
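The truncated_at value in the first message can be unpacked by hand to check the (-1, 0) reading. The bash sketch below assumes the 2.2 truncation record is, in big-endian order, the commit-log segment id (8-byte signed long), the position within that segment (4-byte int), and the truncation time in milliseconds (8-byte long); that layout is an assumption, not something stated in this thread. The function names are hypothetical helpers, not Cassandra or cqlsh APIs. should_replay is a simplified model of the replay check described above: an entry is replayed only if it lies strictly after the recorded (segment, position), so a recorded position of (-1, 0) lets every entry in the commit log through.

```shell
#!/usr/bin/env bash
# Sketch: decode a C* 2.2 truncated_at blob and model the replay check.
# ASSUMPTION: blob = segment id (8-byte signed long) + position (4-byte
# signed int) + truncation time in ms (8-byte long), big-endian.
# All function names are hypothetical helpers, not Cassandra APIs.

# Interpret 16 hex digits as a signed 64-bit integer. The value is built
# from two 32-bit halves so bash arithmetic never overflows.
hex64_to_signed() {
  local hi=$((16#${1:0:8})) lo=$((16#${1:8:8}))
  if (( hi >= 2147483648 )); then
    hi=$(( hi - 4294967296 ))   # top bit set: negative in two's complement
  fi
  echo $(( hi * 4294967296 + lo ))
}

# Interpret 8 hex digits as a signed 32-bit integer.
hex32_to_signed() {
  local v=$((16#$1))
  if (( v >= 2147483648 )); then
    v=$(( v - 4294967296 ))
  fi
  echo "$v"
}

# decode_truncated_at HEX: prints "segment position truncated_at_ms".
decode_truncated_at() {
  local hex=${1#0x}
  if (( ${#hex} != 40 )); then
    echo "expected 40 hex digits, got ${#hex}" >&2
    return 1
  fi
  echo "$(hex64_to_signed "${hex:0:16}") $(hex32_to_signed "${hex:16:8}") $(hex64_to_signed "${hex:24:16}")"
}

# should_replay SEG POS RP_SEG RP_POS: prints yes or no.
# Simplified model (ASSUMPTION): replay only entries strictly after the
# recorded (segment, position) for the table.
should_replay() {
  local seg=$1 pos=$2 rp_seg=$3 rp_pos=$4
  if (( seg > rp_seg || (seg == rp_seg && pos > rp_pos) )); then
    echo yes
  else
    echo no
  fi
}

# Demo with the value quoted in the thread.
read -r segment position time_ms <<< "$(decode_truncated_at 0xffffffffffffffff0000000000000156597cd4c7)"
echo "segment=$segment position=$position truncated_at_ms=$time_ms"
# Hypothetical commit-log entry: segment 1470380000000, offset 4096.
should_replay 1470380000000 4096 "$segment" "$position"
```

Run as-is, the demo prints segment=-1 position=0 truncated_at_ms=1470380168391 followed by yes. Under this model, if the first TRUNCATE records a real position and the second one, on the now-empty table, overwrites it with (-1, 0), every mutation still in the commit log would pass the check on restart, which would be consistent with the report that a single truncate does not reproduce the problem. Splitting the 64-bit field into 32-bit halves avoids relying on integer overflow when parsing ffffffffffffffff.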