Hi Yuji,

I tried your script a couple of times on my Linux laptop. I did not experience any stale values.
regards,
Ch

On Mon, Aug 15, 2016 at 7:29 AM, Yuji Ito <y...@imagine-orb.com> wrote:

> Hi,
>
> I can reproduce the problem with the following script.
> I got rows which should be truncated.
> If truncating is executed only once, the problem doesn't occur.
>
> The test for multi nodes (replication_factor:3, kill & restart C*
> processes in all nodes) can also reproduce it.
>
> test script:
> ----
>
> ip=xxx.xxx.xxx.xxx
>
> echo "0. prepare a table"
> cqlsh $ip -e "drop keyspace testdb;"
> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>
> echo "1. insert rows"
> for key in $(seq 1 10)
> do
>     cqlsh $ip -e "insert into testdb.testtbl (key, val) values($key, 1000) IF NOT EXISTS;" >> /dev/null 2>&1
> done
>
> echo "2. truncate the table twice"
> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>
> echo "3. kill C* process"
> ps auxww | grep "CassandraDaemon" | awk '{if ($13 ~ /cassand/) print $2}' | xargs sudo kill -9
>
> echo "4. restart C* process"
> sudo /etc/init.d/cassandra start
> sleep 20
>
> echo "5. check the table"
> cqlsh $ip -e "select * from testdb.testtbl;"
>
> ----
>
> test result:
> ----
>
> 0. prepare a table
> 1. insert rows
> 2. truncate the table twice
> Consistency level set to ALL.
> Consistency level set to ALL.
> 3. kill C* process
> 4. restart C* process
> Starting Cassandra: OK
> 5. check the table
>
>  key | val
> -----+------
>    5 | 1000
>   10 | 1000
>    1 | 1000
>    8 | 1000
>    2 | 1000
>    4 | 1000
>    7 | 1000
>    6 | 1000
>    9 | 1000
>    3 | 1000
>
> (10 rows)
>
> ----
>
>
> Thanks Christian,
>
> I tried with durable_writes=False.
> It failed. I guessed this failure was caused by another problem.
> I use SimpleStrategy.
> A keyspace using the SimpleStrategy isn't permitted to use
> durable_writes=False.
>
>
> Regards,
> Yuji
>
> On Thu, Aug 11, 2016 at 12:41 AM, horschi <hors...@gmail.com> wrote:
>
>> Hi Yuji,
>>
>> ok, perhaps you are seeing a different issue than I do.
>>
>> Have you tried with durable_writes=False? If the issue is caused by the
>> commitlog, then it should work if you disable durable_writes.
>>
>> Cheers,
>> Christian
>>
>>
>>
>> On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>
>>> Thanks Christian
>>>
>>>> can you reproduce the behaviour with a single node?
>>>
>>> I tried my test with a single node. But I can't.
>>>
>>>> This behaviour seems to be CQL only, or at least has gotten worse
>>>> with CQL. I did not experience this with Thrift.
>>>
>>> I truncate tables with CQL. I've never tried with Thrift.
>>>
>>> I think that my problem can happen even when truncating succeeds.
>>> That's because I check all records after truncating.
>>>
>>> I checked the source code.
>>> ReplayPosition.segment and position become -1 and 0
>>> (ReplayPosition.NONE) in discardSSTables() when truncating a table that
>>> has no SSTable.
>>> I guess that ReplayPosition.segment shouldn't be -1 when truncating a
>>> table in this case.
>>> replayMutation() can request unexpected replay mutations because of this
>>> segment's value.
>>>
>>> Is there anyone familiar with truncate and replay?
>>>
>>> Regards,
>>> Yuji
>>>
>>>
>>> On Mon, Aug 8, 2016 at 6:36 PM, horschi <hors...@gmail.com> wrote:
>>>
>>>> Hi Yuji,
>>>>
>>>> can you reproduce the behaviour with a single node?
>>>>
>>>> The reason I ask is because I probably have the same issue with my
>>>> automated tests (which run truncate between every test), which run on my
>>>> local laptop.
>>>>
>>>> Maybe around 5 tests randomly fail out of my 1800. I can see that the
>>>> failed tests sometimes show data from other tests, which I think must be
>>>> because of a failed truncate. This behaviour seems to be CQL only, or at
>>>> least has gotten worse with CQL.
>>>> I did not experience this with Thrift.
>>>>
>>>> regards,
>>>> Christian
>>>>
>>>>
>>>>
>>>> On Mon, Aug 8, 2016 at 7:34 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a question about clearing tables and commit log replay.
>>>>> After some tables were truncated consecutively, I got some stale
>>>>> values.
>>>>> This problem doesn't occur when I clear keyspaces with DROP (and
>>>>> CREATE).
>>>>>
>>>>> I'm running the following test with node failures.
>>>>> Some stale values appear in the checking phase.
>>>>>
>>>>> Test iteration:
>>>>> 1. initialize tables as below
>>>>> 2. request a lot of reads/writes concurrently
>>>>> 3. check all records
>>>>> 4. repeat from the beginning
>>>>>
>>>>> I use C* 2.2.6. There are 3 nodes (replication_factor: 3).
>>>>> On each node, the cassandra process is killed at random intervals and
>>>>> restarted immediately.
>>>>>
>>>>> My initialization:
>>>>> 1. clear tables with TRUNCATE
>>>>> 2. INSERT initial records
>>>>> 3. check if all values are correct
>>>>>
>>>>> If any phase fails (because of node failure), the initialization
>>>>> starts all over again.
>>>>> So, tables are sometimes truncated consecutively.
>>>>> Though the check in the initialization is OK, stale data appears when
>>>>> I execute "SELECT * FROM mykeyspace.mytable;" after a lot of requests are
>>>>> completed.
>>>>>
>>>>> The problem is likely to occur when the ReplayPosition's value in
>>>>> "truncated_at" is initialized as below after an empty table is truncated.
>>>>>
>>>>> Column Family ID: truncated_at
>>>>> XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: 0xffffffffffffffff0000000000000156597cd4c7
>>>>> (this value was acquired just after phase 1 in my initialization)
>>>>>
>>>>> I guess some unexpected replays occur.
>>>>> Does anyone know the behavior?
>>>>>
>>>>> Thanks,
>>>>> Yuji
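
The truncated_at value quoted in the first message of the thread can be split into fields to see what was recorded. The sketch below assumes the blob is laid out as an 8-byte segment id, a 4-byte position, and an 8-byte truncation timestamp in milliseconds. That layout is an assumption inferred from the 20-byte length, not something stated in the thread, and the variable names are illustrative only. It needs bash, because it relies on 64-bit wrapping arithmetic and substring expansion.

```shell
#!/usr/bin/env bash
# Split the truncated_at blob from the thread into its fields.
# ASSUMED layout (not confirmed in the thread):
#   bytes 0-7   segment id   (signed 64-bit)
#   bytes 8-11  position     (32-bit)
#   bytes 12-19 truncated-at timestamp in milliseconds (64-bit)
blob="ffffffffffffffff0000000000000156597cd4c7"

# Each byte is two hex characters, so the offsets below are doubled.
segment_hex=${blob:0:16}
position_hex=${blob:16:8}
timestamp_hex=${blob:24:16}

# Bash arithmetic is 64-bit signed, so ffffffffffffffff wraps to -1.
segment=$(( 16#$segment_hex ))
position=$(( 16#$position_hex ))
truncated_at_ms=$(( 16#$timestamp_hex ))

echo "segment=$segment position=$position truncated_at_ms=$truncated_at_ms"
```

Under that assumption the segment decodes to -1 and the position to 0, which matches the ReplayPosition.NONE values described in the Aug 9 message.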