Re: Stale value appears after consecutive TRUNCATE

Yuji Ito Thu, 25 Aug 2016 03:37:01 -0700

Thank you for testing, Christian

What did you set commitlog_sync to in cassandra.yaml?
I set commitlog_sync to batch (2 ms window), as below.


commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2

The problem didn't occur when I set commitlog_sync to periodic (the default).
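
To compare settings between nodes, the two keys can be pulled out of the file contents. This is only a minimal sketch: the helper name read_commitlog_sync is made up for illustration, and it handles just flat top-level "key: value" lines rather than full YAML.

```python
def read_commitlog_sync(yaml_text):
    """Return the commitlog_sync* settings found in cassandra.yaml text.

    Hypothetical helper: only flat, top-level "key: value" lines are handled.
    """
    settings = {}
    for line in yaml_text.splitlines():
        line = line.split("#", 1)[0].rstrip()  # drop comments
        if not line or line[0].isspace() or ":" not in line:
            continue  # skip blank, nested, or malformed lines
        key, value = line.split(":", 1)
        if key.startswith("commitlog_sync"):
            settings[key] = value.strip()
    return settings


# Inline sample mirroring the settings quoted above.
sample = """\
# commitlog_sync: periodic
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 2
"""
print(read_commitlog_sync(sample))
```

On a real node, pass the contents of cassandra.yaml instead of the sample string. The file location depends on the install.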

regards,
yuji


On Thu, Aug 25, 2016 at 6:11 PM, horschi <hors...@gmail.com> wrote:

> (running C* 2.2.7)
>
> On Thu, Aug 25, 2016 at 11:10 AM, horschi <hors...@gmail.com> wrote:
>
>> Hi Yuji,
>>
>> I tried your script a couple of times. I did not experience any stale
>> values. (On my Linux laptop)
>>
>> regards,
>> Ch
>>
>> On Mon, Aug 15, 2016 at 7:29 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>
>>> Hi,
>>>
>>> I can reproduce the problem with the following script.
>>> I got rows that should have been truncated.
>>> If the truncate is executed only once, the problem doesn't occur.
>>>
>>> A multi-node test (replication_factor: 3, killing and restarting the C*
>>> processes on all nodes) also reproduces it.
>>>
>>> test script:
>>> ----
>>>
>>> ip=xxx.xxx.xxx.xxx
>>>
>>> echo "0. prepare a table"
>>> cqlsh $ip -e "drop keyspace if exists testdb;"
>>> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
>>> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>>>
>>> echo "1. insert rows"
>>> for key in $(seq 1 10)
>>> do
>>>     cqlsh $ip -e "insert into testdb.testtbl (key, val) values($key, 1000) IF NOT EXISTS;" >> /dev/null 2>&1
>>> done
>>>
>>> echo "2. truncate the table twice"
>>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>>>
>>> echo "3. kill C* process"
>>> ps auxww | grep "CassandraDaemon" | awk '{if ($13 ~ /cassand/) print $2}' | xargs sudo kill -9
>>>
>>> echo "4. restart C* process"
>>> sudo /etc/init.d/cassandra start
>>> sleep 20
>>>
>>> echo "5. check the table"
>>> cqlsh $ip -e "select * from testdb.testtbl;"
>>>
>>> ----
>>>
>>> test result:
>>> ----
>>>
>>> 0. prepare a table
>>> 1. insert rows
>>> 2. truncate the table twice
>>> Consistency level set to ALL.
>>> Consistency level set to ALL.
>>> 3. kill C* process
>>> 4. restart C* process
>>> Starting Cassandra: OK
>>> 5. check the table
>>>
>>>  key | val
>>> -----+------
>>>    5 | 1000
>>>   10 | 1000
>>>    1 | 1000
>>>    8 | 1000
>>>    2 | 1000
>>>    4 | 1000
>>>    7 | 1000
>>>    6 | 1000
>>>    9 | 1000
>>>    3 | 1000
>>>
>>> (10 rows)
>>>
>>> ----
>>>
>>>
>>> Thanks Christian,
>>>
>>> I tried with durable_writes=False.
>>> It failed. I guessed this failure was caused by another problem.
>>> I use SimpleStrategy.
>>> A keyspace using the SimpleStrategy isn't permitted to use
>>> durable_writes=False.
>>>
>>>
>>> Regards,
>>> Yuji
>>>
>>> On Thu, Aug 11, 2016 at 12:41 AM, horschi <hors...@gmail.com> wrote:
>>>
>>>> Hi Yuji,
>>>>
>>>> OK, perhaps you are seeing a different issue than I am.
>>>>
>>>> Have you tried with durable_writes=False? If the issue is caused by the
>>>> commitlog, then it should work if you disable durable_writes.
>>>>
>>>> Cheers,
>>>> Christian
>>>>
>>>>
>>>>
>>>> On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>
>>>>> Thanks Christian
>>>>>
>>>>> can you reproduce the behaviour with a single node?
>>>>>
>>>>> I tried my test with a single node, but I couldn't reproduce it.
>>>>>
>>>>> This behaviour seems to be CQL only, or at least has gotten worse
>>>>> with CQL. I did not experience this with Thrift.
>>>>>
>>>>> I truncate tables with CQL. I've never tried with Thrift.
>>>>>
>>>>> I think my problem can happen even when the truncate succeeds,
>>>>> because I check all records after truncating.
>>>>>
>>>>> I checked the source code.
>>>>> ReplayPosition.segment and position become -1 and 0
>>>>> (ReplayPosition.NONE) in discardSSTables() when a table with no
>>>>> SSTables is truncated.
>>>>> I guess that ReplayPosition.segment shouldn't be -1 when truncating a
>>>>> table in this case.
>>>>> replayMutation() can replay mutations unexpectedly because of
>>>>> this segment value.
>>>>>
>>>>> Is there anyone familiar with truncate and replay?
>>>>>
>>>>> Regards,
>>>>> Yuji
>>>>>
>>>>>
>>>>> On Mon, Aug 8, 2016 at 6:36 PM, horschi <hors...@gmail.com> wrote:
>>>>>
>>>>>> Hi Yuji,
>>>>>>
>>>>>> can you reproduce the behaviour with a single node?
>>>>>>
>>>>>> I ask because I probably have the same issue with my automated
>>>>>> tests, which run on my local laptop and truncate between every test.
>>>>>>
>>>>>> Maybe around 5 tests randomly fail out of my 1800. I can see that the
>>>>>> failed tests sometimes show data from other tests, which I think must be
>>>>>> because of a failed truncate. This behaviour seems to be CQL only, or at
>>>>>> least has gotten worse with CQL. I did not experience this with Thrift.
>>>>>>
>>>>>> regards,
>>>>>> Christian
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Aug 8, 2016 at 7:34 AM, Yuji Ito <y...@imagine-orb.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a question about clearing tables and commit log replay.
>>>>>>> After some tables were truncated consecutively, I got some stale
>>>>>>> values.
>>>>>>> This problem doesn't occur when I clear keyspaces with DROP (and
>>>>>>> CREATE).
>>>>>>>
>>>>>>> I'm running the following test with node failures.
>>>>>>> Some stale values appear in the checking phase.
>>>>>>>
>>>>>>> Test iteration:
>>>>>>> 1. initialize tables as below
>>>>>>> 2. request a lot of read/write concurrently
>>>>>>> 3. check all records
>>>>>>> 4. repeat from the beginning
>>>>>>>
>>>>>>> I use C* 2.2.6. There are 3 nodes (replication_factor: 3).
>>>>>>> On each node, the cassandra process is killed at random intervals
>>>>>>> and restarted immediately.
>>>>>>>
>>>>>>> My initialization:
>>>>>>> 1. clear tables with TRUNCATE
>>>>>>> 2. INSERT initial records
>>>>>>> 3. check if all values are correct
>>>>>>>
>>>>>>> If any phase fails (because of node failure), the initialization
>>>>>>> starts all over again.
>>>>>>> So, tables are sometimes truncated consecutively.
>>>>>>> Though the check in the initialization is OK, stale data appears
>>>>>>> when I execute "SELECT * FROM mykeyspace.mytable;" after a lot of
>>>>>>> requests are completed.
>>>>>>>
>>>>>>> The problem is likely to occur when the ReplayPosition's value in
>>>>>>> "truncated_at" is initialized as below after an empty table is
>>>>>>> truncated.
>>>>>>>
>>>>>>> Column Family ID: truncated_at
>>>>>>> XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: 0xffffffffffffffff0000000000000156597cd4c7
>>>>>>> (this value was acquired just after phase 1 in my initialization)
>>>>>>>
>>>>>>> I guess some unexpected replays occur.
>>>>>>> Does anyone know about this behavior?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Yuji
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
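
For reference, the truncated_at value quoted in the first message of this thread can be unpacked by hand. The sketch below assumes the 20-byte blob is laid out as an 8-byte segment id, a 4-byte position and an 8-byte truncation time in milliseconds, all big-endian. That layout is an assumption that fits the blob length and the -1 / 0 values mentioned above; it has not been confirmed in this thread. should_replay is only a simplified model of the guess about replayMutation(), not Cassandra's actual code, and the segment id in the last call is made up.

```python
import struct
from datetime import datetime, timezone


def decode_truncated_at(hex_blob):
    """Split a truncated_at blob into (segment, position, truncated_at_ms).

    Assumed layout (big-endian): int64 segment, int32 position, int64 millis.
    """
    digits = hex_blob[2:] if hex_blob.startswith("0x") else hex_blob
    return struct.unpack(">qiq", bytes.fromhex(digits))


def should_replay(mutation_pos, truncated_pos):
    """Simplified model: replay a mutation that lies after the recorded position.

    Positions are (segment, position) tuples, compared segment first.
    """
    return mutation_pos > truncated_pos


segment, position, millis = decode_truncated_at(
    "0xffffffffffffffff0000000000000156597cd4c7"
)
print(segment, position, millis)
print(datetime.fromtimestamp(millis / 1000, tz=timezone.utc))

# With segment == -1, any real commit log position compares as "later",
# so under this model every mutation still in the commit log is replayed.
# The segment id below is hypothetical.
print(should_replay((1470380000000, 128), (segment, position)))
```

The decoded segment is -1 and the position is 0, which matches ReplayPosition.NONE as described above. If the model is close to what the replayer does, a truncation record like this would not filter out any mutation left in the commit log.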
