Re: Stale value appears after consecutive TRUNCATE

horschi Thu, 25 Aug 2016 02:11:47 -0700

Hi Yuji,

I ran your script a few times on my Linux laptop, but I did not see any
stale values.
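
For repeated runs, step 5 could be checked automatically rather than by eye.
A minimal sketch, assuming cqlsh keeps printing the "(N rows)" footer shown
in your test result; count_rows is a hypothetical helper name, not part of
cqlsh:

```shell
#!/bin/sh
# Print the row count taken from the "(N rows)" footer of cqlsh SELECT
# output read on stdin. Prints nothing if no such footer is found.
count_rows() {
    sed -n 's/^(\([0-9][0-9]*\) rows)[[:space:]]*$/\1/p' | tail -n 1
}

# Illustrative input only; in the script this would be
#   cqlsh $ip -e "select * from testdb.testtbl;" | count_rows
count_rows <<'EOF'

 key | val
-----+------
   5 | 1000
  10 | 1000

(2 rows)
EOF
# prints 2
```

Any result other than 0 after the restart would then count as a reproduction.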


regards,
Ch

On Mon, Aug 15, 2016 at 7:29 AM, Yuji Ito <y...@imagine-orb.com> wrote:

> Hi,
>
> I can reproduce the problem with the following script.
> I got rows that should have been truncated.
> If the table is truncated only once, the problem doesn't occur.
>
> The multi-node test (replication_factor: 3, killing and restarting the C*
> processes on all nodes) also reproduces it.
>
> test script:
> ----
>
> ip=xxx.xxx.xxx.xxx
>
> echo "0. prepare a table"
> cqlsh $ip -e "drop keyspace testdb;"
> cqlsh $ip -e "CREATE KEYSPACE testdb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};"
> cqlsh $ip -e "CREATE TABLE testdb.testtbl (key int PRIMARY KEY, val int);"
>
> echo "1. insert rows"
> for key in $(seq 1 10)
> do
>     cqlsh $ip -e "insert into testdb.testtbl (key, val) values($key, 1000) IF NOT EXISTS;" >> /dev/null 2>&1
> done
>
> echo "2. truncate the table twice"
> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
> cqlsh $ip -e "consistency all; truncate table testdb.testtbl"
>
> echo "3. kill C* process"
> ps auxww | grep "CassandraDaemon" | awk '{if ($13 ~ /cassand/) print $2}' | xargs sudo kill -9
>
> echo "4. restart C* process"
> sudo /etc/init.d/cassandra start
> sleep 20
>
> echo "5. check the table"
> cqlsh $ip -e "select * from testdb.testtbl;"
>
> ----
>
> test result:
> ----
>
> 0. prepare a table
> 1. insert rows
> 2. truncate the table twice
> Consistency level set to ALL.
> Consistency level set to ALL.
> 3. kill C* process
> 4. restart C* process
> Starting Cassandra: OK
> 5. check the table
>
>  key | val
> -----+------
>    5 | 1000
>   10 | 1000
>    1 | 1000
>    8 | 1000
>    2 | 1000
>    4 | 1000
>    7 | 1000
>    6 | 1000
>    9 | 1000
>    3 | 1000
>
> (10 rows)
>
> ----
>
>
> Thanks Christian,
>
> I tried with durable_writes=False.
> It failed. I guessed this failure was caused by another problem.
> I use SimpleStrategy.
> A keyspace using the SimpleStrategy isn't permitted to use
> durable_writes=False.
>
>
> Regards,
> Yuji
>
> On Thu, Aug 11, 2016 at 12:41 AM, horschi <hors...@gmail.com> wrote:
>
>> Hi Yuji,
>>
>> ok, perhaps you are seeing a different issue than I do.
>>
>> Have you tried with durable_writes=False? If the issue is caused by the
>> commitlog, then it should work if you disable durable_writes.
>>
>> Cheers,
>> Christian
>>
>>
>>
>> On Tue, Aug 9, 2016 at 3:04 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>>
>>> Thanks Christian
>>>
>>>> can you reproduce the behaviour with a single node?
>>>
>>> I tried my test with a single node, but I couldn't reproduce it.
>>>
>>>> This behaviour seems to be CQL only, or at least has gotten worse
>>>> with CQL. I did not experience this with Thrift.
>>>
>>> I truncate tables with CQL. I've never tried with Thrift.
>>>
>>> I think that my problem can happen even when truncating succeeds.
>>> That's because I check all records after truncating.
>>>
>>> I checked the source code.
>>> ReplayPosition.segment and position become -1 and 0
>>> (ReplayPosition.NONE) in discardSSTables() when a table with no SSTable
>>> is truncated.
>>> I guess that ReplayPosition.segment shouldn't be -1 after truncating a
>>> table in this case.
>>> replayMutation() can replay unexpected mutations because of this
>>> segment value.
>>>
>>> Is there anyone familiar with truncate and replay?
>>>
>>> Regards,
>>> Yuji
>>>
>>>
>>> On Mon, Aug 8, 2016 at 6:36 PM, horschi <hors...@gmail.com> wrote:
>>>
>>>> Hi Yuji,
>>>>
>>>> can you reproduce the behaviour with a single node?
>>>>
>>>> The reason I ask is because I probably have the same issue with my
>>>> automated tests (which run truncate between every test), which run on my
>>>> local laptop.
>>>>
>>>> Maybe around 5 tests randomly fail out of my 1800. I can see that the
>>>> failed tests sometimes show data from other tests, which I think must be
>>>> because of a failed truncate. This behaviour seems to be CQL only, or at
>>>> least has gotten worse with CQL. I did not experience this with Thrift.
>>>>
>>>> regards,
>>>> Christian
>>>>
>>>>
>>>>
>>>> On Mon, Aug 8, 2016 at 7:34 AM, Yuji Ito <y...@imagine-orb.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I have a question about clearing tables and commit log replay.
>>>>> After some tables were truncated consecutively, I got some stale
>>>>> values.
>>>>> This problem doesn't occur when I clear keyspaces with DROP (and
>>>>> CREATE).
>>>>>
>>>>> I'm running the following test with node failures.
>>>>> Some stale values appear in the checking phase.
>>>>>
>>>>> Test iteration:
>>>>> 1. initialize tables as below
>>>>> 2. request a lot of read/write concurrently
>>>>> 3. check all records
>>>>> 4. repeat from the beginning
>>>>>
>>>>> I use C* 2.2.6. There are 3 nodes (replication_factor: 3).
>>>>> On each node, the cassandra process is killed at random intervals and
>>>>> restarted immediately.
>>>>>
>>>>> My initialization:
>>>>> 1. clear tables with TRUNCATE
>>>>> 2. INSERT initial records
>>>>> 3. check if all values are correct
>>>>>
>>>>> If any phase fails (because of node failure), the initialization
>>>>> starts all over again.
>>>>> So, tables are sometimes truncated consecutively.
>>>>> Though the check in the initialization is OK, stale data appears when
>>>>> I execute "SELECT * FROM mykeyspace.mytable;" after a lot of requests are
>>>>> completed.
>>>>>
>>>>> The problem is likely to occur when the ReplayPosition's value in
>>>>> "truncated_at" is initialized as below after an empty table is truncated.
>>>>>
>>>>> Column Family ID: truncated_at
>>>>> XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX: 0xffffffffffffffff0000000000000156597cd4c7
>>>>> (this value was acquired just after phase 1 in my initialization)
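
For reference, the blob above can be split into its fields. This is only a
rough sketch: it assumes the truncated_at value is laid out as an 8-byte
segment id, a 4-byte position and an 8-byte truncation time in milliseconds
(my reading of the value, not something confirmed in this thread), and it
treats the two wrapped lines as one hex string. decode_truncated_at is a
made-up helper name.

```shell
#!/bin/sh
# Split a truncated_at blob into fields, ASSUMING the layout
# 8-byte segment | 4-byte position | 8-byte truncation time (ms).
decode_truncated_at() (
    # Drop the 0x prefix and normalise to lower-case hex.
    hex=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
    if [ "${#hex}" -ne 40 ]; then
        echo "expected 40 hex digits, got ${#hex}" >&2
        exit 1
    fi
    segment_hex=$(printf '%s' "$hex" | cut -c1-16)
    position_hex=$(printf '%s' "$hex" | cut -c17-24)
    time_hex=$(printf '%s' "$hex" | cut -c25-40)

    # All ones is -1 as a signed 64-bit value. Compare as text so the
    # result does not depend on how the shell handles overflow; other
    # segments with the top bit set are not handled by this sketch.
    if [ "$segment_hex" = "ffffffffffffffff" ]; then
        segment=-1
    else
        segment=$((0x$segment_hex))
    fi
    echo "segment=$segment position=$((0x$position_hex)) truncated_at_ms=$((0x$time_hex))"
)

decode_truncated_at 0xffffffffffffffff0000000000000156597cd4c7
```

If the layout assumption holds, segment -1 with position 0 is the
ReplayPosition.NONE pair mentioned elsewhere in this thread.
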
>>>>>
>>>>> I guess some unexpected replays occur.
>>>>> Does anyone know the behavior?
>>>>>
>>>>> Thanks,
>>>>> Yuji
>>>>>
>>>>
>>>>
>>>
>>
>
