Hello,

Thank you for the feedback.
> Ben: Thank you. I've tested with lower concurrency on my side, and the
issue still occurs. We are using 3 x T3.xlarge instances for C* and a
small, separate instance for the client program. However, when we ran all
3 C* nodes on a single host, the issue didn't occur.

> Alok: We also thought so and tested with hints disabled, but it doesn't
make any difference (the issue still occurs).

Thanks,
Hiro

On Fri, Apr 26, 2019 at 8:19 AM Alok Dwivedi <alok.dwiv...@instaclustr.com> wrote:

> Could it be related to hinted handoffs being stored on node1 and then
> replayed to node2 when it comes back, causing more load while new
> mutations are also being applied from cassandra-stress at the same time?
>
> Alok Dwivedi
> Senior Consultant
> https://www.instaclustr.com/
>
> On 26 Apr 2019, at 09:04, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> In the absence of anyone else having any bright ideas - it still sounds
> to me like the kind of scenario that can occur in a heavily overloaded
> cluster. I would try again with a lower load.
>
> What size machines are you using for the stress client and the nodes?
> Are they all on separate machines?
>
> Cheers
> Ben
>
> Ben Slater
> Chief Product Officer
> https://www.instaclustr.com/platform/
>
> On Thu, 25 Apr 2019 at 17:26, Hiroyuki Yamada <mogwa...@gmail.com> wrote:
>
>> Hello,
>>
>> Sorry again.
>> We found yet another weird thing in this.
>> If we stop nodes with systemctl or a plain kill (SIGTERM), it causes the
>> problem, but if we kill -9, it doesn't.
>>
>> Thanks,
>> Hiro
>>
>> On Wed, Apr 24, 2019 at 11:31 PM Hiroyuki Yamada <mogwa...@gmail.com> wrote:
>>
>>> Sorry, I didn't mention the version and the configuration.
>>> I've tested with C* 3.11.4, and the configuration is mostly left at the
>>> defaults, except for the replication factor and listen_address (for
>>> proper networking).
>>>
>>> Thanks,
>>> Hiro
>>>
>>> On Wed, Apr 24, 2019 at 5:12 PM Hiroyuki Yamada <mogwa...@gmail.com> wrote:
>>>
>>>> Hello Ben,
>>>>
>>>> Thank you for the quick reply.
>>>> I haven't tried that case, but it doesn't recover even if I stop the
>>>> stress.
>>>>
>>>> Thanks,
>>>> Hiro
>>>>
>>>> On Wed, Apr 24, 2019 at 3:36 PM Ben Slater <ben.sla...@instaclustr.com> wrote:
>>>>
>>>>> Is it possible that stress is overloading node1 so that it's not
>>>>> recovering state properly when node2 comes up? Have you tried running
>>>>> with a lower load (say 2 or 3 threads)?
>>>>>
>>>>> Cheers
>>>>> Ben
>>>>>
>>>>> Ben Slater
>>>>> Chief Product Officer
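For reference, the hint-disabling test and the TERM-vs-KILL comparison
mentioned earlier in the thread can be reproduced roughly as follows. This
is only a sketch: the systemd unit name (cassandra) and the use of pgrep to
find the Cassandra process are assumptions about the environment.

    # Disable hinted handoff on every node before the test, either at runtime ...
    nodetool disablehandoff
    # ... or permanently in cassandra.yaml (requires a restart):
    #   hinted_handoff_enabled: false

    # Stop variant 1: clean shutdown (reproduces the problem).
    # systemctl sends SIGTERM, so Cassandra's shutdown hook can run and the
    # node announces its shutdown to the rest of the cluster via gossip.
    sudo systemctl stop cassandra

    # Stop variant 2: hard kill (does not reproduce the problem).
    # No shutdown hook runs; peers only notice the node is gone via the
    # failure detector.
    sudo kill -9 "$(pgrep -f CassandraDaemon)"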
>>>>> On Wed, 24 Apr 2019 at 16:28, Hiroyuki Yamada <mogwa...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I faced a weird issue when recovering a cluster after two nodes were
>>>>>> stopped. It is easily reproducible and looks like a bug or at least an
>>>>>> issue worth fixing, so let me write down the steps to reproduce.
>>>>>>
>>>>>> === STEPS TO REPRODUCE ===
>>>>>> * Create a 3-node cluster with RF=3
>>>>>>   - node1 (seed), node2, node3
>>>>>> * Start requests to the cluster with cassandra-stress (it keeps
>>>>>>   running until the end)
>>>>>>   - what we did: cassandra-stress mixed cl=QUORUM duration=10m
>>>>>>     -errors ignore -node node1,node2,node3 -rate threads\>=16
>>>>>>     threads\<=256
>>>>>> * Stop node3 normally (with systemctl stop)
>>>>>>   - the system is still available because a quorum of replicas is
>>>>>>     still available
>>>>>> * Stop node2 normally (with systemctl stop)
>>>>>>   - the system is NOT available after it's stopped
>>>>>>   - the client gets `UnavailableException: Not enough replicas
>>>>>>     available for query at consistency QUORUM`
>>>>>>   - the client gets the errors right away (within a few ms)
>>>>>>   - so far this is all expected
>>>>>> * Wait for 1 minute
>>>>>> * Bring up node2
>>>>>>   - the issue happens here
>>>>>>   - the client gets ReadTimeoutException or WriteTimeoutException,
>>>>>>     depending on whether the request is a read or a write, even after
>>>>>>     node2 is up
>>>>>>   - the client gets the errors only after about 5000 ms or 2000 ms,
>>>>>>     which are the request timeouts for write and read requests
>>>>>>   - what node1 reports with `nodetool status` and what node2 reports
>>>>>>     are not consistent (node2 thinks node1 is down)
>>>>>>   - it takes a very long time to recover from this state
>>>>>> === END STEPS TO REPRODUCE ===
>>>>>>
>>>>>> Is this supposed to happen?
>>>>>> If we don't start cassandra-stress, everything is fine.
>>>>>>
>>>>>> Some workarounds we found to recover from this state are the following:
>>>>>> * Restarting node1; it recovers right after the restart
>>>>>> * Setting a lower value for dynamic_snitch_reset_interval_in_ms (e.g.,
>>>>>>   60000)
>>>>>>
>>>>>> I don't think either of them is a really good solution.
>>>>>> Can anyone explain what is going on, and what is the best way to
>>>>>> prevent it or recover from it?
>>>>>>
>>>>>> Thanks,
>>>>>> Hiro
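To see the inconsistent cluster views and to try the workarounds from the
original post above, something like the following should work. Again just a
sketch: it assumes the hosts are reachable as node1/node2, that JMX is
reachable remotely for nodetool -h (otherwise run nodetool status locally on
each node), and a systemd unit named cassandra.

    # After node2 is back up, compare the cluster view from both nodes.
    # In the problem state they disagree: node2 reports node1 as DN.
    nodetool -h node1 status
    nodetool -h node2 status

    # Workaround 1: restart node1; it recovers right after the restart.
    sudo systemctl restart cassandra     # run on node1

    # Workaround 2: lower the dynamic snitch reset interval (10 minutes by
    # default) in cassandra.yaml on each node, then restart:
    #   dynamic_snitch_reset_interval_in_ms: 60000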