Connectivity between nodes in a modern distributed system is never
guaranteed, and distributed software must be able to cope with occasional
loss of connectivity. GC pauses and network problems are the two causes
that most of us are familiar with. There may be others, but most technical
problems on a node would be clearly logged on that node. If you see a lapse
of connectivity no more than once or twice a day, consider yourself lucky.

Is it only one node at a time that goes down, and at widely dispersed times?

How many nodes?
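
If it helps to answer those questions from the logs, here is a rough
Python sketch that pairs the Gossiper DOWN/UP lines per address and
reports how long each outage lasted. It assumes each event sits on a
single line in the format quoted further down in this thread; the
function name downtime_intervals is made up for illustration and is not
part of Cassandra.

```python
import re
from datetime import datetime

# Matches Gossiper state-change lines in the format quoted in this thread
# (assumed to be one event per line in the actual log file).
EVENT = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) Gossiper\.java \(line \d+\) "
    r"InetAddress /(\S+) is now (DOWN|UP)"
)


def downtime_intervals(lines):
    """Return (address, down_timestamp, seconds_down) for each DOWN/UP pair."""
    down_since = {}  # address -> timestamp of the first unmatched DOWN
    intervals = []
    for line in lines:
        match = EVENT.search(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        addr, state = match.group(2), match.group(3)
        if state == "DOWN":
            # Keep the earliest DOWN if several are logged before an UP.
            down_since.setdefault(addr, ts)
        elif addr in down_since:
            start = down_since.pop(addr)
            intervals.append((addr, start, (ts - start).total_seconds()))
    return intervals
```

Feeding it the lines of system.log from each node and grouping the
result by address and time of day should show whether the outages hit
one node at a time and how widely they are spread.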

-- Jack Krupansky

On Tue, Feb 23, 2016 at 11:01 AM, Joel Samuelsson <samuelsson.j...@gmail.com> wrote:

> Hi,
>
> Version is 2.0.17.
> Yes, these are VMs in the cloud though I'm fairly certain they are on a
> LAN rather than WAN. They are both in the same data centre physically. The
> phi_convict_threshold is set to default. I'd rather find the root cause of
> the problem than just hiding it by not convicting a node if it isn't
> responding though. If pings are <2 ms without a single ping missed in
> several days, I highly doubt that network is the reason for the downtime.
>
> Best regards,
> Joel
>
> 2016-02-23 16:39 GMT+01:00 <sean_r_dur...@homedepot.com>:
>
>> You didn’t mention version, but I saw this kind of thing very often in
>> the 1.1 line. Often this is connected to network flakiness. Are these VMs?
>> In the cloud? Connected over a WAN? You mention that ping seems fine. Take
>> a look at the phi_convict_threshold in cassandra.yaml. You may need to
>> increase it to reduce the UP/DOWN flapping behavior.
>>
>>
>>
>>
>>
>> Sean Durity
>>
>>
>>
>> *From:* Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
>> *Sent:* Tuesday, February 23, 2016 9:41 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Nodes go down periodically
>>
>>
>>
>> Hi,
>>
>>
>>
>> Thanks for your reply.
>>
>>
>>
>> I have debug logging on and see no GC pauses that are that long. GC
>> pauses are all well below 1s and 99 times out of 100 below 100ms.
>>
>> Do I need to enable GC log options to see the pauses?
>>
>> I see plenty of these lines:
>> DEBUG [ScheduledTasks:1] 2016-02-22 10:43:02,891 GCInspector.java (line
>> 118) GC for ParNew: 24 ms for 1 collections
>>
>> as well as a few CMS GC log lines.
>>
>>
>>
>> Best regards,
>>
>> Joel
>>
>>
>>
>> 2016-02-23 15:14 GMT+01:00 Hannu Kröger <hkro...@gmail.com>:
>>
>> Hi,
>>
>>
>>
>> Those are probably GC pauses. Memory tuning is probably needed. Check
>> whether the parameters that you have already customised make sense.
>>
>>
>>
>> http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html
>>
>>
>>
>> Hannu
>>
>>
>>
>>
>>
>> On 23 Feb 2016, at 16:08, Joel Samuelsson <samuelsson.j...@gmail.com>
>> wrote:
>>
>>
>>
>> Our nodes go down periodically, around 1-2 times each day. Downtime
>> ranges from under 1 second to 30 or so seconds.
>>
>>
>>
>> INFO [GossipTasks:1] 2016-02-22 10:05:14,896 Gossiper.java (line 992)
>> InetAddress /109.74.13.67 is now DOWN
>>
>>  INFO [RequestResponseStage:8844] 2016-02-22 10:05:38,331 Gossiper.java
>> (line 978) InetAddress /109.74.13.67 is now UP
>>
>>
>>
>> I find nothing odd in the logs around the same time. I logged a ping with
>> timestamp and checked during the same time and saw nothing weird (ping is
>> less than 2ms at all times).
>>
>>
>>
>> Does anyone have any suggestions as to why this might happen?
>>
>>
>>
>> Best regards,
>> Joel
>>
>>
>>
>>
>>
>>
>
>
