The most common cause of a segmented cluster is not the network but your
Java garbage collection configuration. Do you see any "Long JVM pause"
warnings in your logs before the problem occurs?
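
In case it helps to see why a GC pause can look like a network failure: during a stop-the-world pause the node cannot send or answer discovery messages, so its neighbour may decide it has failed. Below is a rough sketch of how such a pause can be noticed from inside the JVM. It is not Ignite's own code; the class name, the method names, and the 50 ms / 500 ms values are assumptions made up for illustration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: a thread sleeps for a short, fixed interval and
// checks how much longer than requested it was actually away. A large
// overshoot suggests the whole JVM was stalled (for example by a GC pause).
class PauseSketch {
    // Assumed values, chosen only for this example.
    static final long SLEEP_MS = 50;
    static final long THRESHOLD_MS = 500;

    /** Extra time beyond the requested sleep, never negative. */
    public static long overshootMillis(long requestedMs, long elapsedMs) {
        return Math.max(0, elapsedMs - requestedMs);
    }

    /** True when the overshoot reaches the given threshold. */
    public static boolean isLongPause(long requestedMs, long elapsedMs, long thresholdMs) {
        return overshootMillis(requestedMs, elapsedMs) >= thresholdMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long last = System.nanoTime();
        // Bounded loop so the example terminates after about 5 seconds.
        for (int i = 0; i < 100; i++) {
            Thread.sleep(SLEEP_MS);
            long now = System.nanoTime();
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(now - last);
            if (isLongPause(SLEEP_MS, elapsedMs, THRESHOLD_MS)) {
                System.out.println("Possible long JVM pause: "
                    + overshootMillis(SLEEP_MS, elapsedMs) + " ms");
            }
            last = now;
        }
    }
}
```

If a check like this reports pauses close to your 5 second window shortly before node A is segmented, GC tuning is the more likely fix than a network change.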

On Wed, 8 Nov 2023 at 08:48, Alan Rose <alan_r...@trimble.com> wrote:

>
> I am hoping someone can help me understand some log entries better.
> I have two Ignite nodes, A and B, running in Linux containers that appear to
> have a network issue that results in node A restarting approximately
> 5 seconds later.
> In the logs, Node B says this about Node A:
>    "Previous node alive status [alive=false,
> checkPreviousNodeId=fb9c943e-aa4a-4e6c-ae00-1df5212a3f3f,
> actualPreviousNode=TcpDiscoveryNode
> [id=58424f0b-e77f-4127-835a-4274f57955a1,
> consistentId=5faca106-0c39-45ab-8c64-f38df8910238, etc.
> What is this line telling me about Node A?
>
> I then get
>          "Node FAILED: TcpDiscoveryNode ..etc"
> and  "Close incoming connection, unknown node..?", which I think refers to
> node A
>
> Node A log states
>    Failed to send message to remote node [node=TcpDiscoveryNode [id= etc
>    but it does appear to be able to ping node B OK.
> Within 5 seconds I see in the Node A log
>   Node is out of topology (probably, due to short-time network problems).
>    Local node SEGMENTED: TcpDiscoveryNode [id=58 etc
>  Finally, node A restarts.
> I see no other evidence of a network issue. Is there something I can
> configure so that it is not so quick to time out?
> The only setting I see in the startup log that is around 5 seconds
> is netTimeout=5000.
>
>
>
>
>
> --
> *Alan Rose*
> *Senior Software Engineer. *
>
> *CCSS Team Merino*
> *Trimble Navigation New Zealand Limited*
> P O Box 8729, Riccarton, Christchurch 8440 , New Zealand
> +64 3 9635616 Ext 604016
>
>
