Re: [2.4.0] Cluster unrecoverable after node failure

joseheitor Thu, 22 Mar 2018 04:34:47 -0700

I do apologise for the long-winded post earlier (with error stack-traces,
etc.).


I hope that someone can assist me with this issue. It is a basic,
real-world scenario that tests the fundamental integrity of the clustering
system!

Am I perhaps missing something, or mismanaging the cluster in such a
situation? What is the best practice for recovering from such a scenario?

Here is the condensed version of the problem, which is hopefully easier to
read (without the stack-traces):

*Scenario: Secondary (Node-B) Failure*

Environment:
  - 2 nodes (Node-A, Node-B)
  - Ignite native persistence enabled
  - static IP discovery - both node IPs listed
  - JDBC (Client) - DBeaver
  - manual cluster activation

Steps:
  1 - start both nodes with no data
  2 - activate the cluster from the same machine as Node-A
  3 - load data via SQL JDBC (...WITH template=replicated, backups=1)
  4 - simulate a power failure ... all components down; Node-B with
      unrecoverable (hardware) damage
  5 - start a new Node-B instance (with no data)
  6 - attempt to start Node-A (undamaged, with good data)...

PROBLEM: Unable to start Node-A. (Error in previous post below...) 
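
For reference, the data load in step 3 can be sketched roughly as below. This
is only an illustration, not the exact statements I ran: the actual load was
done from DBeaver, and the table name, columns, class name and host are
made-up placeholders. Only the WITH clause is taken from step 3. Connecting
also assumes the Ignite JDBC thin driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class LoadSketch {
    // Builds the DDL for a hypothetical table; only the WITH clause
    // (template=replicated, backups=1) comes from step 3.
    static String createTableDdl(String table) {
        return "CREATE TABLE " + table + " (id INT PRIMARY KEY, name VARCHAR) "
             + "WITH \"template=replicated, backups=1\"";
    }

    // Connects through the JDBC thin driver and loads one sample row.
    // The host is a placeholder for a node's address.
    static void load(String host) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://" + host);
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(createTableDdl("person"));
            stmt.executeUpdate("INSERT INTO person (id, name) VALUES (1, 'test')");
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            // No host given: just print the DDL instead of connecting.
            System.out.println(createTableDdl("person"));
            return;
        }
        load(args[0]);
    }
}
```

Run without arguments it only prints the DDL; pass a node address to actually
execute the statements against an activated cluster.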

In an attempt to recover the cluster and data:
  7 - stop Node-B
  8 - start Node-A - first
  9 - start Node-B (starts)
  10 - attempt to activate cluster

PROBLEM: The cluster activation operation freezes.

Additional notes:
- All data is lost and cannot be recovered.
- This did not occur with Ignite 2.3.0, although data consistency there was
unpredictable and would only sometimes align after some period of time.
- If Node-A (with data) is started before Node-B (new instance, empty data),
both nodes start, but the cluster fails to activate. (See the post below for
details of the error observed in Node-A's output.)

...



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
