Dear all,
One of the "Core Features" listed on ignite.apache.org is Ignite's ability
to act as Multi-Tier Storage. However, unless I have misunderstood
something, I am worried that this storage is not reliable.
I currently have an application that uses an Ignite cluster as a DB. The
cluster contains two nodes at the moment: the second node backs up the
first. Each Ignite node is on a VPS server at OVH.
Lately OVH had a series of issues which apparently brought down the
communication between those VPS servers. The consequence was that the
Ignite nodes couldn't talk to each other and therefore split, each node
upgrading to a new Baseline Topology and each one seeing the other node
as being offline.
Restarting the nodes would result in an error on one of them:
Caused by: class org.apache.ignite.spi.IgniteSpiException: BaselineTopology of joining node (72fbc939-bf09-42cb-a7e4-12896046cfc0) is not compatible with BaselineTopology in the cluster. Branching history of cluster BlT ([1060612220]) doesn't contain branching point hash of joining node BlT (173037243). Consider cleaning persistent storage of the node and adding it to the cluster again.
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:2052)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1197)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:472)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2154)
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:278)
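To make clear how I read this error, here is a toy model in Python. It is only my interpretation of the message, not Ignite's actual code: the class `BaselineView`, the function `can_join` and the hash values are all made up for illustration. The assumption is that a joining node is accepted only if the baseline it last knew is one the cluster has itself passed through.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BaselineView:
    """Hypothetical view of a baseline topology: current hash plus past hashes."""
    current: int
    history: List[int] = field(default_factory=list)

    def change_baseline(self, new_hash: int) -> None:
        # Remember the baseline we are leaving, then switch to the new one.
        self.history.append(self.current)
        self.current = new_hash


def can_join(cluster: BaselineView, joining: BaselineView) -> bool:
    # Assumed rule: the joining node's baseline must be the cluster's current
    # baseline or one of the baselines the cluster branched from earlier.
    return joining.current == cluster.current or joining.current in cluster.history


SHARED = 100  # made-up hash of the original two-node baseline

# Case 1: node B was simply offline while node A moved to a new baseline.
node_a = BaselineView(current=SHARED)
node_b = BaselineView(current=SHARED)
node_a.change_baseline(200)
offline_only = can_join(node_a, node_b)

# Case 2: network split, both nodes moved to their own new baseline.
node_a = BaselineView(current=SHARED)
node_b = BaselineView(current=SHARED)
node_a.change_baseline(200)
node_b.change_baseline(300)
after_split = can_join(node_a, node_b)

print("offline only:", offline_only)
print("after split:", after_split)
```

In this model the first case is accepted and the second is rejected, which matches what I observed: once both sides have changed their baseline independently, neither history contains the other's branching point.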
At this point, the only option I found was to destroy my backup node
(deleting the "/work" folder), restart it and add it back to the cluster.
Obviously this becomes a real problem when scaling up and having one's
data distributed among multiple Ignite nodes. If a major network issue
occurs (such as days of network outage), Ignite nodes might (will?) end
up in the same state as the backup node in my example, and therefore lose
data.
So, is my theory above correct or have I misunderstood Ignite's
capabilities as a Multi-Tier storage solution?
Is a cluster node able to re-join a cluster it's been disconnected from
for a significant amount of time?
And if a node is disconnected and starts giving a "BaselineTopology of
joining node is not compatible with BaselineTopology in the cluster"
error, then how can I recover my data?
Thanks for your help,
Diego