Dear,

One of the "Core Features" listed on ignite.apache.org is Ignite's ability to act as Multi-Tier Storage. However, unless I have misunderstood something, I am worried that this storage is not reliable.

I currently have an application that uses an Ignite cluster as a DB. The cluster contains two nodes at the moment: the second node backs up the first. Each Ignite node is on a VPS server at OVH.

Lately OVH had a series of issues which apparently brought down communication between those VPS servers. As a consequence, the Ignite nodes couldn't talk to each other and the cluster split: each node moved to a new Baseline Topology and saw the other node as offline.

Restarting the nodes would result in an error on one of them:
Caused by: class org.apache.ignite.spi.IgniteSpiException: BaselineTopology of joining node (72fbc939-bf09-42cb-a7e4-12896046cfc0) is not compatible with BaselineTopology in the cluster. Branching history of cluster BlT ([1060612220]) doesn't contain branching point hash of joining node BlT (173037243). Consider cleaning persistent storage of the node and adding it to the cluster again.
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:2052)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1197)
        at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:472)
        at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2154)
        at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:278)
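
To make clear how I am reading this message, here is a toy model in Java of the check it seems to describe. This is only my interpretation and certainly not Ignite's actual implementation: the class BaselineJoinSketch, the Baseline type, the canJoin method and the initial hash value 1 are all hypothetical names and values I made up for illustration. Only the two large hash values are borrowed from the error above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Ignite's real code, just my reading of the error message.
final class BaselineJoinSketch {

    /** Hypothetical stand-in for the baseline topology record kept by a node. */
    static final class Baseline {
        final List<Integer> branchingHistory = new ArrayList<>();

        /** A fresh baseline with a single (made-up) initial branching point. */
        Baseline(int initialHash) {
            branchingHistory.add(initialHash);
        }

        /** Copy of another baseline, as if both nodes started from the same state. */
        Baseline(Baseline other) {
            branchingHistory.addAll(other.branchingHistory);
        }

        /** Records a new branching point, e.g. the baseline changing after a split. */
        void branch(int newHash) {
            branchingHistory.add(newHash);
        }

        int latestBranchingPoint() {
            return branchingHistory.get(branchingHistory.size() - 1);
        }
    }

    /** Assumed rule: the cluster's history must contain the joiner's latest branching point. */
    static boolean canJoin(Baseline cluster, Baseline joining) {
        return cluster.branchingHistory.contains(joining.latestBranchingPoint());
    }

    public static void main(String[] args) {
        Baseline first = new Baseline(1);       // hypothetical initial hash
        Baseline backup = new Baseline(first);  // backup shares the same history

        System.out.println(canJoin(first, backup)); // true: histories still agree

        // Network split: each side changes its baseline independently.
        first.branch(1060612220);
        backup.branch(173037243);

        System.out.println(canJoin(first, backup)); // false: histories have diverged
    }
}
```

If this reading is right, once both sides have branched on their own, neither history contains the other's latest branching point, which would explain why the restarted node is refused.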

At this point, the only option I found was to destroy my backup node (deleting the "/work" folder), restart it and add it back to the cluster.

Obviously this becomes a real problem when scaling up and distributing one's data among multiple Ignite nodes. If a major network issue occurs (such as days of network outage), Ignite nodes might (will?) end up in the same state as the backup node in my example, and therefore lose data.

So, is my theory above correct, or have I misunderstood Ignite's capabilities as a Multi-Tier Storage solution? Is a cluster node able to rejoin a cluster it has been disconnected from for a significant amount of time? And if a disconnected node starts giving a "BaselineTopology of joining node is not compatible with BaselineTopology in the cluster" error, how can I recover my data?

Thanks for your help,

Diego
