As far as I know, “multi-tier” refers to the ability to store some data in memory, the rest on disk.
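For illustration only, here is a minimal sketch of what that looks like in a node's configuration, assuming native persistence on the default data region. The class name, method name and the 1 GiB figure are placeholders of mine, not taken from your setup.

```java
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class MultiTierConfigSketch {
    /** Builds a node configuration whose default data region is backed by disk. */
    public static IgniteConfiguration multiTierConfig() {
        DataRegionConfiguration region = new DataRegionConfiguration()
            // Assumed figure: let this region use up to 1 GiB of RAM.
            .setMaxSize(1024L * 1024 * 1024)
            // With native persistence enabled, data is also written to disk,
            // so the region can hold more data than fits in memory.
            .setPersistenceEnabled(true);

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region);

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
```

The returned object would be passed to `Ignition.start(...)`. A cluster with persistence enabled starts inactive and has to be activated before use.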
You experienced “split brain,” which is a difficult problem to solve in any distributed system. From your description my guess is that you’ve enabled baseline auto-adjust, which is generally not a good idea when you have persistence turned on. With a catastrophic network failure like that, I would expect that you would need to restart some nodes. With a correctly configured cluster, you shouldn’t need to “destroy” a node.

> On 28 Oct 2021, at 14:15, privacyfi...@codesandnotes.be wrote:
>
> Dear,
>
> One of the "Core Features" listed by ignite.apache.org is the capability of
> Ignite to be a Multi-Tier Storage. However, unless I have misunderstood
> something, I am worried that this storage is not reliable...
>
> I currently have an application that uses an Ignite cluster as a DB. The
> cluster contains two nodes at the moment: the second node backs up the first.
> Each Ignite node is on a VPS server at OVH.
>
> Lately OVH had a series of issues which apparently brought down the
> communication between those VPS servers. The consequence was that the Ignite
> nodes couldn't talk to each other and therefore split, each node upgrading to
> a new Baseline Topology and each one seeing the other node as being offline.
>
> Restarting the nodes would result in an error on one of them:
> Caused by: class org.apache.ignite.spi.IgniteSpiException: BaselineTopology
> of joining node (72fbc939-bf09-42cb-a7e4-12896046cfc0) is not compatible with
> BaselineTopology in the cluster. Branching history of cluster BlT
> ([1060612220]) doesn't contain branching point hash of joining node BlT
> (173037243). Consider cleaning persistent storage of the node and adding it
> to the cluster again.
>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:2052)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1197)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:472)
>     at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2154)
>     at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:278)
>
> At this point, the only option I found was to destroy my backup node
> (deleting the "/work" folder"), restart it and add it back to the cluster.
>
> Obviously this becomes a real problem when scaling up and having one's data
> distributed among multiple Ignite nodes. If a major network issue occurs
> (such as days of network outage) Ignite nodes might (will?) end up in the
> same state than my backup node in my example, therefore losing data.
>
> So, is my theory above correct or have I misunderstood Ignite's capabilities
> as a Multi-Tier storage solution?
> Is a cluster node able to re-join a cluster it's been disconnected from for a
> significant amount of time?
> And if a node is disconnected and starts giving a "BaselineTopology of
> joining node is not compatible with BaselineTopology in the cluster" then how
> can I recover my data ?
>
> Thanks for your help,
>
> Diego