As far as I know, “multi-tier” refers to the ability to keep some data in 
memory and the rest on disk. It is not a statement about reliability.

You experienced “split brain,” which is a difficult problem to solve in any 
distributed system. From your description my guess is that you’ve enabled 
baseline auto-adjust, which is generally not a good idea when you have 
persistence turned on.
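
If that guess is right, a minimal sketch of checking and switching off 
auto-adjust from Java could look like the following. It assumes Ignite 2.8 or 
later (where the IgniteCluster auto-adjust methods exist), an already active 
cluster, and a client node that can find the cluster through the default 
discovery settings. The class name is made up for illustration, and your real 
discovery configuration will differ.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCluster;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

// Hypothetical helper class, not part of Ignite.
public class DisableBaselineAutoAdjust {
    public static void main(String[] args) {
        // Assumption: default discovery settings are enough to reach the
        // cluster. Replace with your own discovery SPI configuration.
        IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);

        // Join as a client so this process does not become a baseline node.
        try (Ignite ignite = Ignition.start(cfg)) {
            IgniteCluster cluster = ignite.cluster();

            System.out.println("Auto-adjust enabled before: "
                + cluster.isBaselineAutoAdjustEnabled());

            // With auto-adjust off, the baseline only changes when an
            // administrator changes it explicitly.
            if (cluster.isBaselineAutoAdjustEnabled())
                cluster.baselineAutoAdjustEnabled(false);

            System.out.println("Auto-adjust enabled after: "
                + cluster.isBaselineAutoAdjustEnabled());
        }
    }
}
```

With auto-adjust off, a node that loses contact with its peer keeps the 
baseline it had, so neither side of a network split rewrites the baseline on 
its own. If I remember the control script correctly, 
"control.sh --baseline auto_adjust disable" does the same thing from the 
command line, but check that against the documentation for your version.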

With a catastrophic network failure like that, I would expect you to need to 
restart some nodes. With a correctly configured cluster, though, you shouldn’t 
need to “destroy” a node.

> On 28 Oct 2021, at 14:15, privacyfi...@codesandnotes.be wrote:
> 
> Dear,
> 
> One of the "Core Features" listed by ignite.apache.org is the capability of 
> Ignite to be a Multi-Tier Storage. However, unless I have misunderstood 
> something, I am worried that this storage is not reliable...
> 
> I currently have an application that uses an Ignite cluster as a DB. The 
> cluster contains two nodes at the moment: the second node backs up the first. 
> Each Ignite node is on a VPS server at OVH.
> 
> Lately OVH had a series of issues which apparently brought down the 
> communication between those VPS servers. The consequence was that the Ignite 
> nodes couldn't talk to each other and therefore split, each node upgrading to 
> a new Baseline Topology and each one seeing the other node as being offline.
> 
> Restarting the nodes would result in an error on one of them:
> Caused by: class org.apache.ignite.spi.IgniteSpiException: BaselineTopology 
> of joining node (72fbc939-bf09-42cb-a7e4-12896046cfc0) is not compatible with 
> BaselineTopology in the cluster. Branching history of cluster BlT 
> ([1060612220]) doesn't contain branching point hash of joining node BlT 
> (173037243). Consider cleaning persistent storage of the node and adding it 
> to the cluster again.
>         at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.checkFailedError(TcpDiscoverySpi.java:2052)
>         at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:1197)
>         at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:472)
>         at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:2154)
>         at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:278)
>  
> 
> At this point, the only option I found was to destroy my backup node 
> (deleting the "/work" folder), restart it and add it back to the cluster.
> 
> Obviously this becomes a real problem when scaling up and having one's data 
> distributed among multiple Ignite nodes. If a major network issue occurs 
> (such as days of network outage) Ignite nodes might (will?) end up in the 
> same state as my backup node in my example, therefore losing data.
> 
> So, is my theory above correct or have I misunderstood Ignite's capabilities 
> as a Multi-Tier storage solution?
> Is a cluster node able to re-join a cluster it's been disconnected from for a 
> significant amount of time?
> And if a node is disconnected and starts giving a "BaselineTopology of 
> joining node is not compatible with BaselineTopology in the cluster" error, 
> how can I recover my data?
> 
> Thanks for your help,
> 
> Diego
> 

