We encountered a situation after a node unexpectedly went down and came
back up.
After it came back, none of our transactions were going through (due to
rollbacks), and we started getting a lot of exceptions in the logs. (I've
added the exceptions at the bottom of this message.)
We were getting "Failed to execute the cache operation (all partition
owners have left the grid, partition data has been lost)", so we tried to
reset the partitions (since these are persistent caches), and the commands
succeeded, but we kept seeing errors.

We checked the cluster state, and it looks like we have two nodes that came
up with different IDs.

Cluster state: active
Current topology version: 1170
Baseline auto adjustment disabled: softTimeout=300000
Current topology version: 1170 (Coordinator:
ConsistentId=1455b414-5389-454a-9609-8dd1d15a2430, Order=1)
Baseline nodes:
    ConsistentId=1a0aa611-58b7-479a-b1e6-735e31f87ed9, State=ONLINE,
Order=1169
    ConsistentId=92bc8407-30f1-433d-9c32-5eeb759c73be, State=OFFLINE
    ConsistentId=b5875ab9-7923-46c9-b3f3-1550455a24e5, State=OFFLINE
--------------------------------------------------------------------------------
Number of baseline nodes: 3
Other nodes:
    ConsistentId=1455b414-5389-454a-9609-8dd1d15a2430, Order=1
    ConsistentId=5e8f3b03-aa20-45aa-892a-37e988e3741f, Order=2


Could this be a split-brain scenario?

Here's the more complete logs:

    javax.cache.CacheException: class
org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
Failed to execute the cache operation (all partition owners have left the
grid, partition data has been lost) [cacheName=propensity-customer,
partition=430, key=com.company.PropensityKey [idHash=362342248,
hash=42458921, customerId=142045188, variant=MODEL_A]]
        at
org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1270)
        at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:2083)
        at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:1110)
        at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:676)
        at
org.apache.ignite.internal.processors.platform.client.cache.ClientCacheGetRequest.process(ClientCacheGetRequest.java:41)
        at
org.apache.ignite.internal.processors.platform.client.ClientRequestHandler.handle(ClientRequestHandler.java:99)
        at
org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:202)
        at
org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:56)
        at
org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)
        at
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
        at
org.apache.ignite.internal.util.nio.GridNioAsyncNotifyFilter$3.body(GridNioAsyncNotifyFilter.java:97)
        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at
org.apache.ignite.internal.util.worker.GridWorkerPool$1.run(GridWorkerPool.java:70)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: class
org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
Failed to execute the cache operation (all partition owners have left the
grid, partition data has been lost) [cacheName=propensity-customer,
partition=430, key=com.company.PropensityKey [idHash=362342248,
hash=42458921, customerId=142045188, variant=MODEL_A]]
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateKey(GridDhtTopologyFutureAdapter.java:209)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:128)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.validate(GridPartitionedSingleGetFuture.java:859)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.map(GridPartitionedSingleGetFuture.java:277)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:244)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.getAsync(GridDhtColocatedCache.java:297)
        at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4844)
        at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGet(GridCacheAdapter.java:4810)
        at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1469)
        at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:1107)
        ... 13 more

[21:48:07,376][SEVERE][client-connector-#212][ClientListenerNioListener]
Failed to process client request
[req=o.a.i.i.processors.platform.client.cache.ClientCacheGetRequest@78739a4f
]
    javax.cache.CacheException: class
org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
Failed to execute the cache operation (all partition owners have left the
grid, partition data has been lost) [cacheName=propensity-customer,
partition=343, key=com.company.PropensityKey [idHash=1459588566,
hash=-1494388806, customerId=156017972, variant=MODEL_A]]
        at
org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1270)
        at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:2083)
        at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:1110)
        at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:676)
        at
org.apache.ignite.internal.processors.platform.client.cache.ClientCacheGetRequest.process(ClientCacheGetRequest.java:41)
        at
org.apache.ignite.internal.processors.platform.client.ClientRequestHandler.handle(ClientRequestHandler.java:99)
        at
org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:202)
        at
org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:56)
        at
org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)
        at
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
        at
org.apache.ignite.internal.util.nio.GridNioAsyncNotifyFilter$3.body(GridNioAsyncNotifyFilter.java:97)
        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at
org.apache.ignite.internal.util.worker.GridWorkerPool$1.run(GridWorkerPool.java:70)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: class
org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
Failed to execute the cache operation (all partition owners have left the
grid, partition data has been lost) [cacheName=propensity-customer,
partition=343, key=com.company.PropensityKey [idHash=1459588566,
hash=-1494388806, customerId=156017972, variant=MODEL_A]]
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateKey(GridDhtTopologyFutureAdapter.java:209)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:128)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.validate(GridPartitionedSingleGetFuture.java:859)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.map(GridPartitionedSingleGetFuture.java:277)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:244)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.getAsync(GridDhtColocatedCache.java:297)
        at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4844)
        at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGet(GridCacheAdapter.java:4810)
        at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1469)
        at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:1107)
        ... 13 more

[21:48:07,388][SEVERE][client-connector-#245][ClientListenerNioListener]
Failed to process client request
[req=o.a.i.i.processors.platform.client.cache.ClientCacheGetRequest@32d3379d
]
    javax.cache.CacheException: class
org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
Failed to execute the cache operation (all partition owners have left the
grid, partition data has been lost) [cacheName=propensity-customer,
partition=343, key=com.company.PropensityKey [idHash=1589948377,
hash=-1494388806, customerId=156017972, variant=MODEL_A]]
        at
org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1270)
        at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:2083)
        at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:1110)
        at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:676)
        at
org.apache.ignite.internal.processors.platform.client.cache.ClientCacheGetRequest.process(ClientCacheGetRequest.java:41)
        at
org.apache.ignite.internal.processors.platform.client.ClientRequestHandler.handle(ClientRequestHandler.java:99)
        at
org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:202)
        at
org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:56)
        at
org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)
        at
org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
        at
org.apache.ignite.internal.util.nio.GridNioAsyncNotifyFilter$3.body(GridNioAsyncNotifyFilter.java:97)
        at
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
        at
org.apache.ignite.internal.util.worker.GridWorkerPool$1.run(GridWorkerPool.java:70)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    Caused by: class
org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
Failed to execute the cache operation (all partition owners have left the
grid, partition data has been lost) [cacheName=propensity-customer,
partition=343, key=com.company.PropensityKey [idHash=1589948377,
hash=-1494388806, customerId=156017972, variant=MODEL_A]]
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateKey(GridDhtTopologyFutureAdapter.java:209)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:128)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.validate(GridPartitionedSingleGetFuture.java:859)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.map(GridPartitionedSingleGetFuture.java:277)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:244)
        at
org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.getAsync(GridDhtColocatedCache.java:297)
        at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:4844)
        at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGet(GridCacheAdapter.java:4810)
        at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(GridCacheAdapter.java:1469)
        at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:1107)
        ... 13 more

[21:48:07,410][SEVERE][client-connector-#226][ClientListenerNioListener]
Failed to process client request
[req=o.a.i.i.processors.platform.client.cache.ClientCacheGetRequest@254ae7a]
    javax.cache.CacheException: class
org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
Failed to execute the cache operation (all partition owners have left the
grid, partition data has been lost) [cacheName=propensity-customer,
partition=752, key=com.company.PropensityKey [idHash=224478459,
hash=-1842810664, customerId=165875208, variant=MODEL_A]]
        at
org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1270)
        at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:2083)
        at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(IgniteCacheProxyImpl.java:1110)
        at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:676)
        at
org.apache.ignite.internal.processors.platform.client.cache.ClientCacheGetRequest.process(ClientCacheGetRequest.java:41)
        at
org.apache.ignite.internal.processors.platform.client.ClientRequestHandler.handle(ClientRequestHandler.java:99)
        at
org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:202)
        at
org.apache.ignite.internal.processors.odbc.ClientListenerNioListener.onMessage(ClientListenerNioListener.java:56)
        at
org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)


Devin G. Bost

Reply via email to