Re: What does all partition owners have left the grid on the client side mean?

John Smith Thu, 25 Jun 2020 08:02:25 -0700

Because everything in between is business logs. Let me make sure I didn't
filter out anything relevant. So maybe nothing happened in those 13 hours?



On Thu, 25 Jun 2020 at 10:53, Evgenii Zhuravlev <[email protected]>
wrote:

> This doesn't seem to be a full log. There is a gap of more than 13 hours
> in the log:
> {"appTimestamp":"2020-06-23T23:06:41.658+00:00","threadName":"ignite-update-notifier-timer","level":"WARN","loggerName":"org.apache.ignite.internal.processors.cluster.GridUpdateNotifier","message":"New
> version is available at ignite.apache.org: 2.8.1"}
> {"appTimestamp":"2020-06-24T12:58:42.294+00:00","threadName":"disco-event-worker-#35%xxxxxx%","level":"INFO","loggerName":"org.apache.ignite.internal.managers.discovery.GridDiscoveryManager","message":"Node
> left topology: TcpDiscoveryNode [id=02949ae0-4eea-4dc9-8aed-b3f50e8d7238,
> addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, xxx.xxx.xxx.73],
> sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0,
> xxxxxx-task-0003/xxx.xxx.xxx.73:0], discPort=0, order=1258, intOrder=632,
> lastExchangeTime=1592890182021, loc=false,
> ver=2.7.0#20181130-sha1:256ae401, isClient=true]"}
>
> I don't see any exceptions in the log. When did the issue happen? Can you
> share the full log?
>
> Evgenii
>
> On Thu, 25 Jun 2020 at 07:36, John Smith <[email protected]> wrote:
>
>> Hi Evgenii, same folder shared stdout.copy
>>
>> Just in case:
>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>
>> On Wed, 24 Jun 2020 at 21:23, Evgenii Zhuravlev <[email protected]>
>> wrote:
>>
>>> No, it's not. It's not clear when it happened or what state the
>>> cluster and the client node itself were in at that moment.
>>>
>>> Evgenii
>>>
>>> On Wed, 24 Jun 2020 at 18:16, John Smith <[email protected]> wrote:
>>>
>>>> Ok I'll try... The stack trace isn't enough?
>>>>
>>>> On Wed., Jun. 24, 2020, 4:30 p.m. Evgenii Zhuravlev, <
>>>> [email protected]> wrote:
>>>>
>>>>> John, right, didn't notice them before. Can you share the full log for
>>>>> the client node with an issue?
>>>>>
>>>>> Evgenii
>>>>>
>>>>> On Wed, 24 Jun 2020 at 12:29, John Smith <[email protected]> wrote:
>>>>>
>>>>>> I thought I did! The link doesn't have them?
>>>>>>
>>>>>> On Wed., Jun. 24, 2020, 2:43 p.m. Evgenii Zhuravlev, <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Can you share full log files from server nodes?
>>>>>>>
>>>>>>> On Wed, 24 Jun 2020 at 10:47, John Smith <[email protected]> wrote:
>>>>>>>
>>>>>>>> The logs for server are here:
>>>>>>>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>>>>>>>
>>>>>>>> The error from the client:
>>>>>>>>
>>>>>>>> javax.cache.CacheException: class
>>>>>>>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>>>>>>>> Failed to execute cache operation (all partition owners have left the grid,
>>>>>>>> partition data has been lost) [cacheName=cache1, part=580,
>>>>>>>> key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137)
>>>>>>>> at
>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$d94e711a$1(IgniteCacheRepository.java:55)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
>>>>>>>> at
>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
>>>>>>>> at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
>>>>>>>> at
>>>>>>>> io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
>>>>>>>> at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
>>>>>>>> at
>>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>> at
>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>> at
>>>>>>>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>>>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>>>>> Caused by:
>>>>>>>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>>>>>>>> Failed to execute cache operation (all partition owners have left the grid,
>>>>>>>> partition data has been lost) [cacheName=cache1, part=580,
>>>>>>>> key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:169)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:208)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync0(GridDhtAtomicCache.java:1428)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$1600(GridDhtAtomicCache.java:135)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:474)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:472)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:761)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync(GridDhtAtomicCache.java:472)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:4749)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1477)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAsync(IgniteCacheProxyImpl.java:937)
>>>>>>>> at
>>>>>>>> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAsync(GatewayProtectedCacheProxy.java:652)
>>>>>>>> at
>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$get$1(IgniteCacheRepository.java:28)
>>>>>>>> at
>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.executeAsync(IgniteCacheRepository.java:51)
>>>>>>>> at
>>>>>>>> com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.get(IgniteCacheRepository.java:28)
>>>>>>>> at
>>>>>>>> com.xxxxxx.impl.CarrierCodeServiceImpl.getCarrierIdOfPhone(CarrierCodeServiceImpl.java:65)
>>>>>>>> at
>>>>>>>> com.xxxxxx.impl.SmppGatewayServiceImpl.sendSms(SmppGatewayServiceImpl.java:39)
>>>>>>>> at
>>>>>>>> com.xxxxxx.impl.MtEventProcessor.process(MtEventProcessor.java:46)
>>>>>>>> at
>>>>>>>> com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$4(KafkaProcessorImpl.java:83)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.completable.CompletableCreate.subscribeActual(CompletableCreate.java:39)
>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.completable.CompletableTimeout.subscribeActual(CompletableTimeout.java:53)
>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.completable.CompletablePeek.subscribeActual(CompletablePeek.java:51)
>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.completable.CompletableResumeNext.subscribeActual(CompletableResumeNext.java:41)
>>>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.completable.CompletableToFlowable.subscribeActual(CompletableToFlowable.java:32)
>>>>>>>> at io.reactivex.Flowable.subscribe(Flowable.java:14918)
>>>>>>>> at io.reactivex.Flowable.subscribe(Flowable.java:14865)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:163)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.slowPath(FlowableFromIterable.java:236)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:124)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:546)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onComplete(FlowableFlatMap.java:678)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.observers.SubscriberCompletableObserver.onComplete(SubscriberCompletableObserver.java:33)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.completable.CompletableResumeNext$ResumeNextObserver.onComplete(CompletableResumeNext.java:68)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:115)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.completable.CompletableTimeout$TimeOutObserver.onComplete(CompletableTimeout.java:87)
>>>>>>>> at
>>>>>>>> io.reactivex.internal.operators.completable.CompletableCreate$Emitter.onComplete(CompletableCreate.java:64)
>>>>>>>> at
>>>>>>>> com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$3(KafkaProcessorImpl.java:86)
>>>>>>>> at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:105)
>>>>>>>> at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:150)
>>>>>>>> at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:157)
>>>>>>>> at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:118)
>>>>>>>> at
>>>>>>>> com.xxxxxx.impl.MtEventProcessor.lambda$process$0(MtEventProcessor.java:83)
>>>>>>>> at
>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.handleDispatchResponse(HttpContext.java:310)
>>>>>>>> at
>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:297)
>>>>>>>> at
>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:272)
>>>>>>>> at
>>>>>>>> io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:69)
>>>>>>>> at
>>>>>>>> io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:32)
>>>>>>>> at
>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:269)
>>>>>>>> at
>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:279)
>>>>>>>> at
>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.dispatchResponse(HttpContext.java:240)
>>>>>>>> at
>>>>>>>> io.vertx.ext.web.client.impl.HttpContext.lambda$null$2(HttpContext.java:370)
>>>>>>>> ... 7 common frames omitted
>>>>>>>>
>>>>>>>> On Wed, 24 Jun 2020 at 13:28, John Smith <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Not sure about the wrong configuration... All the apps work; this
>>>>>>>>> seems to happen every few weeks. We don't have any particularly
>>>>>>>>> heavy load.
>>>>>>>>>
>>>>>>>>> I just bounced the client application and the errors went away.
>>>>>>>>>
>>>>>>>>> On Wed, 24 Jun 2020 at 12:57, Evgenii Zhuravlev <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> It means that there are no nodes in the cluster that hold
>>>>>>>>>> certain partitions. So you probably have a wrong configuration, or
>>>>>>>>>> some of the nodes left the cluster and you don't have backups in
>>>>>>>>>> the cluster for these partitions. I believe more can be found in
>>>>>>>>>> the logs.
>>>>>>>>>>
>>>>>>>>>> Evgenii
>>>>>>>>>>
>>>>>>>>>> On Wed, 24 Jun 2020 at 09:52, John Smith <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Also I'm assuming that the thin client wouldn't be susceptible
>>>>>>>>>>> to this error?
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 24 Jun 2020 at 12:38, John Smith <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The cluster is showing active when running control.sh
>>>>>>>>>>>>
>>>>>>>>>>>> But the client is showing "all partition owners have left the
>>>>>>>>>>>> grid"
>>>>>>>>>>>>
>>>>>>>>>>>> The client node is marked as client=true so it's not a server
>>>>>>>>>>>> node.
>>>>>>>>>>>>
>>>>>>>>>>>> Is this split brain as well?
>>>>>>>>>>>>
>>>>>>>>>>>
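
To illustrate the explanation quoted above (no remaining node holds certain partitions, and there are no backups for them), here is a minimal toy sketch in plain Java. It does not use Ignite and is not Ignite's affinity function: the class name, the method names, and the rule "partition p lives on node p % nodeCount, with backups on the next nodes in ring order" are all assumptions made up for illustration. In Ignite itself the number of backup copies is set per cache with `CacheConfiguration.setBackups(...)`, and the default for a partitioned cache is 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only (hypothetical, NOT Ignite's affinity function): partition p
// is assumed to have its primary copy on node (p % nodeCount) and its backup
// copies on the next `backups` nodes in ring order.
final class PartitionOwnershipSketch {

    /** Returns the node indexes that own a copy of the given partition. */
    static List<Integer> owners(int partition, int nodeCount, int backups) {
        List<Integer> result = new ArrayList<>();
        // One primary plus `backups` copies, never more than one copy per node.
        int copies = Math.min(backups + 1, nodeCount);
        for (int i = 0; i < copies; i++) {
            result.add((partition + i) % nodeCount);
        }
        return result;
    }

    /** Returns the partitions whose owners have all left the cluster. */
    static Set<Integer> lostPartitions(int partitions, int nodeCount, int backups,
                                       Set<Integer> leftNodes) {
        Set<Integer> lost = new TreeSet<>();
        for (int p = 0; p < partitions; p++) {
            // A partition is lost only when every node holding a copy is gone.
            if (leftNodes.containsAll(owners(p, nodeCount, backups))) {
                lost.add(p);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        Set<Integer> left = new TreeSet<>();
        left.add(1); // hypothetical scenario: server node 1 leaves the cluster
        System.out.println("backups=0: " + lostPartitions(8, 3, 0, left));
        System.out.println("backups=1: " + lostPartitions(8, 3, 1, left));
    }
}
```

In this toy model, a single node leaving loses every partition it owned when there are no backups, while one backup copy is enough to survive the loss of any one node. Whether that is what happened in this thread can only be confirmed from the full server and client logs.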
