Hi Evgenii,

I've added stdout.copy to the same shared folder. Here's the link again, just in case: https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0

On Wed, 24 Jun 2020 at 21:23, Evgenii Zhuravlev <[email protected]> wrote:

> No, it's not. It's not clear when it happened or what was happening with
> the cluster and the client node itself at that moment.
>
> Evgenii
>
> On Wed, 24 Jun 2020 at 18:16, John Smith <[email protected]> wrote:
>
>> Ok I'll try... The stack trace isn't enough?
>>
>> On Wed., Jun. 24, 2020, 4:30 p.m. Evgenii Zhuravlev, <[email protected]> wrote:
>>
>>> John, right, didn't notice them before. Can you share the full log for
>>> the client node with the issue?
>>>
>>> Evgenii
>>>
>>> On Wed, 24 Jun 2020 at 12:29, John Smith <[email protected]> wrote:
>>>
>>>> I thought I did! The link doesn't have them?
>>>>
>>>> On Wed., Jun. 24, 2020, 2:43 p.m. Evgenii Zhuravlev, <[email protected]> wrote:
>>>>
>>>>> Can you share full log files from server nodes?
>>>>>
>>>>> On Wed, 24 Jun 2020 at 10:47, John Smith <[email protected]> wrote:
>>>>>
>>>>>> The logs for the server are here:
>>>>>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>>>>>
>>>>>> The error from the client:
>>>>>>
>>>>>> javax.cache.CacheException: class org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=cache1, part=580, key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
>>>>>> at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
>>>>>> at org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
>>>>>> at org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:137)
>>>>>> at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$d94e711a$1(IgniteCacheRepository.java:55)
>>>>>> at org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
>>>>>> at com.xxxxxx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
>>>>>> at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
>>>>>> at io.vertx.core.impl.WorkerContext.lambda$wrapTask$0(WorkerContext.java:35)
>>>>>> at io.vertx.core.impl.TaskQueue.run(TaskQueue.java:76)
>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>>> Caused by: org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed to execute cache operation (all partition owners have left the grid, partition data has been lost) [cacheName=cache1, part=580, key=UserKeyCacheObjectImpl [part=580, val=14385045508, hasValBytes=false]]
>>>>>> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:169)
>>>>>> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
>>>>>> at org.apache.ignite.internal.processors.cache.distributed.dht.GridPartitionedSingleGetFuture.init(GridPartitionedSingleGetFuture.java:208)
>>>>>> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync0(GridDhtAtomicCache.java:1428)
>>>>>> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$1600(GridDhtAtomicCache.java:135)
>>>>>> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:474)
>>>>>> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$16.apply(GridDhtAtomicCache.java:472)
>>>>>> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:761)
>>>>>> at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.getAsync(GridDhtAtomicCache.java:472)
>>>>>> at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:4749)
>>>>>> at org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAsync(GridCacheAdapter.java:1477)
>>>>>> at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.getAsync(IgniteCacheProxyImpl.java:937)
>>>>>> at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.getAsync(GatewayProtectedCacheProxy.java:652)
>>>>>> at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$get$1(IgniteCacheRepository.java:28)
>>>>>> at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.executeAsync(IgniteCacheRepository.java:51)
>>>>>> at com.xxxxxx.common.vertx.ext.data.impl.IgniteCacheRepository.get(IgniteCacheRepository.java:28)
>>>>>> at com.xxxxxx.impl.CarrierCodeServiceImpl.getCarrierIdOfPhone(CarrierCodeServiceImpl.java:65)
>>>>>> at com.xxxxxx.impl.SmppGatewayServiceImpl.sendSms(SmppGatewayServiceImpl.java:39)
>>>>>> at com.xxxxxx.impl.MtEventProcessor.process(MtEventProcessor.java:46)
>>>>>> at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$4(KafkaProcessorImpl.java:83)
>>>>>> at io.reactivex.internal.operators.completable.CompletableCreate.subscribeActual(CompletableCreate.java:39)
>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>> at io.reactivex.internal.operators.completable.CompletableTimeout.subscribeActual(CompletableTimeout.java:53)
>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>> at io.reactivex.internal.operators.completable.CompletablePeek.subscribeActual(CompletablePeek.java:51)
>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>> at io.reactivex.internal.operators.completable.CompletableResumeNext.subscribeActual(CompletableResumeNext.java:41)
>>>>>> at io.reactivex.Completable.subscribe(Completable.java:2309)
>>>>>> at io.reactivex.internal.operators.completable.CompletableToFlowable.subscribeActual(CompletableToFlowable.java:32)
>>>>>> at io.reactivex.Flowable.subscribe(Flowable.java:14918)
>>>>>> at io.reactivex.Flowable.subscribe(Flowable.java:14865)
>>>>>> at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.onNext(FlowableFlatMap.java:163)
>>>>>> at io.reactivex.internal.operators.flowable.FlowableFromIterable$IteratorSubscription.slowPath(FlowableFromIterable.java:236)
>>>>>> at io.reactivex.internal.operators.flowable.FlowableFromIterable$BaseRangeSubscription.request(FlowableFromIterable.java:124)
>>>>>> at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drainLoop(FlowableFlatMap.java:546)
>>>>>> at io.reactivex.internal.operators.flowable.FlowableFlatMap$MergeSubscriber.drain(FlowableFlatMap.java:366)
>>>>>> at io.reactivex.internal.operators.flowable.FlowableFlatMap$InnerSubscriber.onComplete(FlowableFlatMap.java:678)
>>>>>> at io.reactivex.internal.observers.SubscriberCompletableObserver.onComplete(SubscriberCompletableObserver.java:33)
>>>>>> at io.reactivex.internal.operators.completable.CompletableResumeNext$ResumeNextObserver.onComplete(CompletableResumeNext.java:68)
>>>>>> at io.reactivex.internal.operators.completable.CompletablePeek$CompletableObserverImplementation.onComplete(CompletablePeek.java:115)
>>>>>> at io.reactivex.internal.operators.completable.CompletableTimeout$TimeOutObserver.onComplete(CompletableTimeout.java:87)
>>>>>> at io.reactivex.internal.operators.completable.CompletableCreate$Emitter.onComplete(CompletableCreate.java:64)
>>>>>> at com.xxxxxx.common.vertx.ext.kafka.impl.KafkaProcessorImpl.lambda$null$3(KafkaProcessorImpl.java:86)
>>>>>> at io.vertx.core.impl.FutureImpl.dispatch(FutureImpl.java:105)
>>>>>> at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:150)
>>>>>> at io.vertx.core.impl.FutureImpl.tryComplete(FutureImpl.java:157)
>>>>>> at io.vertx.core.impl.FutureImpl.complete(FutureImpl.java:118)
>>>>>> at com.xxxxxx.impl.MtEventProcessor.lambda$process$0(MtEventProcessor.java:83)
>>>>>> at io.vertx.ext.web.client.impl.HttpContext.handleDispatchResponse(HttpContext.java:310)
>>>>>> at io.vertx.ext.web.client.impl.HttpContext.execute(HttpContext.java:297)
>>>>>> at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:272)
>>>>>> at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:69)
>>>>>> at io.vertx.ext.web.client.impl.predicate.PredicateInterceptor.handle(PredicateInterceptor.java:32)
>>>>>> at io.vertx.ext.web.client.impl.HttpContext.next(HttpContext.java:269)
>>>>>> at io.vertx.ext.web.client.impl.HttpContext.fire(HttpContext.java:279)
>>>>>> at io.vertx.ext.web.client.impl.HttpContext.dispatchResponse(HttpContext.java:240)
>>>>>> at io.vertx.ext.web.client.impl.HttpContext.lambda$null$2(HttpContext.java:370)
>>>>>> ... 7 common frames omitted
>>>>>>
>>>>>> On Wed, 24 Jun 2020 at 13:28, John Smith <[email protected]> wrote:
>>>>>>
>>>>>>> Not sure about the wrong configuration... All the apps work; this
>>>>>>> seems to happen every few weeks. We don't have any particularly
>>>>>>> heavy load.
>>>>>>>
>>>>>>> I just bounced the client application and the errors went away.
>>>>>>>
>>>>>>> On Wed, 24 Jun 2020 at 12:57, Evgenii Zhuravlev <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> It means that there are no nodes in the cluster that hold certain
>>>>>>>> partitions. So you probably have a wrong configuration, or some of
>>>>>>>> the nodes left the cluster and you don't have backups in the
>>>>>>>> cluster for these partitions. I believe more can be found in the
>>>>>>>> logs.
>>>>>>>>
>>>>>>>> Evgenii
>>>>>>>>
>>>>>>>> On Wed, 24 Jun 2020 at 09:52, John Smith <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Also I'm assuming that the thin client wouldn't be susceptible to
>>>>>>>>> this error?
>>>>>>>>>
>>>>>>>>> On Wed, 24 Jun 2020 at 12:38, John Smith <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> The cluster is showing active when running control.sh
>>>>>>>>>>
>>>>>>>>>> But the client is showing "all partition owners have left the
>>>>>>>>>> grid"
>>>>>>>>>>
>>>>>>>>>> The client node is marked as client=true so it's not a server
>>>>>>>>>> node.
>>>>>>>>>>
>>>>>>>>>> Is this split brain as well?
