OK, additionally there is info about node segmentation in the logs:

Node FAILED: TcpDiscoveryNode [id=fb67a5fd-f1ab-441d-a38e-bab975cd1037, consistentId=0:0:0:0:0:0:0:1%lo,XX.XX.XX.XX,127.0.0.1:47500, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, XX.XX.XX.XX, 127.0.0.1], sockAddrs=HashSet [qagmscore02.xyz.com/XX.XX.XX.XX:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=25, intOrder=16, lastExchangeTime=1633426750418, loc=false, ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false]
Local node SEGMENTED: TcpDiscoveryNode [id=7f357ca2-0ae2-4af0-bfa4-d18e7bcb3797
Possible too long JVM pause: 1052 milliseconds.
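For reference, a JVM pause like the one logged above can segment a node when it exceeds Ignite's failure detection timeout (10 000 ms by default). If long GC pauses are expected, the timeout can be raised in the node configuration; a minimal Spring XML sketch (the 30 000 ms value is illustrative, not a recommendation):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Default is 10000 ms; raise it if GC pauses routinely approach that. -->
    <property name="failureDetectionTimeout" value="30000"/>
</bean>
```

The same property is available programmatically via IgniteConfiguration.setFailureDetectionTimeout(long).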
Have you changed the default networking timeout settings? If not, try rechecking the failureDetectionTimeout setting. If a GC pause lasts longer than 10 seconds, the node will be dropped from the cluster (by default).

>This is the code of AgmsCacheJdbcStoreSessionListener.java.
>The null pointer occurs because the dataSource bean is not found.
>The cluster was working fine, so what could be the reason for the dataSource
>bean becoming unavailable in a running cluster?
>
>public class AgmsCacheJdbcStoreSessionListener extends
>CacheJdbcStoreSessionListener {
>
>    @SpringApplicationContextResource
>    public void setupDataSourceFromSpringContext(Object appCtx) {
>        ApplicationContext appContext = (ApplicationContext) appCtx;
>        setDataSource((DataSource) appContext.getBean("dataSource"));
>    }
>}
>
>I can see one log line that tells us about a problem on the network side. Is
>this the possible reason?
>
>2021-10-07 16:28:22,889 197776202 [tcp-disco-msg-worker-[fb67a5fd
>XX.XX.XX.XX:47500 crd]-#2%springDataNode%-#69%springDataNode%] WARN
>o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably, due to
>short-time network problems).
>
>On Mon, Oct 11, 2021 at 7:15 PM stanilovsky evgeny <
>estanilovs...@gridgain.com > wrote:
>>Maybe this?
>>
>>Caused by: java.lang.NullPointerException: null
>>at
>>com.xyz.agms.grid.cache.loader.AgmsCacheJdbcStoreSessionListener.setupDataSourceFromSpringContext(AgmsCacheJdbcStoreSessionListener.java:14)
>>... 23 common frames omitted
>>
>>
>>>Hi Zhenya,
>>>CacheStoppedException occurred again on our Ignite cluster. I have captured
>>>logs with IGNITE_QUIET = false.
>>>There are four core nodes in the cluster and two nodes went down. I am
>>>attaching the logs for the two failed nodes.
>>>Please let me know if you need any further details.
>>>
>>>Thanks,
>>>Akash
>>>On Tue, Sep 7, 2021 at 12:19 PM Zhenya Stanilovsky < arzamas...@mail.ru >
>>>wrote:
>>>>Please share these logs somehow; if you have no idea how to share them,
>>>>you can send them directly to arzamas...@mail.ru
>>>>
>>>>>Meanwhile, I will grep the logs for the next occurrence of the
>>>>>cache-stopped exception. Can someone tell me whether there is any known
>>>>>bug related to this? I want to check the possible reason for this
>>>>>cache-stop exception.
>>>>>On Mon, Sep 6, 2021 at 6:27 PM Akash Shinde < akashshi...@gmail.com >
>>>>>wrote:
>>>>>>Hi Zhenya,
>>>>>>Thanks for the quick response.
>>>>>>I believe you are talking about Ignite instances. There is a single
>>>>>>Ignite instance used in the application.
>>>>>>I also want to point out that I am not using the destroyCache() method
>>>>>>anywhere in the application.
>>>>>>
>>>>>>I will set IGNITE_QUIET = false and try to grep the required logs.
>>>>>>This issue occurs at random and there is no way to reproduce it.
>>>>>>
>>>>>>Thanks,
>>>>>>Akash
>>>>>>
>>>>>>
>>>>>>On Mon, Sep 6, 2021 at 5:33 PM Zhenya Stanilovsky < arzamas...@mail.ru >
>>>>>>wrote:
>>>>>>>Hi, Akash.
>>>>>>>You can run into such a case, for example, when you have several
>>>>>>>instances and:
>>>>>>>inst1:
>>>>>>>cache = inst1.getOrCreateCache("cache1");
>>>>>>>
>>>>>>>after inst2's destroy call:
>>>>>>>
>>>>>>>cache._some_method_call_
>>>>>>>
>>>>>>>inst2:
>>>>>>>inst2.destroyCache("cache1");
>>>>>>>
>>>>>>>Or, in short: you are still using a cache instance that has already been
>>>>>>>destroyed. You can simply grep your logs and find the time when the
>>>>>>>cache was stopped. You probably need to set IGNITE_QUIET = false.
>>>>>>>[1] https://ignite.apache.org/docs/latest/logging
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>Hi,
>>>>>>>>>>I have four server nodes and six client nodes in the Ignite cluster.
>>>>>>>>>>I am using Ignite version 2.10.
>>>>>>>>>>Some operations are failing due to the CacheStoppedException
>>>>>>>>>>on the server nodes. This has become a blocker issue.
>>>>>>>>>>Could someone please help me to resolve this issue?
>>>>>>>>>>
>>>>>>>>>>Cache configuration:
>>>>>>>>>>CacheConfiguration subscriptionCacheCfg = new
>>>>>>>>>>CacheConfiguration<>(CacheName.SUBSCRIPTION_CACHE.name());
>>>>>>>>>>subscriptionCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
>>>>>>>>>>subscriptionCacheCfg.setWriteThrough(false);
>>>>>>>>>>subscriptionCacheCfg.setReadThrough(true);
>>>>>>>>>>subscriptionCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
>>>>>>>>>>subscriptionCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
>>>>>>>>>>subscriptionCacheCfg.setBackups(2);
>>>>>>>>>>Factory<SubscriptionDataLoader> storeFactory =
>>>>>>>>>>FactoryBuilder.factoryOf(SubscriptionDataLoader.class);
>>>>>>>>>>subscriptionCacheCfg.setCacheStoreFactory(storeFactory);
>>>>>>>>>>subscriptionCacheCfg.setIndexedTypes(DefaultDataKey.class,
>>>>>>>>>>SubscriptionData.class);
>>>>>>>>>>subscriptionCacheCfg.setSqlIndexMaxInlineSize(47);
>>>>>>>>>>RendezvousAffinityFunction affinityFunction = new
>>>>>>>>>>RendezvousAffinityFunction();
>>>>>>>>>>affinityFunction.setExcludeNeighbors(true);
>>>>>>>>>>subscriptionCacheCfg.setAffinity(affinityFunction);
>>>>>>>>>>subscriptionCacheCfg.setStatisticsEnabled(true);
>>>>>>>>>>subscriptionCacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);
>>>>>>>>>>
>>>>>>>>>>Exception stack trace:
>>>>>>>>>>
>>>>>>>>>>ERROR c.q.dgms.kafka.TaskRequestListener - Error occurred while
>>>>>>>>>>consuming the object
>>>>>>>>>>com.baidu.unbiz.fluentvalidator.exception.RuntimeValidateException:
>>>>>>>>>>java.lang.IllegalStateException: class
>>>>>>>>>>org.apache.ignite.internal.processors.cache.CacheStoppedException:
>>>>>>>>>>Failed to perform cache operation (cache is stopped):
>>>>>>>>>>SUBSCRIPTION_CACHE
>>>>>>>>>>at
>>>>>>>>>>com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:506)
>>>>>>>>>>at
>>>>>>>>>>com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:461)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.dgms.service.UserManagementServiceImpl.deleteUser(UserManagementServiceImpl.java:710)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.dgms.kafka.TaskRequestListener.processRequest(TaskRequestListener.java:190)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.dgms.kafka.TaskRequestListener.process(TaskRequestListener.java:89)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.libraries.mom.kafka.consumer.TopicConsumer.lambda$run$3(TopicConsumer.java:162)
>>>>>>>>>>at net.jodah.failsafe.Functions$12.call(Functions.java:274)
>>>>>>>>>>at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145)
>>>>>>>>>>at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:93)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.libraries.mom.kafka.consumer.TopicConsumer.run(TopicConsumer.java:159)
>>>>>>>>>>at
>>>>>>>>>>java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>>at
>>>>>>>>>>java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>>at java.lang.Thread.run(Thread.java:748)
>>>>>>>>>>Caused by: java.lang.IllegalStateException: class
>>>>>>>>>>org.apache.ignite.internal.processors.cache.CacheStoppedException:
>>>>>>>>>>Failed to perform cache operation (cache is stopped):
>>>>>>>>>>SUBSCRIPTION_CACHE
>>>>>>>>>>at
>>>>>>>>>>org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:166)
>>>>>>>>>>at
>>>>>>>>>>org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1625)
>>>>>>>>>>at
>>>>>>>>>>org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:673)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:39)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:28)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:22)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:10)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.dgms.validators.common.validators.UserDataValidator.validateSubscription(UserDataValidator.java:226)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.dgms.validators.common.validators.UserDataValidator.validateRequest(UserDataValidator.java:124)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:346)
>>>>>>>>>>at
>>>>>>>>>>com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:41)
>>>>>>>>>>at
>>>>>>>>>>com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:490)
>>>>>>>>>>... 12 common frames omitted
>>>>>>>>>>Caused by:
>>>>>>>>>>org.apache.ignite.internal.processors.cache.CacheStoppedException:
>>>>>>>>>>Failed to perform cache operation (cache is stopped):
>>>>>>>>>>SUBSCRIPTION_CACHE
>>>>>>>>>>... 24 common frames omitted
>>>>>>>>>>
>>>>>>>>>>Thanks,
>>>>>>>>>>Akash
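The stale-handle scenario described earlier in the thread (a cache proxy used after the cache was destroyed) can be simulated without a running cluster. The sketch below is illustrative only; `CacheProxy` is a hypothetical stand-in for Ignite's gateway-protected proxy, which throws `IllegalStateException` once the underlying cache is stopped:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a gateway-protected cache proxy: once the
// cache is stopped, every subsequent operation through the stale handle
// fails, mirroring the "cache is stopped" message seen in the stack trace.
class CacheProxy {
    private final String name;
    private final AtomicBoolean stopped = new AtomicBoolean(false);

    CacheProxy(String name) { this.name = name; }

    // What a destroyCache() call on any node effectively does to all handles.
    void destroy() { stopped.set(true); }

    String get(String key) {
        if (stopped.get())
            throw new IllegalStateException(
                "Failed to perform cache operation (cache is stopped): " + name);
        return "value-of-" + key;
    }
}

public class StaleProxyDemo {
    public static void main(String[] args) {
        CacheProxy cache = new CacheProxy("SUBSCRIPTION_CACHE"); // handle held by node 1
        System.out.println(cache.get("k1"));   // works while the cache is alive
        cache.destroy();                       // another node destroys the cache
        try {
            cache.get("k1");                   // stale handle: now throws
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Grepping the logs for the cache-stop timestamp, as suggested above, shows whether some code path held on to such a handle past that point.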