Yes, I have set failureDetectionTimeout = 60000, and there is no long GC pause.

Core 1 GC report:
<https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMTAvMTMvLS1jb3JlXzFfZ2MtMjAyMS0xMC0wNV8wOS0zMS0yNi5sb2cuMC5jdXJyZW50LS05LTU3LTE3&channel=WEB>

Core 2 GC report:
<https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMTAvMTMvLS1DT1JFXzJfZ2MtMjAyMS0xMC0wNV8wOS00MC0xMC5sb2cuMC0tMTAtMjAtNTQ=&channel=WEB>
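For reference, this is roughly how the timeout is applied when our nodes start. It is only a minimal sketch: the class name and the clientFailureDetectionTimeout line are illustrative, not copied from our actual configuration.

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class NodeStartup {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Raised from the 10-second default so that short network hiccups or
            // JVM pauses do not get the node dropped from the topology.
            cfg.setFailureDetectionTimeout(60_000);

            // Client nodes have their own knob (30-second default); shown here
            // only for completeness, we have not tuned it explicitly.
            cfg.setClientFailureDetectionTimeout(60_000);

            Ignition.start(cfg);
        }
    }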
On Wed, Oct 13, 2021 at 1:16 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

> Ok, additionally there is info about node segmentation:
>
> Node FAILED: TcpDiscoveryNode [id=fb67a5fd-f1ab-441d-a38e-bab975cd1037,
> consistentId=0:0:0:0:0:0:0:1%lo,XX.XX.XX.XX,127.0.0.1:47500,
> addrs=ArrayList [0:0:0:0:0:0:0:1%lo, XX.XX.XX.XX, 127.0.0.1],
> sockAddrs=HashSet [qagmscore02.xyz.com/XX.XX.XX.XX:47500,
> /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=25,
> intOrder=16, lastExchangeTime=1633426750418, loc=false,
> ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false]
>
> Local node SEGMENTED: TcpDiscoveryNode [id=7f357ca2-0ae2-4af0-bfa4-d18e7bcb3797
>
> Possible too long JVM pause: 1052 milliseconds.
>
> Have you changed the default networking timeouts? If not, try rechecking
> the failureDetectionTimeout setting.
> If you have a GC pause longer than 10 seconds, the node will be dropped
> from the cluster (by default).
>
>
> This is the code of AgmsCacheJdbcStoreSessionListener.java.
> This null pointer occurs because the dataSource bean is not found.
> The cluster was working fine, so what could make the dataSource bean
> unavailable in a running cluster?
>
> public class AgmsCacheJdbcStoreSessionListener extends CacheJdbcStoreSessionListener {
>
>     @SpringApplicationContextResource
>     public void setupDataSourceFromSpringContext(Object appCtx) {
>         ApplicationContext appContext = (ApplicationContext) appCtx;
>         setDataSource((DataSource) appContext.getBean("dataSource"));
>     }
> }
>
> I can see one log line that tells us about a problem on the network side.
> Is this the possible reason?
>
> 2021-10-07 16:28:22,889 197776202 [tcp-disco-msg-worker-[fb67a5fd
> XX.XX.XX.XX:47500 crd]-#2%springDataNode%-#69%springDataNode%] WARN
> o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably, due to
> short-time network problems).
>
>
> On Mon, Oct 11, 2021 at 7:15 PM stanilovsky evgeny <estanilovs...@gridgain.com> wrote:
>
> Maybe this?
>
> Caused by: java.lang.NullPointerException: null
>     at com.xyz.agms.grid.cache.loader.AgmsCacheJdbcStoreSessionListener.setupDataSourceFromSpringContext(AgmsCacheJdbcStoreSessionListener.java:14)
>     ... 23 common frames omitted
>
>
> Hi Zhenya,
> CacheStoppedException occurred again on our Ignite cluster. I have
> captured logs with IGNITE_QUIET = false.
> There are four core nodes in the cluster and two of them have gone down.
> I am attaching the logs for the two failed nodes.
> Please let me know if you need any further details.
>
> Thanks,
> Akash
>
> On Tue, Sep 7, 2021 at 12:19 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:
>
> Please share these logs somehow; if you have no idea how to share them,
> you can send them directly to arzamas...@mail.ru
>
>
> Meanwhile, I will grep the logs for the next occurrence of the cache
> stopped exception. Can someone highlight whether there is any known bug
> related to this? I want to check the possible reason for this cache stop
> exception.
>
> On Mon, Sep 6, 2021 at 6:27 PM Akash Shinde <akashshi...@gmail.com> wrote:
>
> Hi Zhenya,
> Thanks for the quick response.
> I believe you are talking about Ignite instances. There is a single
> Ignite instance used in the application.
> I also want to point out that I am not using the destroyCache() method
> anywhere in the application.
>
> I will set IGNITE_QUIET = false and try to grep the required logs.
> This issue occurs at random and there is no way to reproduce it.
>
> Thanks,
> Akash
>
> On Mon, Sep 6, 2021 at 5:33 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:
>
> Hi, Akash
> You can get into such a case, for example, when you have several
> instances and:
>
> inst1:
> cache = inst1.getOrCreateCache("cache1");
>
> inst2:
> inst2.destroyCache("cache1");
>
> and after the destroy on inst2 you call:
>
> cache._some_method_call_
>
> Or, in short: you are still using an instance that has already been
> destroyed. You can simply grep your logs and find the time when the cache
> was stopped.
> You probably need to set IGNITE_QUIET = false.
> [1] https://ignite.apache.org/docs/latest/logging
>
>
> Hi,
> I have four server nodes and six client nodes in the Ignite cluster. I am
> using Ignite version 2.10.
> Some operations are failing with CacheStoppedException on the server
> nodes. This has become a blocker issue.
> Could someone please help me resolve this issue?
>
> *Cache Configuration*
>
> CacheConfiguration subscriptionCacheCfg = new CacheConfiguration<>(CacheName.SUBSCRIPTION_CACHE.name());
> subscriptionCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
> subscriptionCacheCfg.setWriteThrough(false);
> subscriptionCacheCfg.setReadThrough(true);
> subscriptionCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
> subscriptionCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
> subscriptionCacheCfg.setBackups(2);
> Factory<SubscriptionDataLoader> storeFactory = FactoryBuilder.factoryOf(SubscriptionDataLoader.class);
> subscriptionCacheCfg.setCacheStoreFactory(storeFactory);
> subscriptionCacheCfg.setIndexedTypes(DefaultDataKey.class, SubscriptionData.class);
> subscriptionCacheCfg.setSqlIndexMaxInlineSize(47);
> RendezvousAffinityFunction affinityFunction = new RendezvousAffinityFunction();
> affinityFunction.setExcludeNeighbors(true);
> subscriptionCacheCfg.setAffinity(affinityFunction);
> subscriptionCacheCfg.setStatisticsEnabled(true);
> subscriptionCacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);
>
> *Exception stack trace*
>
> ERROR c.q.dgms.kafka.TaskRequestListener - Error occurred while consuming the object
> com.baidu.unbiz.fluentvalidator.exception.RuntimeValidateException: java.lang.IllegalStateException: class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
>     at com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:506)
>     at com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:461)
>     at com.xyz.dgms.service.UserManagementServiceImpl.deleteUser(UserManagementServiceImpl.java:710)
>     at com.xyz.dgms.kafka.TaskRequestListener.processRequest(TaskRequestListener.java:190)
>     at com.xyz.dgms.kafka.TaskRequestListener.process(TaskRequestListener.java:89)
>     at com.xyz.libraries.mom.kafka.consumer.TopicConsumer.lambda$run$3(TopicConsumer.java:162)
>     at net.jodah.failsafe.Functions$12.call(Functions.java:274)
>     at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145)
>     at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:93)
>     at com.xyz.libraries.mom.kafka.consumer.TopicConsumer.run(TopicConsumer.java:159)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: class org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
>     at org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:166)
>     at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1625)
>     at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:673)
>     at com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:39)
>     at com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:28)
>     at com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:22)
>     at com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:10)
>     at com.xyz.dgms.validators.common.validators.UserDataValidator.validateSubscription(UserDataValidator.java:226)
>     at com.xyz.dgms.validators.common.validators.UserDataValidator.validateRequest(UserDataValidator.java:124)
>     at com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:346)
>     at com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:41)
>     at com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:490)
>     ... 12 common frames omitted
> Caused by: org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
>     ... 24 common frames omitted
>
> Thanks,
> Akash
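For anyone else reading the archive: a minimal, self-contained sketch of the failure mode Zhenya describes above (a stale cache proxy used after the cache has been destroyed elsewhere). The instance names, cache name, and key/value types are illustrative, and our application does not call destroyCache(); this only shows where the "cache is stopped" error normally comes from.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class CacheStoppedRepro {
        public static void main(String[] args) {
            // Two server nodes in one JVM, distinguished only by instance name;
            // with default discovery they form a single two-node cluster.
            Ignite inst1 = Ignition.start(new IgniteConfiguration().setIgniteInstanceName("inst1"));
            Ignite inst2 = Ignition.start(new IgniteConfiguration().setIgniteInstanceName("inst2"));

            // inst1 obtains a cache proxy and keeps the reference around.
            IgniteCache<Integer, String> cache = inst1.getOrCreateCache("cache1");
            cache.put(1, "value");

            // inst2 destroys the cache cluster-wide.
            inst2.destroyCache("cache1");

            // Any further call through the stale proxy fails at the cache gateway
            // with "Failed to perform cache operation (cache is stopped): cache1",
            // matching the GridCacheGateway.enter frame in the trace above.
            cache.get(1);
        }
    }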
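And on the NullPointerException at AgmsCacheJdbcStoreSessionListener.java:14: below is a hedged, defensive variant of the listener (the "dataSource" bean name comes from the snippet above; the null check and error message are mine). Note that a genuinely missing bean would surface as a NoSuchBeanDefinitionException rather than an NPE, so the NPE more likely points at the injected appCtx itself being null.

    import javax.sql.DataSource;

    import org.apache.ignite.cache.store.jdbc.CacheJdbcStoreSessionListener;
    import org.apache.ignite.resources.SpringApplicationContextResource;
    import org.springframework.context.ApplicationContext;

    public class AgmsCacheJdbcStoreSessionListener extends CacheJdbcStoreSessionListener {

        @SpringApplicationContextResource
        public void setupDataSourceFromSpringContext(Object appCtx) {
            // If no Spring application context is available on the node (e.g. it
            // was not started through Spring), the injected value can be null,
            // which would explain the NPE at line 14 of the original class.
            if (appCtx == null) {
                throw new IllegalStateException(
                    "Spring ApplicationContext was not injected into " + getClass().getSimpleName());
            }

            ApplicationContext appContext = (ApplicationContext) appCtx;

            // getBean(name, type) fails with NoSuchBeanDefinitionException (not an
            // NPE) if the bean is absent, so a missing dataSource bean would look
            // different in the logs than the stack trace posted earlier.
            setDataSource(appContext.getBean("dataSource", DataSource.class));
        }
    }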