Could someone please help me find the root cause of this problem? It is now happening very frequently.
On Wed, Oct 13, 2021 at 3:56 PM Akash Shinde <akashshi...@gmail.com> wrote:

Yes, I have set failureDetectionTimeout = 60000.
There is no long GC pause.
Core 1 GC report
<https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMTAvMTMvLS1jb3JlXzFfZ2MtMjAyMS0xMC0wNV8wOS0zMS0yNi5sb2cuMC5jdXJyZW50LS05LTU3LTE3&channel=WEB>
Core 2 GC report
<https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMTAvMTMvLS1DT1JFXzJfZ2MtMjAyMS0xMC0wNV8wOS00MC0xMC5sb2cuMC0tMTAtMjAtNTQ=&channel=WEB>

On Wed, Oct 13, 2021 at 1:16 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

OK, additionally there is info about node segmentation:

Node FAILED: TcpDiscoveryNode [id=fb67a5fd-f1ab-441d-a38e-bab975cd1037,
consistentId=0:0:0:0:0:0:0:1%lo,XX.XX.XX.XX,127.0.0.1:47500,
addrs=ArrayList [0:0:0:0:0:0:0:1%lo, XX.XX.XX.XX, 127.0.0.1],
sockAddrs=HashSet [qagmscore02.xyz.com/XX.XX.XX.XX:47500,
/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=25,
intOrder=16, lastExchangeTime=1633426750418, loc=false,
ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false]

Local node SEGMENTED: TcpDiscoveryNode [id=7f357ca2-0ae2-4af0-bfa4-d18e7bcb3797

Possible too long JVM pause: 1052 milliseconds.

Have you changed the default networking timeouts? If not, recheck the
failureDetectionTimeout setting. If a GC pause is longer than 10 seconds, the
node will be dropped from the cluster (by default).


This is the code of AgmsCacheJdbcStoreSessionListener.java.
The NullPointerException occurs because the "dataSource" bean is not found.
The cluster was working fine, so what could cause the dataSource bean to
become unavailable while the cluster is running?

public class AgmsCacheJdbcStoreSessionListener extends CacheJdbcStoreSessionListener {

    @SpringApplicationContextResource
    public void setupDataSourceFromSpringContext(Object appCtx) {
        ApplicationContext appContext = (ApplicationContext) appCtx;
        setDataSource((DataSource) appContext.getBean("dataSource"));
    }
}

I can see one log line that points to a problem on the network side.
Could this be the reason?

2021-10-07 16:28:22,889 197776202 [tcp-disco-msg-worker-[fb67a5fd
XX.XX.XX.XX:47500 crd]-#2%springDataNode%-#69%springDataNode%] WARN
o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably, due to
short-time network problems).

On Mon, Oct 11, 2021 at 7:15 PM stanilovsky evgeny <estanilovs...@gridgain.com> wrote:

Maybe this?

Caused by: java.lang.NullPointerException: null
at com.xyz.agms.grid.cache.loader.AgmsCacheJdbcStoreSessionListener.setupDataSourceFromSpringContext(AgmsCacheJdbcStoreSessionListener.java:14)
... 23 common frames omitted


Hi Zhenya,
CacheStoppedException occurred again on our Ignite cluster. I have captured
logs with IGNITE_QUIET = false. There are four core nodes in the cluster and
two of them went down. I am attaching the logs for the two failed nodes.
Please let me know if you need any further details.

Thanks,
Akash
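For reference, a slightly more defensive version of the listener would make this failure mode easier to diagnose: if the Spring context is unavailable or the "dataSource" bean is missing (for example because the context is still starting or already shutting down), the bare NullPointerException hides the real cause. The following is only a sketch that reuses the bean name from the snippet above; it is not the application's actual code.

import javax.sql.DataSource;
import org.apache.ignite.cache.store.jdbc.CacheJdbcStoreSessionListener;
import org.apache.ignite.resources.SpringApplicationContextResource;
import org.springframework.context.ApplicationContext;

public class AgmsCacheJdbcStoreSessionListener extends CacheJdbcStoreSessionListener {

    @SpringApplicationContextResource
    public void setupDataSourceFromSpringContext(Object appCtx) {
        // Fail fast with a descriptive message instead of a bare NullPointerException.
        if (appCtx == null)
            throw new IllegalStateException("Spring application context was not injected into the cache store listener");

        ApplicationContext appContext = (ApplicationContext) appCtx;

        // containsBean() lets us report a missing bean explicitly before the lookup.
        if (!appContext.containsBean("dataSource"))
            throw new IllegalStateException("No 'dataSource' bean found in the Spring context (context may be shutting down)");

        setDataSource((DataSource) appContext.getBean("dataSource"));
    }
}

This does not change behaviour when everything is wired correctly; it only replaces the NullPointerException at line 14 with a message saying which precondition failed.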
On Tue, Sep 7, 2021 at 12:19 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

Please share these logs somehow; if you have no idea how to share them, you
can send them directly to arzamas...@mail.ru


Meanwhile I grepped the logs for the next occurrence of the cache stopped
exception. Can someone point out whether there is any known bug related to
this? I want to check the possible reason for this cache stop exception.

On Mon, Sep 6, 2021 at 6:27 PM Akash Shinde <akashshi...@gmail.com> wrote:

Hi Zhenya,
Thanks for the quick response.
I believe you are talking about Ignite instances. There is a single Ignite
instance used in the application.
I also want to point out that I am not using the destroyCache() method
anywhere in the application.

I will set IGNITE_QUIET = false and try to grep the required logs.
This issue occurs randomly and there is no way to reproduce it.

Thanks,
Akash

On Mon, Sep 6, 2021 at 5:33 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

Hi, Akash
You can run into such a case, for example, when you have several instances:

inst1:
cache = inst1.getOrCreateCache("cache1");

// after inst2 calls destroy:
cache._some_method_call_

inst2:
inst2.destroyCache("cache1");

Or, in short: you are still using a cache instance that has already been
destroyed. You can simply grep your logs and find the time when the cache
was stopped; you probably need to set IGNITE_QUIET = false.
[1] https://ignite.apache.org/docs/latest/logging
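Zhenya's scenario, spelled out as a small self-contained sketch (the instance and cache names here are illustrative, not taken from the application, and Akash reports that destroyCache() is never called in his code): once any node destroys a cache, every call on a previously obtained proxy for that cache fails with an IllegalStateException caused by CacheStoppedException.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CacheStoppedRepro {
    public static void main(String[] args) {
        // Two server nodes in the same cluster (hypothetical instance names).
        Ignite inst1 = Ignition.start(new IgniteConfiguration().setIgniteInstanceName("inst1"));
        Ignite inst2 = Ignition.start(new IgniteConfiguration().setIgniteInstanceName("inst2"));

        IgniteCache<Integer, String> cache = inst1.getOrCreateCache("cache1");
        cache.put(1, "ok"); // works while the cache exists

        // Any node may destroy the cache cluster-wide...
        inst2.destroyCache("cache1");

        // ...after which the stale proxy throws IllegalStateException
        // caused by CacheStoppedException, as in the stack trace below.
        cache.get(1);
    }
}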
Hi,
I have four server nodes and six client nodes in an Ignite cluster, running
Ignite 2.10. Some operations are failing due to a CacheStoppedException on
the server nodes. This has become a blocker issue.
Could someone please help me resolve it?

*Cache Configuration*

CacheConfiguration subscriptionCacheCfg = new CacheConfiguration<>(CacheName.SUBSCRIPTION_CACHE.name());
subscriptionCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
subscriptionCacheCfg.setWriteThrough(false);
subscriptionCacheCfg.setReadThrough(true);
subscriptionCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
subscriptionCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
subscriptionCacheCfg.setBackups(2);
Factory<SubscriptionDataLoader> storeFactory = FactoryBuilder.factoryOf(SubscriptionDataLoader.class);
subscriptionCacheCfg.setCacheStoreFactory(storeFactory);
subscriptionCacheCfg.setIndexedTypes(DefaultDataKey.class, SubscriptionData.class);
subscriptionCacheCfg.setSqlIndexMaxInlineSize(47);
RendezvousAffinityFunction affinityFunction = new RendezvousAffinityFunction();
affinityFunction.setExcludeNeighbors(true);
subscriptionCacheCfg.setAffinity(affinityFunction);
subscriptionCacheCfg.setStatisticsEnabled(true);
subscriptionCacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);

*Exception stack trace*

ERROR c.q.dgms.kafka.TaskRequestListener - Error occurred while consuming the object
com.baidu.unbiz.fluentvalidator.exception.RuntimeValidateException:
java.lang.IllegalStateException: class
org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
at com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:506)
at com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:461)
at com.xyz.dgms.service.UserManagementServiceImpl.deleteUser(UserManagementServiceImpl.java:710)
at com.xyz.dgms.kafka.TaskRequestListener.processRequest(TaskRequestListener.java:190)
at com.xyz.dgms.kafka.TaskRequestListener.process(TaskRequestListener.java:89)
at com.xyz.libraries.mom.kafka.consumer.TopicConsumer.lambda$run$3(TopicConsumer.java:162)
at net.jodah.failsafe.Functions$12.call(Functions.java:274)
at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145)
at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:93)
at com.xyz.libraries.mom.kafka.consumer.TopicConsumer.run(TopicConsumer.java:159)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: class
org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
at org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:166)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1625)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:673)
at com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:39)
at com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:28)
at com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:22)
at com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:10)
at com.xyz.dgms.validators.common.validators.UserDataValidator.validateSubscription(UserDataValidator.java:226)
at com.xyz.dgms.validators.common.validators.UserDataValidator.validateRequest(UserDataValidator.java:124)
at com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:346)
at com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:41)
at com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:490)
... 12 common frames omitted
Caused by: org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
... 24 common frames omitted

Thanks,
Akash
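Since the failed nodes also report segmentation ("Local node SEGMENTED", "Node is out of topology"), one way to correlate the cache stop with node lifecycle is to record and listen for the relevant events. This is only a diagnostic sketch, under the assumption that enabling these event types is acceptable in this environment:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class CacheLifecycleDiagnostics {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Event recording is disabled by default; enable only what we need.
        cfg.setIncludeEventTypes(
            EventType.EVT_CACHE_STOPPED,
            EventType.EVT_NODE_FAILED,
            EventType.EVT_NODE_SEGMENTED);

        Ignite ignite = Ignition.start(cfg);

        // Log every matching local event; returning true keeps the listener active.
        IgnitePredicate<Event> lsnr = evt -> {
            System.out.println("Lifecycle event: " + evt);
            return true;
        };

        ignite.events().localListen(lsnr,
            EventType.EVT_CACHE_STOPPED,
            EventType.EVT_NODE_FAILED,
            EventType.EVT_NODE_SEGMENTED);
    }
}

If an EVT_CACHE_STOPPED timestamp lines up with EVT_NODE_SEGMENTED or EVT_NODE_FAILED on the same node, that would suggest the cache is being stopped as part of the node dropping out of the topology rather than by anything in the application code.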