Could someone please help me find the root cause of this problem? It is now happening very frequently.
On Wed, Oct 13, 2021 at 3:56 PM Akash Shinde <akashshi...@gmail.com> wrote:

Yes, I have set failureDetectionTimeout = 60000.
There is no long GC pause.
Core 1 GC report
<https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMTAvMTMvLS1jb3JlXzFfZ2MtMjAyMS0xMC0wNV8wOS0zMS0yNi5sb2cuMC5jdXJyZW50LS05LTU3LTE3&channel=WEB>
Core 2 GC report
<https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMTAvMTMvLS1DT1JFXzJfZ2MtMjAyMS0xMC0wNV8wOS00MC0xMC5sb2cuMC0tMTAtMjAtNTQ=&channel=WEB>

On Wed, Oct 13, 2021 at 1:16 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

OK, additionally there is info about node segmentation:

Node FAILED: TcpDiscoveryNode [id=fb67a5fd-f1ab-441d-a38e-bab975cd1037,
consistentId=0:0:0:0:0:0:0:1%lo,XX.XX.XX.XX,127.0.0.1:47500,
addrs=ArrayList [0:0:0:0:0:0:0:1%lo, XX.XX.XX.XX, 127.0.0.1],
sockAddrs=HashSet [qagmscore02.xyz.com/XX.XX.XX.XX:47500,
/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=25,
intOrder=16, lastExchangeTime=1633426750418, loc=false,
ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false]

Local node SEGMENTED: TcpDiscoveryNode [id=7f357ca2-0ae2-4af0-bfa4-d18e7bcb3797

Possible too long JVM pause: 1052 milliseconds.

Have you changed the default networking timeouts? If not, recheck the
failureDetectionTimeout setting. If a GC pause is longer than 10 seconds, the
node will be dropped from the cluster (by default).


This is the code of AgmsCacheJdbcStoreSessionListener.java.
The NullPointerException occurs because the "dataSource" bean is not found.
The cluster was working fine, so what could cause the dataSource bean to
become unavailable while the cluster is running?

public class AgmsCacheJdbcStoreSessionListener extends CacheJdbcStoreSessionListener {

    @SpringApplicationContextResource
    public void setupDataSourceFromSpringContext(Object appCtx) {
        ApplicationContext appContext = (ApplicationContext) appCtx;
        setDataSource((DataSource) appContext.getBean("dataSource"));
    }
}

I can see one log line that points to a problem on the network side.
Could this be the reason?

2021-10-07 16:28:22,889 197776202 [tcp-disco-msg-worker-[fb67a5fd
XX.XX.XX.XX:47500 crd]-#2%springDataNode%-#69%springDataNode%] WARN
o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably, due to
short-time network problems).

On Mon, Oct 11, 2021 at 7:15 PM stanilovsky evgeny <estanilovs...@gridgain.com> wrote:

Maybe this?

Caused by: java.lang.NullPointerException: null
at com.xyz.agms.grid.cache.loader.AgmsCacheJdbcStoreSessionListener.setupDataSourceFromSpringContext(AgmsCacheJdbcStoreSessionListener.java:14)
... 23 common frames omitted


Hi Zhenya,
CacheStoppedException occurred again on our Ignite cluster. I have captured
logs with IGNITE_QUIET = false. There are four core nodes in the cluster and
two of them went down. I am attaching the logs for the two failed nodes.
Please let me know if you need any further details.

Thanks,
Akash
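For reference, a slightly more defensive version of the listener would make this failure mode easier to diagnose: if the Spring context is unavailable or the "dataSource" bean is missing (for example because the context is still starting or already shutting down), the bare NullPointerException hides the real cause. The following is only a sketch that reuses the bean name from the snippet above; it is not the application's actual code.

import javax.sql.DataSource;
import org.apache.ignite.cache.store.jdbc.CacheJdbcStoreSessionListener;
import org.apache.ignite.resources.SpringApplicationContextResource;
import org.springframework.context.ApplicationContext;

public class AgmsCacheJdbcStoreSessionListener extends CacheJdbcStoreSessionListener {

    @SpringApplicationContextResource
    public void setupDataSourceFromSpringContext(Object appCtx) {
        // Fail fast with a descriptive message instead of a bare NullPointerException.
        if (appCtx == null)
            throw new IllegalStateException("Spring application context was not injected into the cache store listener");

        ApplicationContext appContext = (ApplicationContext) appCtx;

        // containsBean() lets us report a missing bean explicitly before the lookup.
        if (!appContext.containsBean("dataSource"))
            throw new IllegalStateException("No 'dataSource' bean found in the Spring context (context may be shutting down)");

        setDataSource((DataSource) appContext.getBean("dataSource"));
    }
}

This does not change behaviour when everything is wired correctly; it only replaces the NullPointerException at line 14 with a message saying which precondition failed.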
On Tue, Sep 7, 2021 at 12:19 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

Please share these logs somehow; if you have no idea how to share them, you
can send them directly to arzamas...@mail.ru


Meanwhile I grepped the logs for the next occurrence of the cache stopped
exception. Can someone point out whether there is any known bug related to
this? I want to check the possible reason for this cache stop exception.

On Mon, Sep 6, 2021 at 6:27 PM Akash Shinde <akashshi...@gmail.com> wrote:

Hi Zhenya,
Thanks for the quick response.
I believe you are talking about Ignite instances. There is a single Ignite
instance used in the application.
I also want to point out that I am not using the destroyCache() method
anywhere in the application.

I will set IGNITE_QUIET = false and try to grep the required logs.
This issue occurs randomly and there is no way to reproduce it.

Thanks,
Akash

On Mon, Sep 6, 2021 at 5:33 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

Hi, Akash
You can run into such a case, for example, when you have several instances:

inst1:
cache = inst1.getOrCreateCache("cache1");

// after inst2 calls destroy:
cache._some_method_call_

inst2:
inst2.destroyCache("cache1");

Or, in short: you are still using a cache instance that has already been
destroyed. You can simply grep your logs and find the time when the cache
was stopped; you probably need to set IGNITE_QUIET = false.
[1] https://ignite.apache.org/docs/latest/logging
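Zhenya's scenario, spelled out as a small self-contained sketch (the instance and cache names here are illustrative, not taken from the application, and Akash reports that destroyCache() is never called in his code): once any node destroys a cache, every call on a previously obtained proxy for that cache fails with an IllegalStateException caused by CacheStoppedException.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class CacheStoppedRepro {
    public static void main(String[] args) {
        // Two server nodes in the same cluster (hypothetical instance names).
        Ignite inst1 = Ignition.start(new IgniteConfiguration().setIgniteInstanceName("inst1"));
        Ignite inst2 = Ignition.start(new IgniteConfiguration().setIgniteInstanceName("inst2"));

        IgniteCache<Integer, String> cache = inst1.getOrCreateCache("cache1");
        cache.put(1, "ok"); // works while the cache exists

        // Any node may destroy the cache cluster-wide...
        inst2.destroyCache("cache1");

        // ...after which the stale proxy throws IllegalStateException
        // caused by CacheStoppedException, as in the stack trace below.
        cache.get(1);
    }
}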
Hi,
I have four server nodes and six client nodes in an Ignite cluster, running
Ignite 2.10. Some operations are failing due to a CacheStoppedException on
the server nodes. This has become a blocker issue.
Could someone please help me resolve it?

*Cache Configuration*

CacheConfiguration subscriptionCacheCfg = new CacheConfiguration<>(CacheName.SUBSCRIPTION_CACHE.name());
subscriptionCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
subscriptionCacheCfg.setWriteThrough(false);
subscriptionCacheCfg.setReadThrough(true);
subscriptionCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
subscriptionCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
subscriptionCacheCfg.setBackups(2);
Factory<SubscriptionDataLoader> storeFactory = FactoryBuilder.factoryOf(SubscriptionDataLoader.class);
subscriptionCacheCfg.setCacheStoreFactory(storeFactory);
subscriptionCacheCfg.setIndexedTypes(DefaultDataKey.class, SubscriptionData.class);
subscriptionCacheCfg.setSqlIndexMaxInlineSize(47);
RendezvousAffinityFunction affinityFunction = new RendezvousAffinityFunction();
affinityFunction.setExcludeNeighbors(true);
subscriptionCacheCfg.setAffinity(affinityFunction);
subscriptionCacheCfg.setStatisticsEnabled(true);
subscriptionCacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);

*Exception stack trace*

ERROR c.q.dgms.kafka.TaskRequestListener - Error occurred while consuming the object
com.baidu.unbiz.fluentvalidator.exception.RuntimeValidateException:
java.lang.IllegalStateException: class
org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
at com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:506)
at com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:461)
at com.xyz.dgms.service.UserManagementServiceImpl.deleteUser(UserManagementServiceImpl.java:710)
at com.xyz.dgms.kafka.TaskRequestListener.processRequest(TaskRequestListener.java:190)
at com.xyz.dgms.kafka.TaskRequestListener.process(TaskRequestListener.java:89)
at com.xyz.libraries.mom.kafka.consumer.TopicConsumer.lambda$run$3(TopicConsumer.java:162)
at net.jodah.failsafe.Functions$12.call(Functions.java:274)
at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145)
at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:93)
at com.xyz.libraries.mom.kafka.consumer.TopicConsumer.run(TopicConsumer.java:159)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: class
org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
at org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:166)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1625)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:673)
at com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:39)
at com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:28)
at com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:22)
at com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:10)
at com.xyz.dgms.validators.common.validators.UserDataValidator.validateSubscription(UserDataValidator.java:226)
at com.xyz.dgms.validators.common.validators.UserDataValidator.validateRequest(UserDataValidator.java:124)
at com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:346)
at com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:41)
at com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:490)
... 12 common frames omitted
Caused by: org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
... 24 common frames omitted

Thanks,
Akash
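Since the failed nodes also report segmentation ("Local node SEGMENTED", "Node is out of topology"), one way to correlate the cache stop with node lifecycle is to record and listen for the relevant events. This is only a diagnostic sketch, under the assumption that enabling these event types is acceptable in this environment:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class CacheLifecycleDiagnostics {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Event recording is disabled by default; enable only what we need.
        cfg.setIncludeEventTypes(
            EventType.EVT_CACHE_STOPPED,
            EventType.EVT_NODE_FAILED,
            EventType.EVT_NODE_SEGMENTED);

        Ignite ignite = Ignition.start(cfg);

        // Log every matching local event; returning true keeps the listener active.
        IgnitePredicate<Event> lsnr = evt -> {
            System.out.println("Lifecycle event: " + evt);
            return true;
        };

        ignite.events().localListen(lsnr,
            EventType.EVT_CACHE_STOPPED,
            EventType.EVT_NODE_FAILED,
            EventType.EVT_NODE_SEGMENTED);
    }
}

If an EVT_CACHE_STOPPED timestamp lines up with EVT_NODE_SEGMENTED or EVT_NODE_FAILED on the same node, that would suggest the cache is being stopped as part of the node dropping out of the topology rather than by anything in the application code.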