Yes, I have set failureDetectionTimeout = 60000.
There are no long GC pauses.
Core1 GC report
<https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMTAvMTMvLS1jb3JlXzFfZ2MtMjAyMS0xMC0wNV8wOS0zMS0yNi5sb2cuMC5jdXJyZW50LS05LTU3LTE3&channel=WEB>
Core 2 GC Report
<https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMTAvMTMvLS1DT1JFXzJfZ2MtMjAyMS0xMC0wNV8wOS00MC0xMC5sb2cuMC0tMTAtMjAtNTQ=&channel=WEB>
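For reference, here is how the 60000 ms value above is typically applied, as a minimal sketch assuming a plain Java node startup (our actual deployment may set the same property via Spring XML):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeStartup {
  public static void main(String[] args) {
    IgniteConfiguration cfg = new IgniteConfiguration();
    // How long a node may stay unresponsive (e.g. during a GC or network hiccup)
    // before the rest of the cluster considers it failed.
    cfg.setFailureDetectionTimeout(60_000L);
    Ignite ignite = Ignition.start(cfg);
  }
}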

On Wed, Oct 13, 2021 at 1:16 PM Zhenya Stanilovsky <arzamas...@mail.ru>
wrote:

>
> Ok, additionally there is info about node segmentation:
> Node FAILED: TcpDiscoveryNode [id=fb67a5fd-f1ab-441d-a38e-bab975cd1037,
> consistentId=0:0:0:0:0:0:0:1%lo,XX.XX.XX.XX,127.0.0.1:47500,
> addrs=ArrayList [0:0:0:0:0:0:0:1%lo, XX.XX.XX.XX, 127.0.0.1],
> sockAddrs=HashSet [qagmscore02.xyz.com/XX.XX.XX.XX:47500,
> /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=25,
> intOrder=16, lastExchangeTime=1633426750418, loc=false,
> ver=2.10.0#20210310-sha1:bc24f6ba, isClient=false]
>
> Local node SEGMENTED: TcpDiscoveryNode
> [id=7f357ca2-0ae2-4af0-bfa4-d18e7bcb3797
>
> Possible too long JVM pause: 1052 milliseconds.
>
> Have you changed the default networking timeouts? If not, try rechecking
> the failureDetectionTimeout setting.
> If you have a GC pause longer than 10 seconds, the node will be dropped from
> the cluster (by default).
>
>
> This is the code of AgmsCacheJdbcStoreSessionListener.java.
> The null pointer occurs because the datasource bean is not found.
> The cluster was working fine, so what could cause the datasource bean to
> become unavailable in a running cluster?
>
>
>
> public class AgmsCacheJdbcStoreSessionListener extends CacheJdbcStoreSessionListener {
>
>   @SpringApplicationContextResource
>   public void setupDataSourceFromSpringContext(Object appCtx) {
>     ApplicationContext appContext = (ApplicationContext) appCtx;
>     setDataSource((DataSource) appContext.getBean("dataSource"));
>   }
> }
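>
> A more defensive variant of the listener (a sketch only, not the actual
> project code; the bean name and error messages are assumptions) fails fast
> with a descriptive error instead of a NullPointerException when the Spring
> context or the "dataSource" bean is missing:
>
> import javax.sql.DataSource;
> import org.apache.ignite.cache.store.jdbc.CacheJdbcStoreSessionListener;
> import org.apache.ignite.resources.SpringApplicationContextResource;
> import org.springframework.context.ApplicationContext;
>
> public class SafeCacheJdbcStoreSessionListener extends CacheJdbcStoreSessionListener {
>
>   @SpringApplicationContextResource
>   public void setupDataSourceFromSpringContext(Object appCtx) {
>     // Handles both a null context and an unexpected object type.
>     if (!(appCtx instanceof ApplicationContext)) {
>       throw new IllegalStateException("Spring application context was not injected: " + appCtx);
>     }
>     ApplicationContext appContext = (ApplicationContext) appCtx;
>     if (!appContext.containsBean("dataSource")) {
>       throw new IllegalStateException("Bean 'dataSource' is not defined in the Spring context");
>     }
>     setDataSource(appContext.getBean("dataSource", DataSource.class));
>   }
> }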
>
>
>
> I can see one log line that points to a problem on the network side.
> Could this be the reason?
>
> 2021-10-07 16:28:22,889 197776202 [tcp-disco-msg-worker-[fb67a5fd
> XX.XX.XX.XX:47500 crd]-#2%springDataNode%-#69%springDataNode%] WARN
>  o.a.i.s.d.tcp.TcpDiscoverySpi - Node is out of topology (probably, due to
> short-time network problems).
>
>
>
>
> On Mon, Oct 11, 2021 at 7:15 PM stanilovsky evgeny <
> estanilovs...@gridgain.com> wrote:
>
> Maybe this?
>
> Caused by: java.lang.NullPointerException: null
> at
> com.xyz.agms.grid.cache.loader.AgmsCacheJdbcStoreSessionListener.setupDataSourceFromSpringContext(AgmsCacheJdbcStoreSessionListener.java:14)
> ... 23 common frames omitted
>
>
>
> Hi Zhenya,
> CacheStoppedException occurred again on our Ignite cluster. I have
> captured logs with IGNITE_QUIET = false.
> There are four core nodes in the cluster and two of them went down. I am
> attaching the logs for the two failed nodes.
> Please let me know if you need any further details.
>
> Thanks,
> Akash
>
> On Tue, Sep 7, 2021 at 12:19 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:
>
> Please share these logs somehow; if you have no idea how to share them, you
> can send them directly to arzamas...@mail.ru
>
>
>
> Meanwhile, I will grep the logs at the next occurrence of the cache stopped
> exception. Can someone point out whether there is any known bug related to this?
> I want to check the possible reasons for this CacheStoppedException.
>
> On Mon, Sep 6, 2021 at 6:27 PM Akash Shinde <akashshi...@gmail.com> wrote:
>
> Hi Zhenya,
> Thanks for the quick response.
> I believe you are talking about Ignite instances. There is a
> single Ignite instance used in the application.
> I also want to point out that I am not using the destroyCache() method
> anywhere in the application.
>
> I will set IGNITE_QUIET = false and try to grep the required logs.
> This issue occurs at random and there is no way to reproduce it.
>
> Thanks,
> Akash
>
>
>
> On Mon, Sep 6, 2021 at 5:33 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:
>
> Hi Akash,
> You can run into such a case, for example, when you have several instances:
>
> inst1:
> cache = inst1.getOrCreateCache("cache1");
>
> inst2:
> inst2.destroyCache("cache1");
>
> inst1, after the destroy on inst2:
> cache._some_method_call_
>
> In short: you are still using a cache that has already been destroyed. You
> can simply grep your logs and find the time when the cache was stopped; a
> runnable sketch of this sequence follows the link below.
> You probably need to set IGNITE_QUIET = false.
> [1] https://ignite.apache.org/docs/latest/logging
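>
> A minimal, self-contained sketch of that sequence (class and value names are
> made up for illustration); the final get() through the stale proxy fails with
> an IllegalStateException caused by CacheStoppedException:
>
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteCache;
> import org.apache.ignite.Ignition;
> import org.apache.ignite.configuration.IgniteConfiguration;
>
> public class CacheStoppedRepro {
>   public static void main(String[] args) {
>     // Two server nodes started in one JVM, forming a single cluster.
>     Ignite inst1 = Ignition.start(new IgniteConfiguration().setIgniteInstanceName("inst1"));
>     Ignite inst2 = Ignition.start(new IgniteConfiguration().setIgniteInstanceName("inst2"));
>
>     // Cache proxy obtained on the first instance.
>     IgniteCache<Integer, String> cache = inst1.getOrCreateCache("cache1");
>     cache.put(1, "value");
>
>     // The cache is destroyed from the second instance.
>     inst2.destroyCache("cache1");
>
>     // Any further call through the stale proxy throws
>     // "Failed to perform cache operation (cache is stopped)".
>     cache.get(1);
>   }
> }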
>
>
>
>
>
>
> Hi,
> I have four server nodes and six client nodes in the Ignite cluster. I am
> using Ignite version 2.10.
> Some operations are failing due to CacheStoppedException on
> the server nodes. This has become a blocker issue.
> Could someone please help me resolve it?
>
> *Cache Configuration*
>
> CacheConfiguration subscriptionCacheCfg = new CacheConfiguration<>(CacheName.SUBSCRIPTION_CACHE.name());
> subscriptionCacheCfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
> subscriptionCacheCfg.setWriteThrough(false);
> subscriptionCacheCfg.setReadThrough(true);
> subscriptionCacheCfg.setRebalanceMode(CacheRebalanceMode.ASYNC);
> subscriptionCacheCfg.setWriteSynchronizationMode(CacheWriteSynchronizationMode.FULL_SYNC);
> subscriptionCacheCfg.setBackups(2);
> Factory<SubscriptionDataLoader> storeFactory = FactoryBuilder.factoryOf(SubscriptionDataLoader.class);
> subscriptionCacheCfg.setCacheStoreFactory(storeFactory);
> subscriptionCacheCfg.setIndexedTypes(DefaultDataKey.class, SubscriptionData.class);
> subscriptionCacheCfg.setSqlIndexMaxInlineSize(47);
> RendezvousAffinityFunction affinityFunction = new RendezvousAffinityFunction();
> affinityFunction.setExcludeNeighbors(true);
> subscriptionCacheCfg.setAffinity(affinityFunction);
> subscriptionCacheCfg.setStatisticsEnabled(true);
> subscriptionCacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);
>
>
> *Exception stack trace*
>
> ERROR c.q.dgms.kafka.TaskRequestListener - Error occurred while consuming
> the object
> com.baidu.unbiz.fluentvalidator.exception.RuntimeValidateException:
> java.lang.IllegalStateException: class
> org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
> to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
> at
> com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:506)
> at
> com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:461)
> at
> com.xyz.dgms.service.UserManagementServiceImpl.deleteUser(UserManagementServiceImpl.java:710)
> at
> com.xyz.dgms.kafka.TaskRequestListener.processRequest(TaskRequestListener.java:190)
> at
> com.xyz.dgms.kafka.TaskRequestListener.process(TaskRequestListener.java:89)
> at
> com.xyz.libraries.mom.kafka.consumer.TopicConsumer.lambda$run$3(TopicConsumer.java:162)
> at net.jodah.failsafe.Functions$12.call(Functions.java:274)
> at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145)
> at net.jodah.failsafe.SyncFailsafe.run(SyncFailsafe.java:93)
> at
> com.xyz.libraries.mom.kafka.consumer.TopicConsumer.run(TopicConsumer.java:159)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: class
> org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
> to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
> at
> org.apache.ignite.internal.processors.cache.GridCacheGateway.enter(GridCacheGateway.java:166)
> at
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(GatewayProtectedCacheProxy.java:1625)
> at
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(GatewayProtectedCacheProxy.java:673)
> at
> com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:39)
> at
> com.xyz.dgms.grid.dao.AbstractDataGridDAO.getData(AbstractDataGridDAO.java:28)
> at
> com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:22)
> at
> com.xyz.dgms.grid.dataservice.DefaultDataGridService.getData(DefaultDataGridService.java:10)
> at
> com.xyz.dgms.validators.common.validators.UserDataValidator.validateSubscription(UserDataValidator.java:226)
> at
> com.xyz.dgms.validators.common.validators.UserDataValidator.validateRequest(UserDataValidator.java:124)
> at
> com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:346)
> at
> com.xyz.dgms.validators.common.validators.UserDataValidator.validate(UserDataValidator.java:41)
> at
> com.baidu.unbiz.fluentvalidator.FluentValidator.doValidate(FluentValidator.java:490)
> ... 12 common frames omitted
> Caused by:
> org.apache.ignite.internal.processors.cache.CacheStoppedException: Failed
> to perform cache operation (cache is stopped): SUBSCRIPTION_CACHE
> ... 24 common frames omitted
>
>
> Thanks,
> Akash
>