Debugging long persistence recovery on restart

Courtney Robinson Wed, 15 Sep 2021 03:44:44 -0700

Hey all,
We're trying to debug a production issue where Ignite 2.8.1 takes 1
hour *per node* to start.
The cluster has 3 nodes and every cache/table has 2 backups, i.e. each
node holds a replica, so restarting all nodes takes 3 hours in total.
Each node gets stuck after logging:


> 2021-09-15 10:21:16.889  INFO [ArcOS,,,] 8 --- [           main]
> o.a.i.i.p.cache.GridCacheProcessor      [285] :  Started cache in recovery
> mode [name=*cache1*, id=-1556141001, group=hypi, dataRegionName=hypi,
> mode=PARTITIONED, atomicity=ATOMIC, backups=2, mvcc=false]
>
After the delay it logs a similar message about *cache2* and carries on
as if nothing had happened.
The messages always appear in this order, and it is always these two
caches.
I believe this message is logged after a cache has been recovered, so the
hour is being spent on cache2.

cache2, the one that appears to have the problem, holds only about 1GB.

How can we find out what's causing Ignite to take an hour each on this
cache?
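
To pin down exactly which two log lines bracket the stall, a minimal
sketch along these lines could help. It assumes every log line starts
with a timestamp in the same format as the excerpt above
(yyyy-MM-dd HH:mm:ss.SSS); the function name largest_gaps and the
log path passed on the command line are hypothetical, not part of
Ignite.

```python
import re
import sys
from datetime import datetime

# Leading timestamp as in the quoted excerpt, e.g. "2021-09-15 10:21:16.889".
# An optional "> " quote marker is tolerated; lines without a timestamp
# (wrapped continuations, stack traces) are skipped.
TS_RE = re.compile(r"^(?:>\s*)?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def largest_gaps(lines, top=3):
    """Return the `top` largest (gap_seconds, line_before, line_after) tuples."""
    stamped = []
    for line in lines:
        match = TS_RE.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            stamped.append((ts, line.rstrip("\n")))
    # Time elapsed between each pair of consecutive timestamped lines.
    gaps = [
        ((later[0] - earlier[0]).total_seconds(), earlier[1], later[1])
        for earlier, later in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python log_gaps.py <path-to-node-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for seconds, before, after in largest_gaps(log):
            print(f"{seconds:10.1f}s between:\n  {before}\n  {after}")
```

With the stall bracketed, a thread dump taken during that window (for
example with the JDK's jstack against the node's process) should show
what the main thread is doing.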

Regards,
Courtney Robinson
Founder and CEO, Hypi
Tel: +44 208 123 2413 (GMT+0) <https://hypi.io>

