Re: Debugging long persistence recovery on restart

Shishkov Ilya Fri, 24 Sep 2021 13:06:47 -0700

Hi Courtney,
Have you looked at thread dumps taken while the server nodes are stuck?
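
In case it helps, here is a minimal sketch of collecting such a dump from inside a JVM using the standard java.lang.management API. The class name ThreadDumper is hypothetical and not part of Ignite, and the sketch assumes you can run it inside the node's own process (for example from a background thread started before the node starts), since it only sees the threads of the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of Ignite: dumps the threads of the current JVM.
public class ThreadDumper {

    /** Returns a text dump (name, state, full stack) of all live threads in this JVM. */
    public static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Request lock details only when the JVM supports them,
        // otherwise dumpAllThreads would throw UnsupportedOperationException.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(),
                mx.isSynchronizerUsageSupported());

        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            // Print every frame ourselves; ThreadInfo.toString() truncates the stack.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

From outside the process, the JDK tools give the same information without any code: jstack <pid> or jcmd <pid> Thread.print. Either way, taking several dumps a few seconds apart during the stall makes it easier to see whether the "main" thread sits in the same frames the whole time.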


On Wed, 15 Sep 2021 at 13:44, Courtney Robinson <[email protected]> wrote:

> Hey all,
> We're trying to debug an issue in production where Ignite 2.8.1 is taking
> 1 hour *per node* to start.
> This cluster has 3 nodes and caches/tables have 2 backups, i.e. each node
> has a replica, so it takes 3 hours to restart all nodes.
> The nodes get stuck after outputting:
>
>> 2021-09-15 10:21:16.889  INFO [ArcOS,,,] 8 --- [           main]
>> o.a.i.i.p.cache.GridCacheProcessor      [285] :  Started cache in recovery
>> mode [name=*cache1*, id=-1556141001, group=hypi, dataRegionName=hypi,
>> mode=PARTITIONED, atomicity=ATOMIC, backups=2, mvcc=false]
>>
> After that, they log a similar message about *cache2* and carry on as if
> nothing happened.
> The log is always in this order and it is always these two caches.
> I believe this message is logged after the cache is recovered, so the
> problem is with cache2.
>
> The cache that appears to have the problem, cache2, holds only about 1GB.
>
> How can we find out what's causing Ignite to take an hour each on this
> cache?
>
> Regards,
> Courtney Robinson
> Founder and CEO, Hypi
> Tel: +44 208 123 2413 (GMT+0) <https://hypi.io>
>
>
