My config for the distributed (partitioned) caches is as follows. Maintenance on a machine takes about 10-20 minutes, depending on what the maintenance is. I don't lose data. I just get the "all partition owners have left" message, and then I use the control script to reset the flag for that specific cache.
<bean id="cache-template-bean" abstract="true" class="org.apache.ignite.configuration.CacheConfiguration">
    <!-- when you create a template via XML configuration, you must add an asterisk to the name of the template -->
    <property name="name" value="partitionedTpl*"/>
    <property name="cacheMode" value="PARTITIONED"/>
    <property name="backups" value="1"/>
    <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
</bean>

On Mon., Feb. 20, 2023, 7:03 a.m. Stephen Darlington, <stephen.darling...@gridgain.com> wrote:

> How are your caches configured? If they have at least one backup, you
> should be able to restart one node at a time without data loss.
>
> There is no automated way to reset lost partitions. Nor should there be
> (IMHO). If you have lost partitions, you have probably lost data. That
> should require manual intervention.
>
> On 14 Feb 2023, at 17:58, John Smith <java.dev....@gmail.com> wrote:
>
> Hello, does anyone have insights on this?
>
> On Thu., Feb. 9, 2023, 4:28 p.m. John Smith, <java.dev....@gmail.com>
> wrote:
>
>> Any thoughts on this?
>>
>> On Mon., Feb. 6, 2023, 8:38 p.m. John Smith, <java.dev....@gmail.com>
>> wrote:
>>
>>> That Jira doesn't look like the issue at all. That issue seems to
>>> suggest that there is a "data loss" exception. In our case the grid sets
>>> the cache in a "safe" mode... "all partition owners have left the grid"
>>> which requires us to then manually reset the flag.
>>>
>>> On Mon, Feb 6, 2023 at 7:46 PM 18624049226 <18624049...@163.com> wrote:
>>>
>>>> https://issues.apache.org/jira/browse/IGNITE-17657
>>>> On 2023/2/7 05:41, John Smith wrote:
>>>>
>>>> Hi, sometimes when we perform maintenance and reboot nodes we get "All
>>>> partition owners have left the grid" and then we go and run ./control.sh
>>>> --host ignite-xxxxxx --cache reset_lost_partitions some-cache and
>>>> everything is fine again...
>>>>
>>>> This seems to happen with partitioned caches and we are running as
>>>> READ_WRITE_SAFE.
>>>>
>>>> We have a few caches and instead of relying on a human to manually go
>>>> run the command is there a way for this to happen automatically?
>>>>
>>>> And if there is an automatic way how do we enable it and what are the
>>>> consequences?
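
For anyone who wants to script the manual step across several caches, here is a minimal Java sketch. It only wraps the ./control.sh command quoted in the thread and runs it once per cache. The class name, the helper methods, the second cache name, and the --execute switch are made up for illustration and are not part of Ignite. It defaults to a dry run because, as noted in the thread, lost partitions usually mean lost data, so resetting them should stay a deliberate decision.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Ignite: wraps the control.sh call from the thread.
class ResetLostPartitions {

    /** Builds the argument list for one reset_lost_partitions call on a single cache. */
    static List<String> buildCommand(String controlScript, String host, String cacheName) {
        if (cacheName == null || cacheName.isBlank()) {
            throw new IllegalArgumentException("cache name must not be blank");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add(controlScript);
        cmd.add("--host");
        cmd.add(host);
        cmd.add("--cache");
        cmd.add("reset_lost_partitions");
        cmd.add(cacheName);
        return cmd;
    }

    /** Runs the command, forwards its output to this console, and returns its exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Only touch the cluster when "--execute" is passed (assumed switch).
        boolean execute = args.length > 0 && args[0].equals("--execute");

        // Placeholder host from the thread; the cache names are examples only.
        String host = "ignite-xxxxxx";
        List<String> caches = List.of("some-cache", "another-cache");

        for (String cache : caches) {
            List<String> cmd = buildCommand("./control.sh", host, cache);
            if (!execute) {
                System.out.println("dry run: " + String.join(" ", cmd));
                continue;
            }
            int exitCode = run(cmd);
            if (exitCode != 0) {
                System.err.println("reset failed for " + cache + " (exit code " + exitCode + ")");
            }
        }
    }
}
```

Run without arguments, it prints the commands it would issue; with --execute it runs them one by one from the directory that contains control.sh. It makes no attempt to check whether data was actually lost before resetting, so that check remains a human step.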