OK, I'm not sure what happened, but I'm pretty sure it was one machine at a time. But OK.
So just to be clear: with backups = 1, we can lose one machine for any amount of time, as long as it comes back fully online and rebalanced before we move on to the next machine?

On Tue, Feb 21, 2023 at 10:01 AM Stephen Darlington <stephen.darling...@gridgain.com> wrote:

> I think there is an argument that when you have persistence enabled and a
> sensible partition loss policy, then you shouldn’t have to reset lost
> partitions. As you note, the data is still consistent. You’ve just
> temporarily lost some availability.
>
> However, that’s not how it currently works. If you shut down more nodes
> than you have backups, then you have to reset lost partitions.
>
> On 20 Feb 2023, at 18:14, John Smith <java.dev....@gmail.com> wrote:
>
> My cache config for distributed cache is as follows... The maintenance of
> a machine can be about 10-20 mins depending on what the maintenance is. I
> don't lose data. I just get "all partition owners have left" message and
> then I just use control script to reset the flag for that specific cache.
>
> <bean id="cache-template-bean" abstract="true"
>       class="org.apache.ignite.configuration.CacheConfiguration">
>   <!-- when you create a template via XML configuration, you must add an
>        asterisk to the name of the template -->
>   <property name="name" value="partitionedTpl*"/>
>   <property name="cacheMode" value="PARTITIONED" />
>   <property name="backups" value="1" />
>   <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
> </bean>
>
> On Mon., Feb. 20, 2023, 7:03 a.m. Stephen Darlington, <stephen.darling...@gridgain.com> wrote:
>
>> How are your caches configured? If they have at least one backup, you
>> should be able to restart one node at a time without data loss.
>>
>> There is no automated way to reset lost partitions. Nor should there be
>> (IMHO). If you have lost partitions, you have probably lost data. That
>> should require manual intervention.
>>
>> On 14 Feb 2023, at 17:58, John Smith <java.dev....@gmail.com> wrote:
>>
>> Hello, does anyone have insights on this?
>>
>> On Thu., Feb. 9, 2023, 4:28 p.m. John Smith, <java.dev....@gmail.com> wrote:
>>
>>> Any thoughts on this?
>>>
>>> On Mon., Feb. 6, 2023, 8:38 p.m. John Smith, <java.dev....@gmail.com> wrote:
>>>
>>>> That Jira doesn't look like the issue at all. That issue seems to
>>>> suggest that there is a "data loss" exception. In our case the grid sets
>>>> the cache in a "safe" mode... "all partition owners have left the grid"
>>>> which requires us to then manually reset the flag.
>>>>
>>>> On Mon, Feb 6, 2023 at 7:46 PM 18624049226 <18624049...@163.com> wrote:
>>>>
>>>>> https://issues.apache.org/jira/browse/IGNITE-17657
>>>>> On 2023/2/7 05:41, John Smith wrote:
>>>>>
>>>>> Hi, sometimes when we perform maintenance and reboot nodes we get "All
>>>>> partition owners have left the grid" and then we go and run
>>>>> ./control.sh --host ignite-xxxxxx --cache reset_lost_partitions some-cache
>>>>> and everything is fine again...
>>>>>
>>>>> This seems to happen with partitioned caches and we are running as
>>>>> READ_WRITE_SAFE.
>>>>>
>>>>> We have a few caches and instead of relying on a human to manually go
>>>>> run the command is there a way for this to happen automatically?
>>>>>
>>>>> And if there is an automatic way how do we enable it and what are the
>>>>> consequences?
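Since the thread mentions having a few caches and running the reset by hand for each, here is a minimal sketch of a helper that assembles one reset command for several caches. The function names (join_caches, build_reset_cmd) and the CONTROL_SH variable are made up for illustration, and the sketch assumes control.sh accepts a comma-separated list of cache names for reset_lost_partitions. It only prints the command; it does not run it.

```shell
#!/bin/sh
# Sketch only: join_caches and build_reset_cmd are hypothetical helper names,
# and CONTROL_SH is an assumed path to Ignite's control script.
CONTROL_SH="${CONTROL_SH:-./control.sh}"

# Join all arguments with commas (assumed list form for reset_lost_partitions).
join_caches() {
  ( IFS=,; printf '%s' "$*" )
}

# Print the reset command for the given host and caches without executing it,
# so an operator can review it before running it.
build_reset_cmd() {
  host="$1"
  shift
  printf '%s --host %s --cache reset_lost_partitions %s\n' \
    "$CONTROL_SH" "$host" "$(join_caches "$@")"
}
```

For example, `build_reset_cmd ignite-xxxxxx some-cache other-cache` prints a single control.sh line covering both caches (other-cache is a made-up name). Printing rather than executing keeps a person in the loop, in line with the point above that a reset should stay a manual decision, and the command should only be run once every node is back and rebalanced.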