OK, I'm not sure what happened, but I'm pretty sure it was one machine at a
time.

So just to be clear: with backups = 1, we can lose one machine for any
amount of time, as long as it comes back fully online and finishes
rebalancing before we move on to the next machine?
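
To make sure I have the reasoning right, here is a minimal sketch of how I
picture it. It is a toy model, not Ignite code: the class name, the method
name, and the placement rule (copy k of partition p lives on node
(p + k) % nodes) are all made up for illustration and are not Ignite's real
affinity function. A node that has restarted but not finished rebalancing
is treated the same as a node that is down, which is my assumption about
why the order matters.

```java
import java.util.HashSet;
import java.util.Set;

public class BackupSketch {
    /**
     * Returns the partitions with no available copy when the given nodes are
     * unavailable (down, or back but not yet rebalanced).
     * Toy placement, NOT Ignite's affinity: copy k of partition p is on node
     * (p + k) % nodes. Assumes backups < nodes so the copies are distinct.
     */
    public static Set<Integer> lostPartitions(int nodes, int partitions,
                                              int backups, Set<Integer> unavailable) {
        Set<Integer> lost = new HashSet<>();
        for (int p = 0; p < partitions; p++) {
            boolean hasOwner = false;
            // Check the primary (k = 0) and each backup copy (k = 1..backups).
            for (int k = 0; k <= backups && !hasOwner; k++) {
                hasOwner = !unavailable.contains((p + k) % nodes);
            }
            if (!hasOwner) {
                lost.add(p);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // 3 nodes, 12 partitions, backups = 1.
        // One node unavailable: every partition still has a copy somewhere.
        System.out.println(lostPartitions(3, 12, 1, Set.of(0)).size());    // prints 0
        // Two nodes unavailable at once: partitions whose two copies sat on
        // exactly those nodes have no owner left.
        System.out.println(lostPartitions(3, 12, 1, Set.of(0, 1)).size()); // prints 4
    }
}
```

If that model is right, the second case is what we would hit by taking down
the next machine before the first one has finished rebalancing.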

On Tue, Feb 21, 2023 at 10:01 AM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> I think there is an argument that when you have persistence enabled and a
> sensible partition loss policy, then you shouldn’t have to reset lost
> partitions. As you note, the data is still consistent. You’ve just
> temporarily lost some availability.
>
> However, that’s not how it currently works. If you shut down more nodes
> than you have backups, then you have to reset lost partitions.
>
> On 20 Feb 2023, at 18:14, John Smith <java.dev....@gmail.com> wrote:
>
> My cache config for the distributed cache is as follows. Maintenance of a
> machine can take about 10-20 minutes, depending on what the maintenance is.
> I don't lose data. I just get the "all partition owners have left" message,
> and then I use the control script to reset the flag for that specific cache.
>
> <bean id="cache-template-bean" abstract="true"
>       class="org.apache.ignite.configuration.CacheConfiguration">
>   <!-- when you create a template via XML configuration, you must add an
>        asterisk to the name of the template -->
>   <property name="name" value="partitionedTpl*"/>
>   <property name="cacheMode" value="PARTITIONED"/>
>   <property name="backups" value="1"/>
>   <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
> </bean>
>
> On Mon., Feb. 20, 2023, 7:03 a.m. Stephen Darlington, <
> stephen.darling...@gridgain.com> wrote:
>
>> How are your caches configured? If they have at least one backup, you
>> should be able to restart one node at a time without data loss.
>>
>> There is no automated way to reset lost partitions. Nor should there be
>> (IMHO). If you have lost partitions, you have probably lost data. That
>> should require manual intervention.
>>
>> On 14 Feb 2023, at 17:58, John Smith <java.dev....@gmail.com> wrote:
>>
>> Hello, does anyone have insights on this?
>>
>> On Thu., Feb. 9, 2023, 4:28 p.m. John Smith, <java.dev....@gmail.com>
>> wrote:
>>
>>> Any thoughts on this?
>>>
>>> On Mon., Feb. 6, 2023, 8:38 p.m. John Smith, <java.dev....@gmail.com>
>>> wrote:
>>>
>>>> That Jira doesn't look like the issue at all. That issue seems to
>>>> suggest that there is a "data loss" exception. In our case the grid puts
>>>> the cache into a "safe" mode ("all partition owners have left the grid"),
>>>> which requires us to then manually reset the flag.
>>>>
>>>> On Mon, Feb 6, 2023 at 7:46 PM 18624049226 <18624049...@163.com> wrote:
>>>>
>>>>> https://issues.apache.org/jira/browse/IGNITE-17657
>>>>> On 2023/2/7 05:41, John Smith wrote:
>>>>>
>>>>> Hi, sometimes when we perform maintenance and reboot nodes we get "All
>>>>> partition owners have left the grid", and then we go and run
>>>>>
>>>>>     ./control.sh --host ignite-xxxxxx --cache reset_lost_partitions some-cache
>>>>>
>>>>> and everything is fine again...
>>>>>
>>>>> This seems to happen with partitioned caches, and we are running with
>>>>> the READ_WRITE_SAFE partition loss policy.
>>>>>
>>>>> We have a few caches, and instead of relying on a human to run the
>>>>> command manually, is there a way for this to happen automatically?
>>>>>
>>>>> And if there is an automatic way, how do we enable it and what are the
>>>>> consequences?
>>>>>
>>>>>
>>
>
