Re: How to avoid "all partition owners have left the grid" or handle automatically.

Stephen Darlington Tue, 21 Feb 2023 07:01:48 -0800

I think there is an argument that when you have persistence enabled and a 
sensible partition loss policy, then you shouldn’t have to reset lost 
partitions. As you note, the data is still consistent. You’ve just temporarily 
lost some availability.


However, that’s not how it currently works. If you shut down more nodes than 
you have backups, then you have to reset lost partitions.
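
To make that rule concrete, here is a minimal sketch of a toy model. It is not Ignite code and uses no Ignite API. The class name, the method names, and the round-robin placement are all assumptions made for illustration. Ignite's default rendezvous affinity function places copies differently, but the counting argument is the same: a partition is lost once every node holding one of its (backups + 1) copies is down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class PartitionLossSketch {

    // Hypothetical placement: partition p is copied to (backups + 1)
    // consecutive nodes, starting at node p % nodes. Assumes backups < nodes.
    static List<Set<Integer>> assign(int partitions, int nodes, int backups) {
        List<Set<Integer>> owners = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            Set<Integer> copies = new TreeSet<>();
            for (int c = 0; c <= backups; c++) {
                copies.add((p + c) % nodes);
            }
            owners.add(copies);
        }
        return owners;
    }

    // A partition counts as lost when every node holding a copy is down.
    static List<Integer> lostPartitions(List<Set<Integer>> owners, Set<Integer> down) {
        List<Integer> lost = new ArrayList<>();
        for (int p = 0; p < owners.size(); p++) {
            if (down.containsAll(owners.get(p))) {
                lost.add(p);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // 8 partitions, 4 nodes, 1 backup (the same backup count as in the quoted config).
        List<Set<Integer>> owners = assign(8, 4, 1);
        // One node down: every partition still has a surviving copy.
        System.out.println(lostPartitions(owners, Set.of(0)));
        // Two nodes down at once: partitions whose only copies were on nodes 0 and 1 are lost.
        System.out.println(lostPartitions(owners, Set.of(0, 1)));
    }
}
```

With one backup, taking down a single node at a time never empties a partition's owner set in this model. Taking down two nodes that happen to share a partition does.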

> On 20 Feb 2023, at 18:14, John Smith <[email protected]> wrote:
> 
> My cache config for distributed cache is as follows... The maintenance of a 
> machine can be about 10-20 mins depending on what the maintenance is. I don't 
> lose data. I just get the "all partition owners have left" message and then I 
> just use the control script to reset the flag for that specific cache.
> 
>   <bean id="cache-template-bean" abstract="true" 
> class="org.apache.ignite.configuration.CacheConfiguration">
>     <!-- when you create a template via XML configuration,
>     you must add an asterisk to the name of the template -->
>     <property name="name" value="partitionedTpl*"/>
>     <property name="cacheMode" value="PARTITIONED" />
>     <property name="backups" value="1" />
>     <property name="partitionLossPolicy" value="READ_WRITE_SAFE"/>
>   </bean>
> 
> On Mon., Feb. 20, 2023, 7:03 a.m. Stephen Darlington <[email protected]> wrote:
>> How are your caches configured? If they have at least one backup, you should 
>> be able to restart one node at a time without data loss.
>> 
>> There is no automated way to reset lost partitions. Nor should there be 
>> (IMHO). If you have lost partitions, you have probably lost data. That 
>> should require manual intervention.
>> 
>>> On 14 Feb 2023, at 17:58, John Smith <[email protected]> wrote:
>>> 
>>> Hello, does anyone have insights on this?
>>> 
>>> On Thu., Feb. 9, 2023, 4:28 p.m. John Smith <[email protected]> wrote:
>>>> Any thoughts on this?
>>>> 
>>>> On Mon., Feb. 6, 2023, 8:38 p.m. John Smith <[email protected]> wrote:
>>>>> That Jira doesn't look like the issue at all. That issue seems to suggest 
>>>>> that there is a "data loss" exception. In our case the grid sets the 
>>>>> cache in a "safe" mode... "all partition owners have left the grid" which 
>>>>> requires us to then manually reset the flag.
>>>>> 
>>>>> On Mon, Feb 6, 2023 at 7:46 PM 18624049226 <[email protected]> wrote:
>>>>>> https://issues.apache.org/jira/browse/IGNITE-17657
>>>>>> 
>>>>>> On 2023/2/7 05:41, John Smith wrote:
>>>>>>> Hi, sometimes when we perform maintenance and reboot nodes we get "All 
>>>>>>> partition owners have left the grid" and then we go and run 
>>>>>>> ./control.sh --host ignite-xxxxxx --cache reset_lost_partitions 
>>>>>>> some-cache and everything is fine again...
>>>>>>> 
>>>>>>> This seems to happen with partitioned caches and we are running as 
>>>>>>> READ_WRITE_SAFE.
>>>>>>> 
>>>>>>> We have a few caches. Instead of relying on a human to run the command 
>>>>>>> manually, is there a way for this to happen automatically?
>>>>>>> 
>>>>>>> And if there is an automatic way, how do we enable it and what are the 
>>>>>>> consequences?
>>
