[jira] [Updated] (IGNITE-25653) Primary replica negotiation may fail on replica side due to raft server overload

Denis Chudov (Jira) Wed, 11 Jun 2025 03:02:41 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-25653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Denis Chudov updated IGNITE-25653:
----------------------------------
    Description: 
*Problem:*

When a replica receives a LeaseGrantRequest, it always performs a couple of 
raft operations:
 * read index;
 * running PrimaryReplicaChangeCommand;
 * and, when the force flag is true, transferLeadership.

Raft internal disruptors are shared between groups. So, if the raft internals 
on the Ignite node that hosts the group leader of the given replica are 
overloaded, the mentioned raft commands would fail and the replica would not 
be able to process the LeaseGrantRequest. This may look surprising: in the 
absence of a primary replica there cannot be any load on this group, yet there 
would still be an overload exception.
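
The effect of a shared queue can be illustrated with a minimal sketch. It is 
not Ignite code: SharedRaftQueue, trySubmit and the group names are 
hypothetical, and a plain bounded queue stands in for a disruptor. It only 
shows that a group with no load of its own is rejected once other groups have 
used up the shared capacity.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical model: one bounded queue (a stand-in for a disruptor) shared by all raft groups on a node. */
final class SharedRaftQueue {
    /** A queued command tagged with the group it belongs to. */
    static final class Task {
        final String groupId;
        final String command;

        Task(String groupId, String command) {
            this.groupId = groupId;
            this.command = command;
        }
    }

    private final BlockingQueue<Task> queue;

    SharedRaftQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false when the shared queue is full, regardless of which group filled it. */
    boolean trySubmit(String groupId, String command) {
        return queue.offer(new Task(groupId, command));
    }
}

final class SharedQueueDemo {
    /** Fills the queue from a busy group, then submits a command for an idle group. */
    static boolean idleGroupAccepted() {
        SharedRaftQueue shared = new SharedRaftQueue(4);

        // Group "busy" saturates the shared capacity.
        for (int i = 0; i < 4; i++) {
            shared.trySubmit("busy", "write-" + i);
        }

        // Group "idle" has no load of its own, yet its command is rejected.
        return shared.trySubmit("idle", "PrimaryReplicaChangeCommand");
    }
}
```

In this model SharedQueueDemo.idleGroupAccepted() returns false: the rejection 
depends only on the total occupancy of the shared queue, not on the load of 
the group that submits the command.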

Also, this failure on the replica side will cause the creation of a new lease 
(not a major issue because leases are batched).

*Proposals:*
 * introduce a SystemWriteCommand that is never rejected by the raft client 
because of overload;
 * or add a separate method to RaftGroupService (like #primaryReplicaChange()) 
that works in the same way (never rejected).
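
A rough sketch of the first proposal, under heavy assumptions: Command, 
UserWrite, PrimaryReplicaChange, AdmissionControl and the soft limit are all 
hypothetical and do not reflect the actual raft client in Ignite. Only the 
name SystemWriteCommand is taken from the proposal. The idea is that the 
overload check applies to ordinary commands only.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical base type for commands; not the Ignite interface. */
interface Command {}

/** Marker from the first proposal: commands of this type bypass the overload check. */
interface SystemWriteCommand extends Command {}

/** Hypothetical ordinary command. */
final class UserWrite implements Command {}

/** Stand-in for PrimaryReplicaChangeCommand, modelled here as a system command. */
final class PrimaryReplicaChange implements SystemWriteCommand {}

/** Hypothetical client-side admission control (single-threaded for brevity). */
final class AdmissionControl {
    private final int softLimit;
    private final Queue<Command> pending = new ArrayDeque<>();

    AdmissionControl(int softLimit) {
        this.softLimit = softLimit;
    }

    /** Ordinary commands are rejected at the soft limit; system commands are always admitted. */
    boolean submit(Command cmd) {
        if (!(cmd instanceof SystemWriteCommand) && pending.size() >= softLimit) {
            return false; // Models the overload rejection.
        }
        pending.add(cmd);
        return true;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

A separate never-rejected method on RaftGroupService, as in the second 
proposal, would amount to the same bypass exposed through a dedicated entry 
point instead of a command type.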

  was:
*Problem:*

When receiving LeaseGrantRequest, replica always performs a couple of raft 
operations:
 * read index
 * running PrimaryReplicaChangeCommand
 * and in the case when force flag is true - transferLeadership.

Raft internal disruptors are shared between groups. So, if the raft internals 
on Ignite node where the group leader of the given replica is located are 
overloaded then the mentioned raft commands will fail and the replica would not 
be able to process LeaseGrantRequest. This may look surprising because in the 
absence of primary replica there cannot be any load on this group, but there 
would be exception about overload.

Also, this failure on replica side will cause the creation of new lease (not a 
major issue because leases are batched).

*Proposals:*
 * introduce SystemWriteCommand which is never rejected by raft client by 
overload reason;
 * or add separate method to RaftGroupService (like #primaryReplicaChange() ) 
which works in the same way (never rejected).


> Primary replica negotiation may fail on replica side due to raft server 
> overload
> --------------------------------------------------------------------------------
>
>                 Key: IGNITE-25653
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25653
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Denis Chudov
>            Priority: Major
>              Labels: ignite-3
>
> *Problem:*
> When a replica receives a LeaseGrantRequest, it always performs a couple of 
> raft operations:
>  * read index;
>  * running PrimaryReplicaChangeCommand;
>  * and, when the force flag is true, transferLeadership.
> Raft internal disruptors are shared between groups. So, if the raft internals 
> on the Ignite node that hosts the group leader of the given replica are 
> overloaded, the mentioned raft commands would fail and the replica would not 
> be able to process the LeaseGrantRequest. This may look surprising: in the 
> absence of a primary replica there cannot be any load on this group, yet 
> there would still be an overload exception.
> Also, this failure on the replica side will cause the creation of a new lease 
> (not a major issue because leases are batched).
> *Proposals:*
>  * introduce a SystemWriteCommand that is never rejected by the raft client 
> because of overload;
>  * or add a separate method to RaftGroupService (like #primaryReplicaChange()) 
> that works in the same way (never rejected).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
