[ 
https://issues.apache.org/jira/browse/IGNITE-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Lapin updated IGNITE-24960:
-------------------------------------
    Description: 
Tests can hang for several reasons, which may lead to test suite execution 
timeouts. Three causes were identified during the investigation and have been 
fixed:
1. On the stable update, the raft peer-set is now updated to 
union(stable, pending) rather than just stable, as it was before. This resolves 
most occurrences of 
{code:java}
All peers are unavailable...{code}
2. Fix hangs on index creation. Due to local node lag, the index creation 
procedure assumed that the primary was expiring and stopped the build process, 
expecting that a new primary would be selected and would resume it. In fact, 
the lease had been extended, and the node simply had not seen the extension 
because of the lag.

3. Partial fix for the race between adding a table processor and processing a 
raft command that touches that table processor. Unfortunately, it was not 
possible to solve this problem completely: NPEs of this nature still sometimes 
appear in the logs, although much less frequently. I do not want to hold the 
patch back because of this.
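
To illustrate item 1, here is a minimal sketch of the peer-set computation. It 
is not the actual Ignite code: the class name, the method name, and the use of 
plain strings as peer identifiers are hypothetical and chosen only for this 
example.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; not part of the Ignite code base.
final class PeerSetSketch {
    /**
     * Returns the peer-set to apply on a stable update:
     * union(stable, pending) instead of stable alone.
     */
    static Set<String> peersAfterStableUpdate(Set<String> stable, Set<String> pending) {
        Set<String> peers = new LinkedHashSet<>(stable);
        // Keep peers that appear only in the pending set as well.
        peers.addAll(pending);
        return peers;
    }

    public static void main(String[] args) {
        Set<String> stable = Set.of("node-1", "node-2");
        Set<String> pending = Set.of("node-2", "node-3");
        // The result contains node-1, node-2 and node-3.
        System.out.println(peersAfterStableUpdate(stable, pending));
    }
}
```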

> Execution timeouts in case of enabled colocation
> ------------------------------------------------
>
>                 Key: IGNITE-24960
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24960
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexander Lapin
>            Priority: Major
>              Labels: ignite-3
>          Time Spent: 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
