[ https://issues.apache.org/jira/browse/IGNITE-17793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644334#comment-17644334 ]
Ignite TC Bot commented on IGNITE-17793:
----------------------------------------

{panel:title=Branch: [pull/10396/head] Base: [master] : No blockers found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/10396/head] Base: [master] : New Tests (5)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#00008b}PDS 2{color} [[tests 5|https://ci2.ignite.apache.org/viewLog.html?buildId=6950220]]
* {color:#013220}IgnitePdsTestSuite2: HistoricalRebalanceCheckpointTest.testDelayedToBackupsRequests2BackupsMorePuts - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: HistoricalRebalanceCheckpointTest.testDelayedToBackupsRequests1BackupMorePuts - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: HistoricalRebalanceCheckpointTest.testDelayedToBackupsRequests1Backup - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: HistoricalRebalanceCheckpointTest.testDelayed1PhaseCommitResponses - PASSED{color}
* {color:#013220}IgnitePdsTestSuite2: HistoricalRebalanceCheckpointTest.testDelayedToBackupsRequests2Backups - PASSED{color}
{panel}
[TeamCity *--> Run :: All* Results|https://ci2.ignite.apache.org/viewLog.html?buildId=6940776&buildTypeId=IgniteTests24Java8_RunAll]

> Historical rebalance must use HWM instead of LWM to seek the proper checkpoint to avoid the data loss
> -----------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-17793
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17793
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Anton Vinogradov
>            Assignee: Vladimir Steshin
>            Priority: Major
>              Labels: iep-31, ise
>         Attachments: HistoricalRebalanceCheckpointTest.java
>
>
> Currently, historical rebalance at {{CheckpointHistory#searchEarliestWalPointer}} searches for the newest checkpoint whose counter is less than that of the lowest entry that has to be rebalanced.
> Unfortunately, we may have more than one checkpoint with the same counter, and the newest one cannot be used as a rebalance start point.
> For example, we have a partition with LWM=100, some gaps and HWM=200.
> The checkpoint will have counter == 100.
> Then we may close some gaps, excluding 101 (to keep LWM == 100).
> And again, the checkpoint will have counter == 100.
> The newest checkpoint (marked with counter 100) will not contain all committed entries with counter > 100.
> Then let's close the rest of the gaps to make historical rebalance possible.
> After the rebalance finishes, we'll see the warning "Some partition entries were missed during historical rebalance" and an inconsistent cluster state.
> See the reproducer at [^HistoricalRebalanceCheckpointTest.java]
> A possible solution is to use HWM instead of LWM during the search.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
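
The LWM=100 / HWM=200 scenario from the issue description can be sketched as a small toy model. This is an illustration only and not Ignite code: the class `HwmVsLwmSketch`, its `Checkpoint` record, the list-based WAL, and the method names are all hypothetical. The model also assumes an extra checkpoint `cp0` taken before any gaps appear, so that an HWM-based search has something to return; the issue itself does not mention such a checkpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of the scenario; none of these types are Ignite classes. */
public class HwmVsLwmSketch {
    /** Hypothetical checkpoint record: WAL position plus partition counters at checkpoint time. */
    public static final class Checkpoint {
        public final String name;
        public final int walPos;
        public final long lwm;
        public final long hwm;

        Checkpoint(String name, int walPos, long lwm, long hwm) {
            this.name = name;
            this.walPos = walPos;
            this.lwm = lwm;
            this.hwm = hwm;
        }
    }

    /** Toy WAL: update counters in the order they were committed. */
    private final List<Long> wal = new ArrayList<>();

    /** Checkpoints, oldest first. */
    private final List<Checkpoint> history = new ArrayList<>();

    /** Commits updates with counters from..to (inclusive) to the toy WAL. */
    public void apply(long from, long to) {
        for (long c = from; c <= to; c++)
            wal.add(c);
    }

    /** Records a checkpoint at the current WAL position. */
    public void checkpoint(String name, long lwm, long hwm) {
        history.add(new Checkpoint(name, wal.size(), lwm, hwm));
    }

    /** Newest checkpoint whose LWM (or HWM) does not exceed the demander's counter. */
    public Optional<Checkpoint> search(long demanderCntr, boolean useHwm) {
        for (int i = history.size() - 1; i >= 0; i--) {
            Checkpoint cp = history.get(i);

            if ((useHwm ? cp.hwm : cp.lwm) <= demanderCntr)
                return Optional.of(cp);
        }

        return Optional.empty();
    }

    /** Counters in (demanderCntr, supplierCntr] that a WAL replay from the checkpoint does not deliver. */
    public Set<Long> missed(Checkpoint cp, long demanderCntr, long supplierCntr) {
        Set<Long> missing = new TreeSet<>();

        for (long c = demanderCntr + 1; c <= supplierCntr; c++)
            missing.add(c);

        missing.removeAll(wal.subList(cp.walPos, wal.size()));

        return missing;
    }

    /** Builds the sequence of events from the issue description. */
    public static HwmVsLwmSketch scenario() {
        HwmVsLwmSketch s = new HwmVsLwmSketch();

        s.apply(1, 100);
        s.checkpoint("cp0", 100, 100); // Assumption: a checkpoint taken before any gaps exist.
        s.apply(151, 200);             // Gaps 101..150 remain: LWM=100, HWM=200.
        s.checkpoint("cp1", 100, 200);
        s.apply(102, 150);             // Close the gaps except 101, so LWM stays at 100.
        s.checkpoint("cp2", 100, 200);
        s.apply(101, 101);             // Close the last gap.

        return s;
    }

    public static void main(String[] args) {
        HwmVsLwmSketch s = scenario();

        Checkpoint byLwm = s.search(100, false).get();
        Checkpoint byHwm = s.search(100, true).get();

        System.out.println("LWM search picks " + byLwm.name + ", missed: " + s.missed(byLwm, 100, 200).size());
        System.out.println("HWM search picks " + byHwm.name + ", missed: " + s.missed(byHwm, 100, 200).size());
    }
}
```

In this model the LWM-based search picks `cp2`, and a replay from it delivers only counter 101, so 99 entries are missed. The HWM-based search picks `cp0`, and a replay from it delivers every counter from 101 to 200. If no checkpoint with a low enough HWM exists, `search` returns an empty `Optional`, which a caller would have to treat as "no safe start point in the history".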