[jira] [Updated] (IGNITE-11252) Docs: Index corruption recovery procedure

Alex Plehanov (Jira) Wed, 03 May 2023 09:27:10 -0700


     [ 
https://issues.apache.org/jira/browse/IGNITE-11252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alex Plehanov updated IGNITE-11252:
-----------------------------------
    Fix Version/s: 2.16
                       (was: 2.15)

> Docs: Index corruption recovery procedure
> -----------------------------------------
>
>                 Key: IGNITE-11252
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11252
>             Project: Ignite
>          Issue Type: Task
>          Components: documentation
>    Affects Versions: 2.7
>            Reporter: Denis A. Magda
>            Assignee: Prachi Garg
>            Priority: Critical
>             Fix For: 2.16
>
>
> We need to document a recovery procedure for index corruption. 
> Refer to this thread for details and examples of the exception dumped to the 
> logs when the issue occurs:
> http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-index-corruption-issue-gt-unrecoverable-cluster-td39869.html
> # Recovering from an index corruption
> ## Applicable if
> It is known that an index of a cache is corrupted, but the main data 
> (partition files and WAL) is intact. Show code snippets of possible examples; 
> they can be found via the references shared in the dev list discussion.
> ## Steps to recover
> 1. Stop the node
> 2. Delete index.bin of the affected caches (path is 
> db/<consistent_id>/cache-<cache_name>/index.bin)
> 3. Start the node
> - Note: At this point the node is active in the cluster but doesn’t have 
> indexes. 
> It still serves SQL queries, but their performance can be low, so avoid 
> running SQL queries on large tables until the indexes are rebuilt.
> 4. Wait for the message “Finished indexes rebuilding for cache <cache_name>” 
> in the Ignite log
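The index recovery steps above can be sketched as a small helper script. This is only an illustration, not part of Ignite: the function names, the `work_dir` parameter and the log check are hypothetical, and the path layout and log text are taken from the steps quoted above. The `consistent_id` argument is assumed to be the folder name exactly as it appears on disk, and the deletion must only be run while the node is stopped.

```python
from pathlib import Path


def index_file(work_dir, consistent_id, cache_name):
    """Build the index.bin path using the layout quoted in step 2.

    work_dir is a hypothetical name for the directory that contains db/.
    """
    return Path(work_dir) / "db" / consistent_id / f"cache-{cache_name}" / "index.bin"


def delete_index_files(work_dir, consistent_id, cache_names, dry_run=True):
    """Return the index.bin files that exist for the given caches.

    With dry_run=True (the default) nothing is removed; with dry_run=False
    each file found is deleted. Run this only while the node is stopped.
    """
    found = []
    for name in cache_names:
        path = index_file(work_dir, consistent_id, name)
        if path.is_file():
            found.append(path)
            if not dry_run:
                path.unlink()
    return found


def rebuild_finished(log_lines, cache_name):
    """Check log lines for the rebuild message quoted in step 4.

    Assumption: the message and the cache name appear on the same line.
    This is a naive substring match; the exact log format may differ.
    """
    marker = "Finished indexes rebuilding for cache"
    return any(marker in line and cache_name in line for line in log_lines)
```

A cautious way to use it is to call `delete_index_files(...)` with the default `dry_run=True` first, review the returned paths, and only then repeat the call with `dry_run=False`.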
> # Recovering from a persistent storage corruption
> ## Applicable if
> A part of the persistent storage (partition files, checkpoint markers or WAL) 
> is corrupted and there is no other way to recover it, but healthy copies of 
> all data exist on other nodes.
> ## Steps to recover
> 1. Stop the node
> 2. Delete all persistence files of the node (it is best to clear the Ignite 
> working directory, storage directory, WAL and WAL archive directories)
> 3. Make sure consistentId is explicitly set in the configuration of the node
> - If it isn’t, look up the generated consistentId using control.sh and set it 
> explicitly in the config or via IGNITE_CONSISTENT_ID (2.8+ only)
> 4. Start the node
> 5. Wait for the <Finished rebalancing cache> messages for all caches
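
The storage recovery steps above can be sketched in the same way. Again, this is a hedged illustration rather than an Ignite tool: both function names are hypothetical, the caller supplies the directories explicitly (nothing is read from the node configuration), and the default `marker` is simply the text quoted in step 5, so the real log line may differ. Clearing the working directory also removes anything else stored there, so the dry run is worth reviewing before deleting.

```python
import shutil
from pathlib import Path


def clear_persistence_dirs(dirs, dry_run=True):
    """Empty each existing directory in dirs, keeping the directory itself.

    Returns the top-level entries that were (or, in a dry run, would be)
    removed. Run this only while the node is stopped.
    """
    removed = []
    for d in map(Path, dirs):
        if not d.is_dir():
            continue  # skip paths that do not exist on this node
        for entry in sorted(d.iterdir()):
            removed.append(entry)
            if dry_run:
                continue
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    return removed


def caches_not_rebalanced(log_lines, cache_names, marker="Finished rebalancing cache"):
    """Return the caches for which no rebalancing message was seen yet.

    Assumption: the marker and the cache name appear on the same log line.
    This is a naive substring match, so short cache names may match by accident.
    """
    pending = set(cache_names)
    for line in log_lines:
        if marker in line:
            pending -= {name for name in pending if name in line}
    return pending
```

Step 5 is complete once `caches_not_rebalanced(...)` returns an empty set for the full list of caches hosted on the node.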



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
