[ https://issues.apache.org/jira/browse/IGNITE-11252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Plehanov updated IGNITE-11252:
-----------------------------------
    Fix Version/s: 2.16
                       (was: 2.15)

> Docs: Index corruption recovery procedure
> -----------------------------------------
>
>                 Key: IGNITE-11252
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11252
>             Project: Ignite
>          Issue Type: Task
>          Components: documentation
>    Affects Versions: 2.7
>            Reporter: Denis A. Magda
>            Assignee: Prachi Garg
>            Priority: Critical
>             Fix For: 2.16
>
>
> We need to document a recovery procedure for the case when an index
> corruption happens.
> Refer to this thread for details and examples of the exception dumped to the
> logs when the issue occurs:
> http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-index-corruption-issue-gt-unrecoverable-cluster-td39869.html
>
> # Recovering from an index corruption
> ## Applicable if
> It is known that an index of a cache is corrupted, but the main data
> (partition files and WAL) is fine. Show code snippets of possible examples.
> Find them via the references shared in the dev list discussion.
> ## Steps to recover
> 1. Stop the node
> 2. Delete index.bin of the affected caches (path is
>    db/<consistent_id>/cache-<cache_name>/index.bin)
> 3. Start the node
>    - Note: At this point the node is active in the cluster but does not have
>      indexes. It still serves SQL queries, but their performance can be low.
>      Avoid running SQL queries on large tables at this point.
> 4. Wait for the message “Finished indexes rebuilding for cache <cache_name>”
>    in the Ignite log
>
> # Recovering from a persistent storage corruption
> ## Applicable if
> A part of the persistent storage (partition files, checkpoint markers or WAL)
> was corrupted and there is no other way to recover it, but there are healthy
> copies of all data on other nodes.
> ## Steps to recover
> 1. Stop the node
> 2. Delete all persistence files of the node (it is best to clear the Ignite
>    working directory, storage directory, WAL and WAL archive directories)
> 3. Make sure consistentId is explicitly set in the configuration of the node
>    - If it isn’t, look up the generated consistentId using control.sh and set
>      it explicitly in the config or via IGNITE_CONSISTENT_ID (2.8+ only)
> 4. Start the node
> 5. Wait for the messages <Finished rebalancing cache> for all caches

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
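
For the index corruption procedure above, steps 2 and 4 could be scripted roughly as follows. This is only a sketch, not an Ignite tool: the helper names (`index_files`, `delete_index_files`, `rebuild_finished`) and the `storage_root` parameter are hypothetical, the path layout is taken from the ticket text, and the exact log line format may differ between Ignite versions, so the log check only looks for the phrase from the ticket plus the cache name on the same line. The node must already be stopped before any file is deleted.

```python
from pathlib import Path
from typing import Iterable, List


def index_files(storage_root: Path, consistent_id: str,
                cache_names: Iterable[str]) -> List[Path]:
    """Build index.bin paths for the affected caches.

    Layout taken from the ticket:
    db/<consistent_id>/cache-<cache_name>/index.bin
    `storage_root` (the directory that contains `db`) is an assumption.
    """
    node_dir = storage_root / "db" / consistent_id
    return [node_dir / f"cache-{name}" / "index.bin" for name in cache_names]


def delete_index_files(paths: Iterable[Path], dry_run: bool = True) -> List[Path]:
    """Delete the index.bin files that exist.

    Returns the files that were removed (or would be removed when
    dry_run is True). Missing files are skipped silently.
    """
    removed = []
    for path in paths:
        if path.is_file():
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed


def rebuild_finished(log_text: str, cache_name: str) -> bool:
    """Return True if some log line reports a finished index rebuild
    and mentions the given cache name.

    The matched phrase comes from the ticket; the rest of the line
    format is not assumed.
    """
    marker = "Finished indexes rebuilding for cache"
    return any(marker in line and cache_name in line
               for line in log_text.splitlines())
```

Running `delete_index_files` with the default `dry_run=True` first lists what would be deleted, which makes it easier to confirm that only index.bin files of the affected caches are touched and that partition files and WAL stay in place.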