Late yesterday afternoon our DB server went down hard. We tried to restart it; it went into recovery mode to replay the transaction history and failed. The notable error was:

FATAL: failed to re-find parent key in index "257969064" for split pages 8366/12375

If you look this error up, it points to a problem with the transaction logs: recovery cannot complete because the logs are corrupt or missing. The usual solution is to:

1. Back up the DB files
2. Run pg_resetxlog (this might produce corruption due to inconsistent data)
3. Dump the DB
4. Reload the DB

Unfortunately this solution is not practical in our case, for multiple reasons:

- We do not have the space for steps 1 and 3
- The time required for steps 1, 3, and 4 is approximately 1 week per step (3 weeks total), since our database is approximately 5.4TB

If the data is in an inconsistent state, are there alternative solutions, such as finding the index named in the FATAL error and somehow dropping it?

Also, does anyone know what circumstances or conditions might cause a transaction log to become corrupt or go missing?

Thanks
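
To make the "finding the index" idea concrete, here is a minimal sketch that only locates candidate files and changes nothing. It rests on an assumption that is not confirmed anywhere above: that the numeric index name printed during recovery is the relation's file name (relfilenode) under base/<database oid>/ in the data directory. The function names and the data directory path are hypothetical, and indexes placed in other tablespaces are not covered.

```python
import re
from pathlib import Path

# Matches the FATAL message quoted in the post.
ERROR_RE = re.compile(
    r'failed to re-find parent key in index "(?P<index>[^"]+)" '
    r'for split pages (?P<left>\d+)/(?P<right>\d+)'
)


def parse_error(line):
    """Return (index_name, left_page, right_page), or None if no match."""
    m = ERROR_RE.search(line)
    if m is None:
        return None
    return m.group("index"), int(m.group("left")), int(m.group("right"))


def candidate_files(data_dir, index_name):
    """List files under base/*/ that could belong to the index.

    ASSUMPTION: index_name is the on-disk file name (relfilenode).
    Keeps the main file and its numbered segments (name, name.1, ...),
    and skips unrelated files that merely share the prefix.
    """
    base = Path(data_dir) / "base"
    if not base.is_dir():
        return []
    wanted = re.compile(re.escape(index_name) + r"(\.\d+)?")
    return [
        p for p in sorted(base.glob(f"*/{index_name}*"))
        if wanted.fullmatch(p.name)
    ]


if __name__ == "__main__":
    msg = ('FATAL: failed to re-find parent key in index "257969064" '
           'for split pages 8366/12375')
    index_name, left, right = parse_error(msg)
    # "/path/to/pgdata" is a placeholder, not our real data directory.
    for path in candidate_files("/path/to/pgdata", index_name):
        print(path)
```

This only shows where such files would be if the assumption holds. Whether an index can safely be dropped or removed while recovery is failing is exactly the open question.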