[lustre-discuss] OST went back in time: no(?) hardware issue

Thomas Roth via lustre-discuss Tue, 03 Oct 2023 07:24:01 -0700

Hi all,

In our Lustre 2.12.5 system, we are seeing "OST went back in time" messages
after replacing OST hardware. This is what we did:
- hardware had reached EOL
- we set `max_create_count=0` for these OSTs, then searched for the files on 
these OSTs and migrated them off
- formatted the new OSTs with `--replace` and the old indices
- all OSTs are on ZFS
- set the OSTs `active=0` on our 3 MDTs
- moved in the new hardware, reused the old NIDs, old OST indices, mounted the 
OSTs
- set the OSTs `active=1`
- ran `lfsck` on all servers
- set `max_create_count=200` for these OSTs


Now the "OST went back in time" messages have appeared in the MDS logs.

This doesn't quite fit the description in the manual. There were no crashes or 
power losses, so I cannot see how a cache could have been lost, or which one.
The transaction numbers quoted in the error are both large, e.g.
`transno 55841088879 was previously committed, server now claims 4294992012`
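
To make the two quoted numbers easier to compare, here is a small Python sketch 
that pulls them out of the message fragment above and splits each into its upper 
and lower 32 bits. The 32-bit split is only an assumption made for illustration 
(nothing in this message says how Lustre lays out a transno), and the helper 
names `split_transno` and `parse_message` are hypothetical.

```python
import re

# Fragment of the MDS log message as quoted above (joined onto one line).
MESSAGE = ("transno 55841088879 was previously committed, "
           "server now claims 4294992012")


def parse_message(text):
    """Return (previously_committed, now_claimed) as integers."""
    match = re.search(
        r"transno (\d+) was previously committed, server now claims (\d+)",
        text,
    )
    if match is None:
        raise ValueError("message does not match the quoted format")
    return int(match.group(1)), int(match.group(2))


def split_transno(transno, low_bits=32):
    """Split a number into (upper part, lower part).

    ASSUMPTION: a 32-bit boundary is used purely for illustration.
    """
    return transno >> low_bits, transno & ((1 << low_bits) - 1)


previous, current = parse_message(MESSAGE)
print(split_transno(previous))  # (13, 6514031)
print(split_transno(current))   # (1, 24716)
```

Under that assumed split, the previously committed value is 13 * 2^32 + 6514031 
and the value the server now claims is 1 * 2^32 + 24716, i.e. the new number sits 
just above 2^32.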

What should we do? Give `lfsck` another try?

Regards,
Thomas


--
--------------------------------------------------------------------
Thomas Roth
Department: IT

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
