As is to be expected, MDT no. 2 did not like the situation either:
:~# cat /proc/fs/lustre/mdt/hebe-MDT0002/recovery_status
status: WAITING
non-ready MDTs: 0001
recovery_start: 1579525859
time_waited: 23
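
For anyone who wants to watch this from a script rather than by eye, here is a minimal sketch in Python that parses the "key: value" output shown above. It is not a Lustre tool: the function names parse_recovery_status and waiting_on are made up for illustration, and it assumes that several non-ready MDTs would be listed separated by whitespace, which the output above (with a single entry) does not confirm.

```python
def parse_recovery_status(text):
    """Parse 'key: value' lines, as printed by recovery_status, into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a colon
        status[key.strip()] = value.strip()
    return status


def waiting_on(status):
    """Return the MDT indices this target is waiting for (empty if none).

    Assumption: multiple indices are separated by whitespace.
    """
    if status.get("status") != "WAITING":
        return []
    return status.get("non-ready MDTs", "").split()


# Sample text copied from the output quoted in this mail.
SAMPLE = """status: WAITING
non-ready MDTs: 0001
recovery_start: 1579525859
time_waited: 23
"""

if __name__ == "__main__":
    info = parse_recovery_status(SAMPLE)
    print(info["status"], waiting_on(info), int(info["time_waited"]))
```

On a real server the text would come from reading the recovery_status file instead of the SAMPLE string.
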
I was already reading LU-9748 and chewing my nails about an ad-hoc upgrade (this is a Lustre 2.10.6
system), when MDT 1 finally relented, obviously getting the necessary logs now that MDT 2 was
back and had finished its recovery.
Then, of course, MDT 2 also recovered.
In such a situation, would 'lctl abort_recovery' help?
Or shutting down all three servers and then restarting them in the order 0 - 1 - 2?
Regards,
Thomas
On 20/01/2020 14.00, Thomas Roth wrote:
Hi all,
I had to restart our MDTs 1 and 2.
No. 2 is still doing a file system check; no. 1 is mounted again and should be
in recovery, however:
:~# cat recovery_status
status: WAITING
non-ready MDTs: 0002
recovery_start: 1579524336
time_waited: 538
It seems I have misunderstood the organisation of multiple MDTs: I thought they were independent of each
other - except that MDT 0 has the root of the filesystem, of course.
But the others are waiting for everybody to be online?
Regards,
Thomas
--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453 Fax: +49-6159-71 2986
GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de
Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz