Thanks Andreas and Aurélien for your answers. They make us confident that we 
are on the right track for our cluster update!


Also, I have noticed that 2.15.4-RC1 was released two weeks ago. Can we expect 
2.15.4 to be ready by the end of the year?


Regards,


Martin

________________________________
From: Andreas Dilger <[email protected]>
Sent: December 7, 2023 6:02 AM
To: Aurelien Degremont
Cc: Audet, Martin; [email protected]
Subject: Re: [lustre-discuss] Error messages (ex: not available for connect 
from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1


Aurelien,
there have been a number of questions about this message.

> Lustre: lustrevm-OST0001: deleting orphan objects from 0x0:227 to 0x0:513

This is not marked LustreError, so it is just an advisory message.

This can sometimes be useful for debugging issues related to MDT->OST 
connections.
It is already printed at the D_INFO level, the lowest printk level available.
Would rewording the message make it clearer that this is a normal situation
when the MDT and OST are establishing connections?
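As a quick sanity check, advisory lines can be told apart from real errors by
prefix alone: actual errors carry "LustreError:", while plain "Lustre:" lines
are informational. A minimal shell sketch (the sample file path is
illustrative; the log lines are copied from this thread, with the syslog
timestamp prefix stripped):

```shell
# Classify Lustre console messages by prefix.
# Sample lines taken from this thread; the /tmp path is illustrative.
cat > /tmp/lustre_sample.log <<'EOF'
Lustre: lustrevm-OST0001: deleting orphan objects from 0x0:227 to 0x0:513
LustreError: 137-5: lustrevm-MDT0000_UUID: not available for connect from 0@lo (no target).
Lustre: lustrevm-OST0001: Imperative Recovery not enabled, recovery window 300-900
EOF

# "LustreError:" marks actual errors; plain "Lustre:" lines are advisory.
# The trailing space in '^Lustre: ' keeps it from matching "LustreError:".
echo "errors:   $(grep -c '^LustreError:' /tmp/lustre_sample.log)"
echo "advisory: $(grep -c '^Lustre: ' /tmp/lustre_sample.log)"
```

On a real server the lines in /var/log/messages carry a timestamp and
"kernel:" prefix, so the anchors would need to be adjusted accordingly.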

Cheers, Andreas

On Dec 5, 2023, at 02:13, Aurelien Degremont <[email protected]> wrote:
>
> > Now, what are the messages about "deleting orphaned objects"? Are they 
> > normal too?
>
> Yeah, this is kind of normal, and I'm even thinking we should lower the 
> message verbosity...
> Andreas, do you agree that could become a simple CDEBUG(D_HA, ...) instead of 
> LCONSOLE(D_INFO, ...)?
>
>
> Aurélien
>
> Audet, Martin wrote on Monday, December 4, 2023 20:26:
>> Hello Andreas,
>>
>> Thanks for your response. Happy to learn that the "errors" I was reporting 
>> aren't really errors.
>>
>> I now understand that the 3 messages about LDISKFS were only normal messages 
>> resulting from mounting the file systems. (I was fooled by vim showing these 
>> messages in red, like important error messages, but this is simply a false 
>> positive of its syntax-highlighting rules, probably triggered by the 
>> "errors=" string, which is only a mount option...)
>>
>> Now, what are the messages about "deleting orphaned objects"? Are they 
>> normal too? We always boot the client VMs after the server is ready, and we 
>> shut down the clients cleanly, well before the vlfs Lustre server is (also 
>> cleanly) shut down. Is it a sign of corruption? How could this happen if 
>> shutdowns are clean?
>>
>> Thanks (and sorry for the beginner questions),
>>
>> Martin
>>
>> Andreas Dilger <[email protected]> wrote on December 4, 2023 5:25 AM:
>>> It wasn't clear from your mail which message(s) you are concerned about. 
>>> These look like normal mount messages to me.
>>>
>>> The "error" is pretty normal; it just means there were multiple services 
>>> starting at once and one wasn't yet ready for the other.
>>>
>>>         LustreError: 137-5: lustrevm-MDT0000_UUID: not available for connect
>>>         from 0@lo (no target). If you are running an HA pair check that the
>>>         target is mounted on the other server.
>>>
>>> It probably makes sense to quiet this message right at mount time to avoid 
>>> this.
>>>
>>> Cheers, Andreas
>>>
>>>> On Dec 1, 2023, at 10:24, Audet, Martin via lustre-discuss 
>>>> <[email protected]> wrote:
>>>>
>>>>
>>>> Hello Lustre community,
>>>>
>>>> Has anyone ever seen messages like these in "/var/log/messages" on a 
>>>> Lustre server?
>>>>
>>>> Dec  1 11:26:30 vlfs kernel: Lustre: Lustre: Build Version: 2.15.4_RC1
>>>> Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdd): mounted filesystem with 
>>>> ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
>>>> Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdc): mounted filesystem with 
>>>> ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
>>>> Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdb): mounted filesystem with 
>>>> ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
>>>> Dec  1 11:26:36 vlfs kernel: LustreError: 137-5: lustrevm-MDT0000_UUID: 
>>>> not available for connect from 0@lo (no target). If you are running an HA 
>>>> pair check that the target is mounted on the other server.
>>>> Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: Imperative Recovery 
>>>> not enabled, recovery window 300-900
>>>> Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: deleting orphan 
>>>> objects from 0x0:227 to 0x0:513
>>>>
>>>> This happens on every boot of a Lustre server named vlfs (an AlmaLinux 8.9 
>>>> VM hosted on VMware) playing the role of both MGS and OSS (it hosts an MDT 
>>>> and two OSTs using "virtual" disks). We chose LDISKFS, not ZFS. Note that 
>>>> this happens at every boot, well before the clients (AlmaLinux 9.3 or 8.9 
>>>> VMs) connect, and even when the clients are powered off. The network 
>>>> connecting the clients and the server is a "virtual" 10GbE network (of 
>>>> course there is no virtual IB). We also had the same messages previously 
>>>> with Lustre 2.15.3 using an AlmaLinux 8.8 server and AlmaLinux 8.8 / 9.2 
>>>> clients (also VMs). Note also that we compile the Lustre RPMs ourselves 
>>>> from the sources in the git repository, and we chose to use a patched 
>>>> kernel. Our RPM build procedure seems to work well because our real 
>>>> cluster runs fine on CentOS 7.9 with Lustre 2.12.9 and IB (MOFED) 
>>>> networking.
>>>>
>>>> So, has anyone seen these messages?
>>>>
>>>> Are they problematic? If so, how do we avoid them?
>>>>
>>>> We would like to make sure our small test system using VMs works well 
>>>> before we upgrade our real cluster.
>>>>
>>>> Thanks in advance!
>>>>
>>>> Martin Audet
>>>>

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
