Thanks Andreas and Aurélien for your answers. They make us confident that we are on the right track for our cluster update!
Also, I have noticed that 2.15.4-RC1 was released two weeks ago. Can we expect 2.15.4 to be ready by the end of the year?

Regards,

Martin

________________________________
From: Andreas Dilger <[email protected]>
Sent: December 7, 2023 6:02 AM
To: Aurelien Degremont
Cc: Audet, Martin; [email protected]
Subject: Re: [lustre-discuss] Error messages (ex: not available for connect from 0@lo) on server boot with Lustre 2.15.3 and 2.15.4-RC1

Aurelien, there have been a number of questions about this message.

> Lustre: lustrevm-OST0001: deleting orphan objects from 0x0:227 to 0x0:513

This is not marked LustreError, so it is just an advisory message. It can sometimes be useful for debugging issues related to MDT->OST connections. It is already printed with D_INFO level, so the lowest printk level available. Would rewording the message make it more clear that this is a normal situation when the MDT and OST are establishing connections?

Cheers, Andreas

On Dec 5, 2023, at 02:13, Aurelien Degremont <[email protected]> wrote:
>
> > Now what are the messages about "deleting orphaned objects"? Are they
> > normal too?
>
> Yeah, this is kind of normal, and I'm even thinking we should lower the
> message verbosity...
> Andreas, do you agree that could become a simple CDEBUG(D_HA, ...) instead of
> LCONSOLE(D_INFO, ...)?
>
>
> Aurélien
>
> Audet, Martin wrote on Monday, December 4, 2023 20:26:
>> Hello Andreas,
>>
>> Thanks for your response. Happy to learn that the "errors" I was reporting
>> aren't really errors.
>>
>> I now understand that the 3 messages about LDISKFS were only normal messages
>> resulting from mounting the file systems (I was fooled by vim showing this
>> message in red, like important error messages, but this is simply a false
>> positive of its syntax highlighting rules, probably triggered by the
>> "errors=" string, which is only a mount option...).
>>
>> Now what are the messages about "deleting orphaned objects"? Are they
>> normal too? We always boot the client VMs after the server is ready, and we
>> shut down the clients cleanly well before the vlfs Lustre server is (also
>> cleanly) shut down. Is it a sign of corruption? How could this happen if
>> the shutdowns are clean?
>>
>> Thanks (and sorry for the beginner questions),
>>
>> Martin
>>
>> Andreas Dilger <[email protected]> wrote on December 4, 2023 5:25 AM:
>>> It wasn't clear from your mail which message(s) you are concerned about?
>>> These look like normal mount messages to me.
>>>
>>> The "error" is pretty normal, it just means there were multiple services
>>> starting at once and one wasn't yet ready for the other.
>>>
>>>     LustreError: 137-5: lustrevm-MDT0000_UUID: not available for connect
>>>     from 0@lo (no target). If you are running an HA pair check that the
>>>     target is mounted on the other server.
>>>
>>> It probably makes sense to quiet this message right at mount time to avoid
>>> this.
>>>
>>> Cheers, Andreas
>>>
>>>> On Dec 1, 2023, at 10:24, Audet, Martin via lustre-discuss
>>>> <[email protected]> wrote:
>>>>
>>>> Hello Lustre community,
>>>>
>>>> Has anyone ever seen messages like these in "/var/log/messages" on a
>>>> Lustre server?
>>>>
>>>> Dec  1 11:26:30 vlfs kernel: Lustre: Lustre: Build Version: 2.15.4_RC1
>>>> Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdd): mounted filesystem with
>>>> ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
>>>> Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdc): mounted filesystem with
>>>> ordered data mode. Opts: errors=remount-ro,no_mbcache,nodelalloc
>>>> Dec  1 11:26:30 vlfs kernel: LDISKFS-fs (sdb): mounted filesystem with
>>>> ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
>>>> Dec  1 11:26:36 vlfs kernel: LustreError: 137-5: lustrevm-MDT0000_UUID:
>>>> not available for connect from 0@lo (no target). If you are running an HA
>>>> pair check that the target is mounted on the other server.
>>>> Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: Imperative Recovery
>>>> not enabled, recovery window 300-900
>>>> Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: deleting orphan
>>>> objects from 0x0:227 to 0x0:513
>>>>
>>>> This happens on every boot of a Lustre server named vlfs (an AlmaLinux 8.9
>>>> VM hosted on VMware) playing the role of both MGS and OSS (it hosts an
>>>> MDT and two OSTs using "virtual" disks). We chose LDISKFS, not ZFS. Note
>>>> that this happens at every boot, well before the clients (AlmaLinux 9.3 or
>>>> 8.9 VMs) connect, and even when the clients are powered off. The network
>>>> connecting the clients and the server is a "virtual" 10GbE network (of
>>>> course there is no virtual IB). We also had the same messages previously
>>>> with Lustre 2.15.3 using an AlmaLinux 8.8 server and AlmaLinux 8.8 / 9.2
>>>> clients (also using VMs). Note also that we compile the Lustre RPMs
>>>> ourselves from the sources in the git repository, and we chose to use a
>>>> patched kernel. Our build procedure for the RPMs seems to work well,
>>>> because our real cluster runs fine on CentOS 7.9 with Lustre 2.12.9 and
>>>> IB (MOFED) networking.
>>>>
>>>> So has anyone seen these messages?
>>>>
>>>> Are they problematic? If yes, how do we avoid them?
>>>>
>>>> We would like to make sure our small test system using VMs works well
>>>> before we upgrade our real cluster.
>>>>
>>>> Thanks in advance!
>>>>
>>>> Martin Audet

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud
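Andreas's rule of thumb above (only lines tagged "LustreError" are actual errors; plain "Lustre:" lines are advisory) can be applied mechanically when scanning /var/log/messages. Below is a minimal Python sketch of such a triage filter; the function name and the abridged sample lines are illustrative, not part of any Lustre tooling:

```python
import re

# Abridged kernel log lines from the thread above.
LOG_LINES = [
    "Dec  1 11:26:30 vlfs kernel: Lustre: Lustre: Build Version: 2.15.4_RC1",
    "Dec  1 11:26:36 vlfs kernel: LustreError: 137-5: lustrevm-MDT0000_UUID: "
    "not available for connect from 0@lo (no target).",
    "Dec  1 11:26:36 vlfs kernel: Lustre: lustrevm-OST0001: deleting orphan "
    "objects from 0x0:227 to 0x0:513",
]

def classify(line):
    """Return 'error' for LustreError lines, 'advisory' for plain
    Lustre messages, and None for unrelated kernel lines."""
    if "LustreError:" in line:
        return "error"
    # Word boundary so "LustreError" does not also match here.
    if re.search(r"\bLustre:", line):
        return "advisory"
    return None

for line in LOG_LINES:
    print(classify(line), "->", line.split("kernel: ", 1)[1])
```

With this split, the orphan-deletion and mount messages land in the "advisory" bucket, matching Andreas's reading that they are normal connection-time chatter.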
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
