> Sorry for the late reply.

No worries.

> The ceph qa teuthology test cases already have one similar test, which
> will untar a kernel tarball, but I have never seen this issue there yet.
>
> I will try this again tomorrow without the NFS client.
Great. In case you would like to use the archive I sent you a link for, please keep it confidential. It contains files not for publication.

I will collect the log information you asked for.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Xiubo Li <xiu...@redhat.com>
Sent: Monday, March 27, 2023 4:15 PM
To: Frank Schilder; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

Frank,

Sorry for the late reply.

On 24/03/2023 01:56, Frank Schilder wrote:
> Hi Xiubo and Gregory,
>
> sorry for the slow reply, I did some more debugging and didn't have too much
> time. First, some questions about collecting logs, but please see also below
> for reproducing the issue yourselves.
>
> I can reproduce it reliably but need some input for these:
>
>> enabling the kclient debug logs and
> How do I do that? I thought the kclient ignores the ceph.conf and I'm not
> aware of a mount option to this effect. Is there a "ceph config set ..."
> setting I can change for a specific client (by host name/IP) and how exactly?
>
$ echo "module ceph +p" > /sys/kernel/debug/dynamic_debug/control

This will enable the debug logs in the kernel ceph client. Then please provide the message logs.

>> also the mds debug logs
> I guess here I should set a higher log level for the MDS serving this
> directory (it is pinned to a single rank) or is it something else?

$ ceph daemon mds.X config set debug_mds 25
$ ceph daemon mds.X config set debug_ms 1

>
> The issue seems to require a certain load to show up. I created a minimal tar
> file mimicking the problem, with 2 directories and a hard link from a file in
> the first to a new name in the second directory. This does not cause any
> problems, so it's not that easy to reproduce.
>
> How you can reproduce it:
>
> As an alternative to my limited skills at pulling logs out, I make the
> tgz-archive available to you both. You will receive an e-mail from our
> one-drive with a download link. If you un-tar the archive on an NFS client
> dir that's a re-export of a kclient mount, after some time you should see the
> errors showing up.
>
> I can reliably reproduce these errors on our production as well as on our
> test cluster. You should be able to reproduce it too with the tgz file.
>
> Here is a result on our set-up:
>
> - production cluster (executed in a sub-dir conda to make cleanup easy):
>
> $ time tar -xzf ../conda.tgz
> tar: mambaforge/pkgs/libstdcxx-ng-9.3.0-h6de172a_18/lib/libstdc++.so.6.0.28: Cannot hard link to ‘envs/satwindspy/lib/libstdc++.so.6.0.28’: Read-only file system
> [...]
> tar: mambaforge/pkgs/boost-cpp-1.72.0-h9d3c048_4/lib/libboost_log.so.1.72.0: Cannot hard link to ‘envs/satwindspy/lib/libboost_log.so.1.72.0’: Read-only file system
> ^C
>
> real    1m29.008s
> user    0m0.612s
> sys     0m6.870s
>
> By this time there are already hard links created, so it doesn't fail right
> away:
>
> $ find -type f -links +1
> ./mambaforge/pkgs/libev-4.33-h516909a_1/share/man/man3/ev.3
> ./mambaforge/pkgs/libev-4.33-h516909a_1/include/ev++.h
> ./mambaforge/pkgs/libev-4.33-h516909a_1/include/ev.h
> ...
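Pulling together the log-collection commands suggested above, a minimal sketch of one full reproduce-and-collect run could look like the following. The MDS name mds.X and the log file name are placeholders, and the restore values assume the stock defaults (debug_mds 1/5, debug_ms 0/5), which may differ on your cluster:

# on the kclient host: enable kernel ceph debug output
$ echo "module ceph +p" > /sys/kernel/debug/dynamic_debug/control
# on the MDS host: raise verbosity for the active MDS
$ ceph daemon mds.X config set debug_mds 25
$ ceph daemon mds.X config set debug_ms 1
# on the NFS client: reproduce, e.g.  time tar -xzf conda.tgz
# afterwards: save the kernel log and switch debugging off again
$ dmesg -T > kclient-debug.log
$ echo "module ceph -p" > /sys/kernel/debug/dynamic_debug/control
$ ceph daemon mds.X config set debug_mds 1/5
$ ceph daemon mds.X config set debug_ms 0/5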
> - test cluster (octopus latest stable, 3 OSD hosts with 3 HDD OSDs each,
> simple ceph-fs):
>
> # ceph fs status
> fs - 2 clients
> ==
> RANK   STATE      MDS      ACTIVITY    DNS    INOS
>  0     active   tceph-02  Reqs: 0 /s  1807k  1739k
>   POOL      TYPE      USED   AVAIL
> fs-meta1  metadata   18.3G   156G
> fs-meta2    data        0    156G
> fs-data     data     1604G   312G
> STANDBY MDS
>  tceph-01
>  tceph-03
> MDS version: ceph version 15.2.17 (8a82819d84cf884bd39c17e3236e0632ac146dc4) octopus (stable)
>
> It's the new recommended 3-pool layout with fs-data being a 4+2 EC pool.
>
> $ time tar -xzf / ... /conda.tgz
> tar: mambaforge/ssl/cacert.pem: Cannot hard link to ‘envs/satwindspy/ssl/cacert.pem’: Read-only file system
> [...]
> tar: mambaforge/lib/engines-1.1/padlock.so: Cannot hard link to ‘envs/satwindspy/lib/engines-1.1/padlock.so’: Read-only file system
> ^C
>
> real    6m23.522s
> user    0m3.477s
> sys     0m25.792s
>
> Same story here, a large number of hard links has already been created before
> it starts failing:
>
> $ find -type f -links +1
> ./mambaforge/lib/liblzo2.so.2.0.0
> ...
>
> Looking at the output of find in both cases, it also looks a bit
> non-deterministic when it starts failing.
>
> It would be great if you could reproduce the issue on a similar test setup
> using the archive conda.tgz. If not, I'm happy to collect any type of logs on
> our test cluster.
>
> We now have one user who has problems with rsync to an NFS share and it would
> be really appreciated if this could be sorted.

The ceph qa teuthology test cases already have one similar test, which will
untar a kernel tarball, but I have never seen this issue there yet.

I will try this again tomorrow without the NFS client.

Thanks

- Xiubo

> Thanks for your help and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Xiubo Li <xiu...@redhat.com>
> Sent: Thursday, March 23, 2023 2:41 AM
> To: Frank Schilder; Gregory Farnum
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system
>
> Hi Frank,
>
> Could you reproduce it again by enabling the kclient debug logs and also
> the mds debug logs?
>
> I need to know what exactly has happened on the kclient and mds side.
> Locally I couldn't reproduce it.
>
> Thanks
>
> - Xiubo
>
> On 22/03/2023 23:27, Frank Schilder wrote:
>> Hi Gregory,
>>
>> thanks for your reply. First a quick update. Here is how I get ln to work
>> after it failed; there seems to be no timeout:
>>
>> $ ln envs/satwindspy/include/ffi.h mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h
>> ln: failed to create hard link 'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h': Read-only file system
>> $ ls -l envs/satwindspy/include mambaforge/pkgs/libffi-3.3-h58526e2_2
>> envs/satwindspy/include:
>> total 7664
>> -rw-rw-r--. 1 rit rit 959 Mar 5 2021 ares_build.h
>> [...]
>> $ ln envs/satwindspy/include/ffi.h mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h
>>
>> After an ls -l on both directories, ln works.
>>
>> To the question: how can I pull out a log from the nfs server? There is
>> nothing in /var/log/messages.
>>
>> I can't reproduce it with simple commands on the NFS client. It seems to
>> occur only when a large number of files/dirs is created. I can make the
>> archive available to you if this helps.
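Since the archive itself is confidential, a rough, untested sketch of a synthetic load generator in the same spirit (many small files plus hard links spread over two directory trees on the NFS mount) might look like this; the counts and names are made up, and it may or may not produce enough load to trigger the EROFS errors:

mkdir -p src dst
for i in $(seq 1 50000); do
    d=$((i / 1000))                 # spread the files over ~50 sub-dirs
    mkdir -p src/$d dst/$d
    echo "data $i" > src/$d/f$i     # small file in the first tree ...
    ln src/$d/f$i dst/$d/f$i || echo "link failed for f$i"   # ... hard-linked into the second
done
find . -type f -links +1 | wc -l    # how many files ended up with a second link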
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Gregory Farnum <gfar...@redhat.com>
>> Sent: Wednesday, March 22, 2023 4:14 PM
>> To: Frank Schilder
>> Cc: ceph-users@ceph.io
>> Subject: Re: [ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system
>>
>> Do you have logs of what the nfs server is doing?
>> Have you managed to reproduce it in terms of direct CephFS ops?
>>
>> On Wed, Mar 22, 2023 at 8:05 AM Frank Schilder <fr...@dtu.dk> wrote:
>> I have to correct myself. It also fails on an export with "sync" mode. Here
>> is an strace on the client (strace ln envs/satwindspy/include/ffi.h
>> mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h):
>>
>> [...]
>> stat("mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h", 0x7ffdc5c32820) = -1 ENOENT (No such file or directory)
>> lstat("envs/satwindspy/include/ffi.h", {st_mode=S_IFREG|0664, st_size=13934, ...}) = 0
>> linkat(AT_FDCWD, "envs/satwindspy/include/ffi.h", AT_FDCWD, "mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h", 0) = -1 EROFS (Read-only file system)
>> [...]
>> write(2, "ln: ", 4ln: ) = 4
>> write(2, "failed to create hard link 'mamb"..., 80failed to create hard link 'mambaforge/pkgs/libffi-3.3-h58526e2_2/include/ffi.h') = 80
>> [...]
>> write(2, ": Read-only file system", 23: Read-only file system) = 23
>> write(2, "\n", 1
>> ) = 1
>> lseek(0, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
>> close(0) = 0
>> close(1) = 0
>> close(2) = 0
>> exit_group(1) = ?
>> +++ exited with 1 +++
>>
>> Does anyone have advice?
>>
>> Thanks!
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Frank Schilder <fr...@dtu.dk>
>> Sent: Wednesday, March 22, 2023 2:44 PM
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] ln: failed to create hard link 'file name': Read-only file system
>>
>> Hi all,
>>
>> on an NFS re-export of a ceph-fs (kernel client) I observe a very strange
>> error. I'm un-taring a larger package (1.2G) and after some time I get these
>> errors:
>>
>> ln: failed to create hard link 'file name': Read-only file system
>>
>> The strange thing is that this seems to be only temporary. When I used "ln src dst"
>> for manual testing, the command failed as above. However, after that I tried
>> "ln -v src dst" and this command created the hard link with exactly the same
>> path arguments. During the period when the error occurs, I can't see any FS
>> in read-only mode, neither on the NFS client nor on the NFS server. The funny
>> thing is that file creation and writing still work; it's only the hard-link
>> creation that fails.
>>
>> For details, the set-up is:
>>
>> file-server: mount ceph-fs at /shares/path, export /shares/path as nfs4 to other server
>> other server: mount /shares/path as NFS
>>
>> More precisely, on the file-server:
>>
>> fstab: MON-IPs:/shares/folder /shares/nfs/folder ceph defaults,noshare,name=NAME,secretfile=sec.file,mds_namespace=FS-NAME,_netdev 0 0
>> exports: /shares/nfs/folder -no_root_squash,rw,async,mountpoint,no_subtree_check DEST-IP
>>
>> On the host at DEST-IP:
>>
>> fstab: FILE-SERVER-IP:/shares/nfs/folder /mnt/folder nfs defaults,_netdev 0 0
>>
>> Both the file server and the client server are virtual machines.
>> The file server is on CentOS 8 Stream (4.18.0-338.el8.x86_64) and the client
>> machine is on AlmaLinux 8 (4.18.0-425.13.1.el8_7.x86_64).
>>
>> When I change the NFS export from "async" to "sync" everything works.
>> However, that's a rather bad workaround and not a solution. Although this
>> looks like an NFS issue, I'm afraid it is a problem with hard links and
>> ceph-fs. It looks like a race with scheduling and executing operations on
>> the ceph-fs kernel mount.
>>
>> Has anyone seen something like that?
>>
>> Thanks and best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>
> --
> Best Regards,
>
> Xiubo Li (李秀波)
>
> Email: xiu...@redhat.com/xiu...@ibm.com
> Slack: @Xiubo Li

--
Best Regards,

Xiubo Li (李秀波)

Email: xiu...@redhat.com/xiu...@ibm.com
Slack: @Xiubo Li

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
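As a possible follow-up to the strace output quoted above, the failing hard-link calls during a full un-tar run could also be captured in one go on the NFS client; the trace file name is a placeholder:

$ strace -f -e trace=link,linkat -o tar-links.trace tar -xzf conda.tgz
$ grep EROFS tar-links.trace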