Hi, 

So for the record, any release at or above v16.2.14, v17.2.7 or v18.1.2 (within
its respective series) has the fix.
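
In case it helps anyone auditing a cluster, here is a minimal Python sketch of that version check. The helper name `has_exdev_fix` is hypothetical, and the handling of series outside 16/17/18 (older treated as unfixed, newer treated as fixed) is my assumption rather than something stated in the tracker.

```python
import re

# Minimum fixed point release per major series, as listed in the message above.
FIXED_IN = {16: (16, 2, 14), 17: (17, 2, 7), 18: (18, 1, 2)}


def has_exdev_fix(version_text: str) -> bool:
    """Return True if the version found in version_text should carry the fix.

    Accepts strings such as "v16.2.14" or "ceph version 17.2.6 (...) quincy".
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {version_text!r}")
    version = tuple(int(part) for part in match.groups())
    major = version[0]
    if major in FIXED_IN:
        return version >= FIXED_IN[major]
    # Assumption: series newer than 18 include the fix, series older than 16 do not.
    return major > max(FIXED_IN)
```

The version string would typically come from each MDS daemon, since the fix discussed here is on the MDS side.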

Regards, 
Frédéric. 

----- On 21 Mar 25, at 18:55, Gregory Farnum <gfar...@redhat.com> wrote:



Sounds like the scenario addressed in this PR: 
https://github.com/ceph/ceph/pull/47399
The tracker ticket it links indicates it should be fixed in reasonably modern 
point releases, but the PR has a better description of the issue. :) 

So it is presumably an older mds version, and the workload involves deleting 
hard links and then creating new hard links to the same file in short-ish 
succession. 
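
A rough sketch of that workload pattern, for anyone who wants to exercise it against a test mount: it repeatedly creates and deletes a hard link to the same file and counts EXDEV results. The function name `churn_hardlinks` and the file names are hypothetical, and this loop is not known to reproduce the bug; the PR describes the actual conditions.

```python
import errno
import os


def churn_hardlinks(directory: str, iterations: int = 1000) -> int:
    """Create and delete a hard link to one file in a tight loop.

    Returns how many link() calls failed with EXDEV. Any other error is raised.
    """
    target = os.path.join(directory, "churn-target")  # hypothetical file name
    link = os.path.join(directory, "churn-link")      # hypothetical file name
    with open(target, "w") as handle:
        handle.write("x")
    exdev_failures = 0
    try:
        for _ in range(iterations):
            try:
                os.link(target, link)
            except OSError as exc:
                if exc.errno != errno.EXDEV:
                    raise
                exdev_failures += 1
                continue
            # Delete the link so the next iteration re-creates it right away.
            os.unlink(link)
    finally:
        # Clean up whatever is left behind.
        for path in (link, target):
            if os.path.exists(path):
                os.unlink(path)
    return exdev_failures
```

On a local filesystem this should return 0; a non-zero count on a CephFS directory would match the symptom reported in this thread.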
-Greg 

On Fri, Mar 21, 2025 at 10:33 AM Domhnall McGuigan <dmcgui...@kx.com> wrote:

BQ_BEGIN
Hi Frédéric, there were no file quotas in place, no, and it's using the ceph 
kernel client rather than FUSE. I also already saw those links while 
investigating our problems, and neither situation would give a satisfactory 
explanation for why the 500ms retry would make the second attempt at link 
creation succeed. Something evidently changed in that window of time and it 
could not have been the ceph client on the machine or the quota set on the 
target directory. 
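
For reference, the retry behaviour visible in the strace quoted further down can be sketched in Python roughly as follows. This is an illustration only, not the application's actual code; the name `link_with_retry` and its defaults (three attempts, 0.5 s delay) are assumptions.

```python
import errno
import os
import time


def link_with_retry(src: str, dst: str, attempts: int = 3, delay: float = 0.5) -> int:
    """Create a hard link, retrying only when link() fails with EXDEV.

    Returns the number of the attempt that succeeded. Errors other than EXDEV,
    and EXDEV on the final attempt, are re-raised to the caller.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            os.link(src, dst)
            return attempt
        except OSError as exc:
            if exc.errno != errno.EXDEV or attempt == attempts:
                raise
            # Same pause as the "Retrying in 500 milliseconds" log line.
            time.sleep(delay)
```

A return value greater than 1 means the first attempt hit EXDEV and a later one succeeded, which is the pattern we see.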

Regards, Domhnall 

-----Original Message----- 
From: Frédéric Nass <frederic.n...@univ-lorraine.fr>
Sent: 21 March 2025 16:40
To: Domhnall McGuigan <dmcgui...@kx.com>
Cc: ceph-users <ceph-users@ceph.io>
Subject: Re: [ceph-users] Rogue EXDEV errors when hardlinking 


Hi Domhnall, 

----- On 20 Mar 25, at 17:45, Domhnall McGuigan <dmcgui...@kx.com> wrote:

> Hi all, we've been seeing persistent problems when trying to create 
> hardlinks on cephfs; it's returning EXDEV in a way that makes no sense 
> given typical POSIX behaviour and ceph documentation. Here's a typical strace 
> of the problem: 
> 
> 78 13:47:26.572435 link("/data/db/hdb/data/2023.08.06/table1.0/column1", "/data/db/hdb/data/2023.08.06/table1.1/column1") = -1 EXDEV (Invalid cross-device link)
> 78 13:47:26.577661 write(1, "{\"time\":\"2025-03-03T13:47:26.577z\",\"component\":\"MSVC\",\"level\":\"INFO\",\"message\":\"[eoi-78] Retrying in 500 milliseconds\",\"service\":\"eoi\"}\n", 136) = 136
> 78 13:47:26.577762 clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=500000000}, NULL) = 0
> 78 13:47:27.078037 link("/data/db/hdb/data/2023.08.06/table1.0/column1", "/data/db/hdb/data/2023.08.06/table1.1/column1") = 0
> 
> We try creating a link, get EXDEV, wait 500 milliseconds, then try the 
> same operation again and it succeeds. The link and its target are both 
> on the same cephfs mount (/data/db/hdb in this case), so the normal 
> POSIX 'linking between filesystems' explanation doesn't apply. I've 
> looked through the ceph client and server code and from what I've seen 
> EXDEV is only returned in a couple of other situations: linking 
> between snapshots, and linking across quotas. Neither snapshots nor 
> quotas were in use here, and if they were the culprit it seems 
> unlikely the automatic retry would have worked. Web searches on EXDEV 
> errors in ceph have also proven to be a dead end. My best guess, 
> although it's not a very good one, is that stale MDS cache data is somehow 
> involved -- in one case the issue reportedly got much worse after increasing 
> (!) the MDS memory limit. 
> 
> This error has been occurring for a particular client for upwards of 9 
> months and has proven stubbornly resistant to reproduction elsewhere 
> (we are working on migrating them to a more recent ceph version to see 
> if the error remains), 

You mean the kernel version, right? You're not using ceph-fuse to mount the
filesystem, are you?

A quick 'cephfs EXDEV' search points to ceph-fuse and/or quotas [1][2]. Are you
using either of these?

Frédéric. 

[1] https://ceph-users.ceph.narkive.com/XW20WeeF/cephfs-move-operation
[2] https://www.spinics.net/lists/ceph-users/msg67823.html

> so our technical investigations haven't got particularly far. I was 
> hoping someone here on ceph-users would have seen similar EXDEV errors 
> in the wild or in development and have some insight into what could be 
> causing them. 
> 
> Regards, Domhnall 





BQ_END


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
