[ceph-users] Re: Cephfs mds node already exists crashes mds

2024-08-20 Thread Xiubo Li
This looks the same as https://tracker.ceph.com/issues/52280, which has already been fixed. I just checked your ceph version, which already includes the fix. So this should be a new case. BTW, how did this happen? Were you doing a failover or something else? Thanks - Xiubo On 8

[ceph-users] Re: [EXTERN] Re: Urgent help with degraded filesystem needed

2024-07-01 Thread Xiubo Li
On 6/26/24 14:08, Dietmar Rieder wrote: ...sending also to the list and Xiubo (were accidentally removed from recipients)... On 6/25/24 21:28, Dietmar Rieder wrote: Hi Patrick,  Xiubo and List, finally we managed to get the filesystem repaired and running again! YEAH, I'm so happy!! Big t

[ceph-users] Re: [EXTERN] Re: Urgent help with degraded filesystem needed

2024-06-19 Thread Xiubo Li
On 6/19/24 16:13, Dietmar Rieder wrote: Hi Xiubo, [...] 0> 2024-06-19T07:12:39.236+ 7f90fa912700 -1 *** Caught signal (Aborted) **  in thread 7f90fa912700 thread_name:md_log_replay  ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)  1: /lib64/libpthre

[ceph-users] Re: Urgent help with degraded filesystem needed

2024-06-19 Thread Xiubo Li
Hi Dietmar, On 6/19/24 15:43, Dietmar Rieder wrote: Hello cephers, we have a degraded filesystem on our ceph 18.2.2 cluster and I need to get it up again. We have 6 MDS daemons (3 active, each pinned to a subtree, 3 standby). It started this night; I got the first HEALTH_WARN emails sa

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread Xiubo Li
. - Xiubo Kind regards, Sake On 04-06-2024 04:04 CEST, Xiubo Li wrote: Hi Nicolas, This is a known issue and Venky is working on it, please see https://tracker.ceph.com/issues/63259. Thanks - Xiubo On 6/3/24 20:04, nbarb...@deltaonline.net wrote: Hello, First of all, thanks for reading

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-03 Thread Xiubo Li
Hi Nicolas, This is a known issue and Venky is working on it, please see https://tracker.ceph.com/issues/63259. Thanks - Xiubo On 6/3/24 20:04, nbarb...@deltaonline.net wrote: Hello, First of all, thanks for reading my message. I set up a Ceph version 18.2.2 cluster with 4 nodes, everythin

[ceph-users] Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-08 Thread Xiubo Li
On 5/8/24 17:36, Dejan Lesjak wrote: Hi Xiubo, On 8. 05. 24 09:53, Xiubo Li wrote: Hi Dejan, This is a known issue and please see https://tracker.ceph.com/issues/61009. For the workaround please see https://tracker.ceph.com/issues/61009#note-26. Thank you for the links. Unfortunately

[ceph-users] Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-08 Thread Xiubo Li
Hi Dejan, This is a known issue and please see https://tracker.ceph.com/issues/61009. For the workaround please see https://tracker.ceph.com/issues/61009#note-26. Thanks - Xiubo On 5/8/24 06:49, Dejan Lesjak wrote: Hello, We have cephfs with two active MDS. Currently rank 1 is repeatedly cr

[ceph-users] Re: MDS 17.2.7 crashes at rejoin

2024-05-06 Thread Xiubo Li
Possibly, because we have seen this only in ceph 17. If you can reproduce it, please provide the mds debug logs; with those we can quickly find the root cause. Thanks - Xiubo On 5/7/24 12:19, Robert Sander wrote: Hi, would an update to 18.2 help? Regards

[ceph-users] Re: MDS crashes shortly after starting

2024-05-06 Thread Xiubo Li
This is the same issue as https://tracker.ceph.com/issues/60986, as Robert Sander reported. On 5/6/24 05:11, E Taka wrote: Hi all, we have a serious problem with CephFS. A few days ago, the CephFS file systems became inaccessible, with the message MDS_DAMAGE: 1 mds daemon damaged The cephfs-jour

[ceph-users] Re: MDS 17.2.7 crashes at rejoin

2024-05-06 Thread Xiubo Li
This is a known issue, please see https://tracker.ceph.com/issues/60986. If you can reproduce it, please enable the mds debug logs, which will help debug it quickly: debug_mds = 25 debug_ms = 1 Thanks - Xiubo On 5/7/24 00:26, Robert Sander wrote: Hi, a 17.2.7 cluster with two fil
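
A minimal sketch of applying those two settings at runtime (assuming the centralized config store is in use; revert once the logs are captured):

    ceph config set mds debug_mds 25   # verbose MDS-side logging
    ceph config set mds debug_ms 1     # messenger-level logging
    # ... reproduce the crash and collect /var/log/ceph/*mds*.log ...
    ceph config rm mds debug_mds       # drop back to the default levels
    ceph config rm mds debug_ms

Equivalently, the two lines can be placed under [mds] in ceph.conf and the daemons restarted.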

[ceph-users] Re: MDS Behind on Trimming...

2024-04-29 Thread Xiubo Li
t the test packages from https://shaman.ceph.com/builds/ceph/. But this needs to trigger a build first. -erich On 4/21/24 9:39 PM, Xiubo Li wrote: Hi Erich, I raised one tracker for this: https://tracker.ceph.com/issues/65607. Currently I haven't figured out what was holding the 'dn

[ceph-users] Re: Question about PR merge

2024-04-22 Thread Xiubo Li
the issue we provided logs on the same as Erich or is that a third different locking issue? thanks, nigel. On Thu, 18 Apr 2024 at 12:29, Xiubo Li wrote: On 4/18/24 08:57, Erich Weiler wrote: >> Have you already shared information about this issue? Please do if not. >

[ceph-users] Re: MDS Behind on Trimming...

2024-04-21 Thread Xiubo Li
overloaded? Can a single server hold multiple MDS daemons?  Right now I have three physical servers each with one MDS daemon on it. I can still try reducing to one.  And I'll keep an eye on blocked ops to see if any get to a very old age (and are thus wedged). -erich On 4/18/24 8:55 PM,

[ceph-users] Re: MDS crash

2024-04-21 Thread Xiubo Li
Hi Alexey, This looks like a new issue to me. Please create a tracker for it and provide the detailed call trace there. Thanks - Xiubo On 4/19/24 05:42, alexey.gerasi...@opencascade.com wrote: Dear colleagues, hope that anybody can help us. The initial point: Ceph cluster v15.2 (installed and c

[ceph-users] Re: Question about PR merge

2024-04-17 Thread Xiubo Li
ug logs to confirm it. Thanks - Xiubo On 4/18/24 14:22, Nigel Williams wrote: Hi Xiubo, Is the issue we provided logs on the same as Erich or is that a third different locking issue? thanks, nigel. On Thu, 18 Apr 2024 at 12:29, Xiubo Li wrote: On 4/18/24 08:57, Erich Weiler wrote: &

[ceph-users] Re: Client kernel crashes on cephfs access

2024-04-17 Thread Xiubo Li
Hi Konstantin, We have fixed it, please see https://patchwork.kernel.org/project/ceph-devel/list/?series=842682&archive=both. - Xiubo On 4/18/24 00:05, Konstantin Shalygin wrote: Hi, On 9 Apr 2024, at 04:07, Xiubo Li wrote: Thanks for reporting this, I generated one patch to fi

[ceph-users] Re: Question about PR merge

2024-04-17 Thread Xiubo Li
On 4/18/24 08:57, Erich Weiler wrote: Have you already shared information about this issue? Please do if not. I am working with Xiubo Li and providing debugging information - in progress! From the blocked ops output it looks very similar to the lock order issue Patrick fixed before

[ceph-users] Re: MDS Behind on Trimming...

2024-04-14 Thread Xiubo Li
Hi Erich, Two things I need to make clear: 1. Since there are no debug logs, I am not completely sure my fix PR will 100% fix this. 2. It will take time to get this PR merged upstream, so I can't say exactly when it will be backported to downstream and then be rel

[ceph-users] Re: MDS Behind on Trimming...

2024-04-09 Thread Xiubo Li
On 4/10/24 11:48, Erich Weiler wrote: Does that mean it could be the lock order bug (https://tracker.ceph.com/issues/62123) as Xiubo suggested? I have raised one PR to fix the lock order issue; if possible, please give it a try and see whether it resolves this issue. Thank you! Yeah, this issue i

[ceph-users] Re: MDS Behind on Trimming...

2024-04-09 Thread Xiubo Li
On 4/8/24 12:32, Erich Weiler wrote: Ah, I see. Yes, we are already running version 18.2.1 on the server side (we just installed this cluster a few weeks ago from scratch). So I guess if the fix has already been backported to that version, then we still have a problem. Does that mean it coul

[ceph-users] Re: Client kernel crashes on cephfs access

2024-04-08 Thread Xiubo Li
Hi Marc, Thanks for reporting this, I generated one patch to fix it. Will send it out after testing is done. - Xiubo On 4/8/24 16:01, Marc Ruhmann wrote: Hi everyone, I would like to ask for help regarding client kernel crashes that happen on cephfs access. We have been struggling with this

[ceph-users] Re: MDS Behind on Trimming...

2024-04-08 Thread Xiubo Li
On 4/8/24 12:32, Erich Weiler wrote: Ah, I see. Yes, we are already running version 18.2.1 on the server side (we just installed this cluster a few weeks ago from scratch). So I guess if the fix has already been backported to that version, then we still have a problem. Does that mean it coul

[ceph-users] Re: MDS Behind on Trimming...

2024-04-07 Thread Xiubo Li
Hi Erich, Thanks for your logs; this should be the same issue as https://tracker.ceph.com/issues/62052. Could you test with this fix again? Please let me know if you still see this bug; then it should be the lock order bug, https://tracker.ceph.com/issues/62123. Thanks

[ceph-users] Re: MDS Behind on Trimming...

2024-03-28 Thread Xiubo Li
, caller_gid=600{600,608,999,}) currently joining batch getattr Can we tell which client the slow requests are coming from?  It says stuff like "client.99445:4189994" but I don't know how to map that to a client... Thanks for the response! -erich On 3/27/24 21:28, Xiubo Li wrot
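
In that slow-request output, "client.99445:4189994" is the global client (session) id followed by the request tid. A hedged way to map the id back to a host (the MDS name is a placeholder):

    ceph fs status                          # find the active MDS daemon names
    ceph tell mds.<active-mds> session ls   # per-session client id, hostname and client metadata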

[ceph-users] Re: MDS Behind on Trimming...

2024-03-27 Thread Xiubo Li
On 3/28/24 04:03, Erich Weiler wrote: Hi All, I've been battling this for a while and I'm not sure where to go from here.  I have a Ceph health warning as such: # ceph -s   cluster:     id: 58bde08a-d7ed-11ee-9098-506b4b4da440     health: HEALTH_WARN     1 MDSs report slow reques
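
For reference, a hedged set of commands commonly used to see what the slow requests reported above are actually waiting on (the MDS name is a placeholder):

    ceph health detail                              # which MDS is reporting slow requests
    ceph tell mds.<active-mds> dump_ops_in_flight   # every in-flight op with its current wait state
    ceph tell mds.<active-mds> dump_blocked_ops     # only the ops that are blocked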

[ceph-users] Re: Ceph & iSCSI

2024-02-26 Thread Xiubo Li
Hi Michael, Please see the previous threads about the same question: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/GDJJL7VSDUJITPM3JV7RCVXVOIQO2CAN/ https://www.spinics.net/lists/ceph-users/msg73969.html Thanks - Xiubo On 2/27/24 11:22, Michael Worsham wrote: I was readin

[ceph-users] Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-17 Thread Xiubo Li
On 1/13/24 07:02, Özkan Göksu wrote: Hello. I have 5 node ceph cluster and I'm constantly having "clients failing to respond to cache pressure" warning. I have 84 cephfs kernel clients (servers) and my users are accessing their personal subvolumes located on one pool. My users are software d
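
A hedged starting point for this warning; the values below are illustrative, not recommendations:

    ceph config get mds mds_cache_memory_limit             # current cache target (default 4 GiB)
    ceph config set mds mds_cache_memory_limit 8589934592  # e.g. 8 GiB, if the MDS hosts have RAM to spare
    ceph tell mds.<active-mds> session ls                  # num_caps per client helps spot cap-hoarding mounts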

[ceph-users] Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-17 Thread Xiubo Li
On 1/17/24 15:57, Eugen Block wrote: Hi, this is not an easy topic and there is no formula that can be applied to all clusters. From my experience, it is exactly how the discussion went in the thread you mentioned, trial & error. Looking at your session ls output, this reminds of a debug sess

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-12-11 Thread Xiubo Li
ings up. I will update you, please keep the tracker open. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Xiubo Li Sent: Tuesday, December 5, 2023 1:58 AM To: Frank Schilder; Gregory Farnum Cc: ceph-users@ceph.io Sub

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-12-04 Thread Xiubo Li
7 at 13:19, Xiubo Li wrote: On 11/27/23 13:12, zxcs wrote: Currently we are using `ceph config set mds mds_bal_interval 3600` to set a fixed time (1 hour). We also have a question about how to set no balancing for multiple active MDS; that means we will enable multiple active MDS (to improve throughput) and no balance
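
A small sketch of the usual way to get multiple active MDS without balancing: pin each top-level directory to a rank so the balancer has nothing left to move (filesystem name, paths and ranks are examples):

    ceph fs set cephfs max_mds 3                       # three active ranks
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/proj-a   # pin this subtree to rank 0
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/proj-b   # rank 1
    setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/proj-c   # rank 2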

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-12-04 Thread Xiubo Li
se include the part executed on the second host explicitly in an ssh-command. Running your scripts alone in their current form will not reproduce the issue. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________ From: Xiubo Li Se

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-12-03 Thread Xiubo Li
updating in the tracker and I will try it. Thanks - Xiubo Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Xiubo Li Sent: Monday, November 27, 2023 3:59 AM To: Frank Schilder; Gregory Farnum Cc: ceph-users@ceph

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-26 Thread Xiubo Li
ion environment. thanks and regards Xiubo Li wrote on Thu, Nov 23, 2023 at 15:47: On 11/23/23 11:25, zxcs wrote: Thanks a ton, Xiubo! It does not disappear, even after we umount the ceph directory on these two old OS nodes. After dumping the ops in flight, we can see some requests, and the earliest complains “failed to authpin, s

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-26 Thread Xiubo Li
m S14 From: Xiubo Li Sent: Thursday, November 23, 2023 3:47 AM To: Frank Schilder; Gregory Farnum Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent I just raised one tracker to follow this: https://tracker.ceph.com/issues/63510 Thanks -

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Xiubo Li
19:44, Xiubo Li wrote: On 11/22/23 16:02, zxcs wrote: Hi Experts, we are using cephfs 16.2.* with multiple active MDS, and recently we have two nodes mounted with ceph-fuse due to their old OS. One node runs a python script with `glob.glob(path)`, and another client is doing a `cp` operati

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-22 Thread Xiubo Li
kclient. Thanks! Will wait for further instructions. = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li Sent: Friday, November 10, 2023 3:14 AM To: Frank Schilder; Gregory Farnum Cc: ceph-users@ceph.io Subject: Re: [ceph-users

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-22 Thread Xiubo Li
. = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li Sent: Friday, November 10, 2023 3:14 AM To: Frank Schilder; Gregory Farnum Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent On 11/10/23 00

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Xiubo Li
On 11/22/23 16:02, zxcs wrote: Hi Experts, we are using cephfs 16.2.* with multiple active MDS, and recently we have two nodes mounted with ceph-fuse due to their old OS. One node runs a python script with `glob.glob(path)`, and another client is doing a `cp` operation on the same pa

[ceph-users] Re: Different behaviors for ceph kernel client in limiting IOPS when data pool enters `nearfull`?

2023-11-17 Thread Xiubo Li
On 11/17/23 00:41, Ilya Dryomov wrote: On Thu, Nov 16, 2023 at 5:26 PM Matt Larson wrote: Ilya, Thank you for providing these discussion threads on the Kernel fixes for where there was a change and details on this affects the clients. What is the expected behavior in CephFS client when

[ceph-users] Re: Different behaviors for ceph kernel client in limiting IOPS when data pool enters `nearfull`?

2023-11-16 Thread Xiubo Li
On 11/16/23 22:39, Ilya Dryomov wrote: On Thu, Nov 16, 2023 at 3:21 AM Xiubo Li wrote: Hi Matt, On 11/15/23 02:40, Matt Larson wrote: On CentOS 7 systems with the CephFS kernel client, if the data pool has a `nearfull` status there is a slight reduction in write speeds (possibly 20-50

[ceph-users] Re: Different behaviors for ceph kernel client in limiting IOPS when data pool enters `nearfull`?

2023-11-15 Thread Xiubo Li
Hi Matt, On 11/15/23 02:40, Matt Larson wrote: On CentOS 7 systems with the CephFS kernel client, if the data pool has a `nearfull` status there is a slight reduction in write speeds (possibly 20-50% fewer IOPS). On a similar Rocky 8 system with the CephFS kernel client, if the data pool has `n
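
For context, a hedged set of commands for checking how close a data pool is to the nearfull threshold that triggers this client-side behavior:

    ceph osd dump | grep -E 'full_ratio|nearfull'   # cluster-wide full/backfillfull/nearfull ratios
    ceph df detail                                  # per-pool usage against those ratios
    ceph health detail                              # lists the OSDs/pools currently flagged nearfull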

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-09 Thread Xiubo Li
On 11/10/23 00:18, Frank Schilder wrote: Hi Xiubo, I will try to answer questions from all your 3 e-mails here together with some new information we have. New: The problem occurs in newer python versions when using the shutil.copy function. There is also a function shutil.copy2 for which the

[ceph-users] Re: MDS stuck in rejoin

2023-11-09 Thread Xiubo Li
luded in the last Pacific point release. Yeah, this will be backported after it gets merged. But for the kclient we still need another patch. Thanks - Xiubo Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiu

[ceph-users] Re: MDS stuck in rejoin

2023-11-07 Thread Xiubo Li
eagerly waiting for this and another one. Any idea when they might show up in distro kernels? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Xiubo Li Sent: Tuesday, August 8, 2023 2:57 AM To: Frank Schil

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-06 Thread Xiubo Li
bug mds = 25 debug ms = 1 Thanks - Xiubo Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Thursday, November 2, 2023 12:12 PM To: Gregory Farnum; Xiubo Li Cc: ceph-users@ceph.io Subject: [ce

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-06 Thread Xiubo Li
On 11/1/23 23:57, Gregory Farnum wrote: We have seen issues like this a few times and they have all been kernel client bugs with CephFS’ internal “capability” file locking protocol. I’m not aware of any extant bugs like this in our code base, but kernel patches can take a long and winding path

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-06 Thread Xiubo Li
On 11/1/23 22:14, Frank Schilder wrote: Dear fellow cephers, today we observed a somewhat worrisome inconsistency on our ceph fs. A file created on one host showed up as 0 length on all other hosts: [user1@host1 h2lib]$ ls -lh total 37M -rw-rw 1 user1 user1 12K Nov 1 11:59 dll_wrapper.p

[ceph-users] Re: 6.5 CephFS client - ceph_cap_reclaim_work [ceph] / ceph_con_workfn [libceph] hogged CPU

2023-09-13 Thread Xiubo Li
On 9/13/23 20:58, Ilya Dryomov wrote: On Wed, Sep 13, 2023 at 9:20 AM Stefan Kooman wrote: Hi, Since the 6.5 kernel addressed the issue with regards to regression in the readahead handling code... we went ahead and installed this kernel for a couple of mail / web clusters (Ubuntu 6.5.1-060501

[ceph-users] Re: MDS stuck in rejoin

2023-08-07 Thread Xiubo Li
______ From: Xiubo Li Sent: Monday, July 31, 2023 12:14 PM To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS stuck in rejoin On 7/31/23 16:50, Frank Schilder wrote: Hi Xiubo, its a kernel client. I actually made a mistake when trying to evict the client and my co

[ceph-users] Re: MDS stuck in rejoin

2023-07-31 Thread Xiubo Li
for the dmesg logs from the client node. Yeah, after the client's sessions are closed the corresponding warning should be cleared. Thanks Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Xiubo Li Sent: Mo

[ceph-users] Re: MDS stuck in rejoin

2023-07-30 Thread Xiubo Li
's okay till now. Thanks - Xiubo Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________ From: Xiubo Li Sent: Friday, July 28, 2023 11:37 AM To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS stuck

[ceph-users] Re: MDS stuck in rejoin

2023-07-28 Thread Xiubo Li
On 7/26/23 22:13, Frank Schilder wrote: Hi Xiubo. ... I am more interested in the kclient side logs. Just want to know why that oldest request got stuck so long. I'm afraid I'm a bad admin in this case. I don't have logs from the host any more, I would have needed the output of dmesg and thi

[ceph-users] Re: MDS stuck in rejoin

2023-07-25 Thread Xiubo Li
ecoverable state. I did not observe unusual RAM consumption and there were no MDS large cache messages either. Seems like our situation was of a more harmless nature. Still, the fail did not go entirely smooth. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 __

[ceph-users] Re: mds terminated

2023-07-24 Thread Xiubo Li
On 7/20/23 11:36, dxo...@naver.com wrote: This issue has been closed. If any rook-ceph users see this: when mds replay takes a long time, look at the logs in the mds pod. If it's going well and then abruptly terminates, try describing the mds pod, and if the liveness probe terminated it, try increasing

[ceph-users] Re: MDS stuck in rejoin

2023-07-21 Thread Xiubo Li
On 7/20/23 22:09, Frank Schilder wrote: Hi all, we had a client with the warning "[WRN] MDS_CLIENT_OLDEST_TID: 1 clients failing to advance oldest client/flush tid". I looked at the client and there was nothing going on, so I rebooted it. After the client was back, the message was still there

[ceph-users] Re: MDS stuck in rejoin

2023-07-20 Thread Xiubo Li
On 7/20/23 22:09, Frank Schilder wrote: Hi all, we had a client with the warning "[WRN] MDS_CLIENT_OLDEST_TID: 1 clients failing to advance oldest client/flush tid". I looked at the client and there was nothing going on, so I rebooted it. After the client was back, the message was still there
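
A hedged sketch of how the client behind an MDS_CLIENT_OLDEST_TID warning is usually identified, plus the blunt fallback of evicting its session (names and ids are placeholders):

    ceph health detail                                       # prints the offending client id
    ceph tell mds.<active-mds> session ls                    # map that id to a hostname / mount
    ceph tell mds.<active-mds> client evict id=<client-id>   # last resort: drop the stale session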

[ceph-users] Re: ceph fs perf stats output is empty

2023-06-11 Thread Xiubo Li
On 6/10/23 05:35, Denis Polom wrote: Hi I'm running latest Ceph Pacific 16.2.13 with Cephfs. I need to collect performance stats per client, but I am getting an empty list without any numbers. I even ran dd on a client against the mounted ceph fs, but the output is only like this: #> ceph fs perf stats 0 4
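
One thing worth checking here (hedged): `ceph fs perf stats` is fed by the MGR `stats` module, so an empty result often just means the module is not enabled:

    ceph mgr module ls | grep -i stats   # is the stats module enabled?
    ceph mgr module enable stats         # enable it if not
    ceph fs perf stats                   # metrics should start appearing after a few refresh intervals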

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-06-05 Thread Xiubo Li
Raised one PR to fix this, please see https://github.com/ceph/ceph/pull/51931. Thanks - Xiubo On 5/24/23 23:52, Sandip Divekar wrote: Hi Team, I'm writing to bring to your attention an issue we have encountered with the "mtime" (modification time) behavior for directories in the Ceph filesy

[ceph-users] Re: Unexpected behavior of directory mtime after being set explicitly

2023-06-04 Thread Xiubo Li
Yeah, it's a bug. I have raised a ceph tracker to follow this: https://tracker.ceph.com/issues/61584. And I have found the root cause; for more detail please see my comments on the above tracker. I am still going through the code to find a way to fix it. Thanks - Xiubo On 6/5/23 13:42, San

[ceph-users] Re: Ceph iscsi gateway semi deprecation warning?

2023-05-28 Thread Xiubo Li
On 5/24/23 12:23, Mark Kirkwood wrote: I am looking at using an iscsi gateway in front of a ceph setup. However the warning in the docs is concerning: The iSCSI gateway is in maintenance as of November 2022. This means that it is no longer in active development and will not be updated to ad

[ceph-users] Re: mds dump inode crashes file system

2023-05-16 Thread Xiubo Li
On 5/16/23 21:55, Gregory Farnum wrote: On Fri, May 12, 2023 at 5:28 AM Frank Schilder wrote: Dear Xiubo and others. I have never heard about that option until now. How do I check that and how do I disable it if necessary? I'm in meetings pretty much all day and will try to send some more i

[ceph-users] Re: mds dump inode crashes file system

2023-05-16 Thread Xiubo Li
the mds dump inode command. Thanks a lot and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Thursday, May 11, 2023 12:26 PM To: Xiubo Li; ceph-users@ceph.io Subject: [ceph-users] Re: mds dump inode cr

[ceph-users] Re: mds dump inode crashes file system

2023-05-16 Thread Xiubo Li
__ From: Frank Schilder Sent: Monday, May 15, 2023 6:33 PM To: Xiubo Li; ceph-users@ceph.io Subject: [ceph-users] Re: mds dump inode crashes file system Dear Xiubo, I uploaded the cache dump, the MDS log and the dmesg log containing the snaptrace dump to ceph-post-file: 763955a3-7d37-408a-

[ceph-users] Re: mds dump inode crashes file system

2023-05-16 Thread Xiubo Li
Thanks Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________ From: Xiubo Li Sent: Friday, May 12, 2023 3:44 PM To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: mds dump inode crashes file system On 5/12

[ceph-users] Re: mds dump inode crashes file system

2023-05-12 Thread Xiubo Li
how to fix it. Firstly we need to know where the corrupted metadata is. I think the mds debug logs and the above corrupted snaptrace could help. Need to parse that corrupted binary data. Thanks Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___

[ceph-users] Re: mds dump inode crashes file system

2023-05-11 Thread Xiubo Li
the metadata in cephfs. Thanks Thanks a lot and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Frank Schilder Sent: Thursday, May 11, 2023 12:26 PM To: Xiubo Li;ceph-users@ceph.io Subject: [ceph-users] Re: mds d

[ceph-users] Re: mds dump inode crashes file system

2023-05-11 Thread Xiubo Li
by reading the ceph and kceph code. In theory you can reproduce this by making the directory migrate during stress IOs. Thanks Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________ From: Xiubo Li Sent: Thursda

[ceph-users] Re: client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2023-05-11 Thread Xiubo Li
this. Thanks. It would be great to have light-weight tools available to rectify such simple conditions in as non-disruptive a way as possible. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li

[ceph-users] Re: mds dump inode crashes file system

2023-05-10 Thread Xiubo Li
Hey Frank, On 5/10/23 21:44, Frank Schilder wrote: The kernel message that shows up on boot on the file server in text format: May 10 13:56:59 rit-pfile01 kernel: WARNING: CPU: 3 PID: 34 at fs/ceph/caps.c:689 ceph_add_cap+0x53e/0x550 [ceph] May 10 13:56:59 rit-pfile01 kernel: Modules linked in

[ceph-users] Re: client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2023-05-09 Thread Xiubo Li
From: Xiubo Li Sent: Friday, May 5, 2023 2:40 AM To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc On 5/1/23 17:35, Frank Schilder wrote: Hi all, I think we might be hitting a

[ceph-users] Re: Unable to restart mds - mds crashes almost immediately after finishing recovery

2023-05-04 Thread Xiubo Li
Hi Emmanuel, This should be a known issue, https://tracker.ceph.com/issues/58392, and there is a fix in https://github.com/ceph/ceph/pull/49652. Could you stop all the clients first, then set 'max_mds' to 1 and restart the MDS daemons? Thanks On 5/3/23 16:01, Emmanue
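
A hedged sketch of that sequence (filesystem and service names are examples; adjust to the actual deployment):

    # after unmounting / stopping all clients
    ceph fs set cephfs max_mds 1
    ceph orch restart mds.cephfs   # cephadm-managed; otherwise restart the ceph-mds units by hand
    ceph fs status cephfs          # wait until rank 0 reaches up:active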

[ceph-users] Re: client isn't responding to mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2023-05-04 Thread Xiubo Li
On 5/1/23 17:35, Frank Schilder wrote: Hi all, I think we might be hitting a known problem (https://tracker.ceph.com/issues/57244). I don't want to fail the mds yet, because we have troubles with older kclients that miss the mds restart and hold on to cache entries referring to the killed in

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-11 Thread Xiubo Li
On 4/11/23 15:59, Thomas Widhalm wrote: On 11.04.23 09:16, Xiubo Li wrote: On 4/11/23 03:24, Thomas Widhalm wrote: Hi, If you remember, I hit bug https://tracker.ceph.com/issues/58489 so I was very relieved when 17.2.6 was released and started to update immediately. Please note, this

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-11 Thread Xiubo Li
On 4/11/23 03:24, Thomas Widhalm wrote: Hi, If you remember, I hit bug https://tracker.ceph.com/issues/58489 so I was very relieved when 17.2.6 was released and started to update immediately. Please note, this fix is not in v17.2.6 yet in the upstream code. Thanks - Xiubo But now I'm s

[ceph-users] Re: Why is my cephfs almostfull?

2023-04-09 Thread Xiubo Li
Hi Jorge, On 4/6/23 07:09, Jorge Garcia wrote: We have a ceph cluster with a cephfs filesystem that we use mostly for backups. When I do a "ceph -s" or a "ceph df", it reports lots of space:     data:   pools:   3 pools, 4104 pgs   objects: 1.09 G objects, 944 TiB   usage:   1.5 Pi
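
A hedged way to compare what the cluster reports raw against what CephFS itself accounts for (the mount point is an example):

    ceph df detail                            # raw usage vs. per-pool stored/used; replication or EC overhead shows up here
    getfattr -n ceph.dir.rbytes /mnt/cephfs   # recursive logical bytes as CephFS sees them
    getfattr -n ceph.dir.rfiles /mnt/cephfs   # recursive file count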

[ceph-users] Re: Read and write performance on distributed filesystem

2023-04-04 Thread Xiubo Li
On 4/4/23 07:59, David Cunningham wrote: Hello, We are considering CephFS as an alternative to GlusterFS, and have some questions about performance. Is anyone able to advise us please? This would be for file systems between 100GB and 2TB in size, average file size around 5MB, and a mixture of

[ceph-users] Re: CephFS thrashing through the page cache

2023-04-04 Thread Xiubo Li
mance issue. Would be great if this becomes part of a test suite. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Ashu Pachauri Sent: 17 March 2023 09:55:25 To: Xiubo Li Cc:

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-27 Thread Xiubo Li

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-27 Thread Xiubo Li
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Monday, March 27, 2023 5:22 PM To: Xiubo Li; Gregory Farnum Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: ln: failed to create hard link 'file name'

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-27 Thread Xiubo Li
ar test, which will untar a kernel tarball, but never seen this yet. I will try this again tomorrow without the NFS client. Thanks - Xiubo Thanks for your help and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ Fro

[ceph-users] Re: MDS host in OSD blacklist

2023-03-22 Thread Xiubo Li
y and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Xiubo Li Sent: 22 March 2023 07:27:08 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] MDS host in OSD blacklist Hi Frank, This should be the

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-22 Thread Xiubo Li

[ceph-users] Re: MDS host in OSD blacklist

2023-03-21 Thread Xiubo Li

[ceph-users] Re: CephFS thrashing through the page cache

2023-03-17 Thread Xiubo Li
is should work for old kernels from before ceph_netfs_expand_readahead() was introduced. I will improve it next week. Thanks for reporting this. Thanks - Xiubo Thanks and Regards, Ashu Pachauri On Fri, Mar 17, 2023 at 2:14 PM Xiubo Li wrote: On 15/03/2023 17:20, Frank Schilder wrote: >

[ceph-users] Re: CephFS thrashing through the page cache

2023-03-17 Thread Xiubo Li

[ceph-users] Re: CephFS thrashing through the page cache

2023-03-16 Thread Xiubo Li

[ceph-users] Re: CephFS thrashing through the page cache

2023-03-16 Thread Xiubo Li

[ceph-users] Re: libceph: mds1 IP+PORT wrong peer at address

2023-03-13 Thread Xiubo Li
gards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li Sent: 13 March 2023 01:44:49 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] libceph: mds1 IP+PORT wrong peer at address Hi Frank, BTW, what&#

[ceph-users] Re: libceph: mds1 IP+PORT wrong peer at address

2023-03-12 Thread Xiubo Li

[ceph-users] Re: Creating a role for quota management

2023-03-06 Thread Xiubo Li
Hi, Maybe you can use the CEPHFS CLIENT CAPABILITIES and only enable the 'p' permission for some users, which will allow them to SET_VXATTR. I didn't find a similar cap in the OSD CAPABILITIES. Thanks On 07/03/2023 00:33, anantha.ad...@intel.com wrote: Hello, Can you provide deta
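
A hedged example of a capability that grants only that extra 'p' bit on a subtree, plus the quota vxattr it unlocks (client name, path and quota value are illustrative):

    # 'rwp' = read, write, plus permission to set quota/layout vxattrs
    ceph fs authorize cephfs client.quotamgr /projects rwp
    # on a mount made with that key, quotas can then be set with:
    setfattr -n ceph.quota.max_bytes -v 1099511627776 /mnt/cephfs/projects/team-a   # 1 TiB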

[ceph-users] Re: MDS stuck in "up:replay"

2023-02-22 Thread Xiubo Li
ng hopefully useful debug logs. Not intended to fix the problem for you.

[ceph-users] Re: kernel client osdc ops stuck and mds slow reqs

2023-02-20 Thread Xiubo Li
On 20/02/2023 22:28, Kuhring, Mathias wrote: Hey Dan, hey Ilya, I know this issue is two years old already, but we are having similar issues. Do you know if the fixes ever got backported to RHEL kernels? It was backported to RHEL 8 a long time ago, since kernel-4.18.0-154.el8. Not look

[ceph-users] Re: ceph-iscsi-cli: cannot remove duplicated gateways.

2023-02-19 Thread Xiubo Li
Cool :-) On 20/02/2023 10:19, luckydog xf wrote: okay, I restored the correct configuration with 'sudo rados put gateway.conf local-gw -p rbd'. Now the problem is resolved. Thanks and have a nice day. On Mon, Feb 20, 2023 at 10:13 AM Xiubo Li wrote: On 20/02/2023 10:11, luckydog xf

[ceph-users] Re: ceph-iscsi-cli: cannot remove duplicated gateways.

2023-02-19 Thread Xiubo Li
e commands that dump and restore an object. Could you give me an example? `rados ls -p rbd` shows tons of uuids. https://docs.ceph.com/en/latest/man/8/rados/ On Mon, Feb 20, 2023 at 9:30 AM Xiubo Li wrote: Hi So you are using the default 'rbd
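
A hedged example of dumping and restoring the gateway.conf object with rados, matching the `rados put` used later in this thread (file paths are examples):

    rados -p rbd get gateway.conf /tmp/gateway.conf.json   # dump the object to a local file
    cp /tmp/gateway.conf.json /tmp/gateway.conf.json.bak   # keep an untouched backup
    # edit /tmp/gateway.conf.json as needed, then write it back:
    rados -p rbd put gateway.conf /tmp/gateway.conf.json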

[ceph-users] Re: ceph-iscsi-cli: cannot remove duplicated gateways.

2023-02-19 Thread Xiubo Li
configuration options are as follows, defaults shown. api_user = admin api_password = admin api_port = 5001 # API IP trusted_ip_list = 172.16.200.251,172.16.200.252

[ceph-users] Re: ceph-iscsi-cli: cannot remove duplicated gateways.

2023-02-18 Thread Xiubo Li
Please help, thanks.

[ceph-users] Re: iscsi target lun error

2023-01-16 Thread Xiubo Li
https://tracker.ceph.com/issues/57018 [3] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/block_device_guide/index#prerequisites_9 - Le 21 Nov 22, à 6:45, Xiubo Li xiu...@redhat.com a écrit : On 15/11/2022 23:44, Randy Morgan wrote: You are correct I am using

[ceph-users] Re: MDS: mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2022-12-20 Thread Xiubo Li
On 20/12/2022 18:34, Stolte, Felix wrote: Hi guys, I stumbled upon these log entries in my active MDS on a pacific (16.2.10) cluster: 2022-12-20T10:06:52.124+0100 7f11ab408700 0 log_channel(cluster) log [WRN] : client.1207771517 isn't responding to mclientcaps(revoke), ino 0x10017e84452

[ceph-users] Re: Ceph filesystem

2022-12-19 Thread Xiubo Li
On 19/12/2022 21:19, akshay sharma wrote: Hi All, I have three Virtual machines with a dedicated disk for ceph, ceph cluster is up as shown below user@ubuntu:~/ceph-deploy$ sudo ceph status cluster: id: 06a014a8-d166-4add-a21d-24ed52dce5c0 health: HEALTH_WARN
