[ceph-users] Re: Urgent help with degraded filesystem needed

2024-06-19 Thread Xiubo Li
Hi Dietmar, On 6/19/24 15:43, Dietmar Rieder wrote: Hello cephers, we have a degraded filesystem on our ceph 18.2.2 cluster and I need to get it up again. We have 6 MDS daemons (3 active, each pinned to a subtree, 3 standby). It started last night; I got the first HEALTH_WARN emails sa

[ceph-users] Re: [EXTERN] Re: Urgent help with degraded filesystem needed

2024-06-19 Thread Xiubo Li
On 6/19/24 16:13, Dietmar Rieder wrote: Hi Xiubo, [...] 0> 2024-06-19T07:12:39.236+ 7f90fa912700 -1 *** Caught signal (Aborted) **  in thread 7f90fa912700 thread_name:md_log_replay  ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable)  1: /lib64/libpthre

[ceph-users] Re: [EXTERN] Re: Urgent help with degraded filesystem needed

2024-07-01 Thread Xiubo Li
On 6/26/24 14:08, Dietmar Rieder wrote: ...sending also to the list and Xiubo (were accidentally removed from recipients)... On 6/25/24 21:28, Dietmar Rieder wrote: Hi Patrick,  Xiubo and List, finally we managed to get the filesystem repaired and running again! YEAH, I'm so happy!! Big t

[ceph-users] Re: Cephfs mds node already exists crashes mds

2024-08-20 Thread Xiubo Li
This looks the same as https://tracker.ceph.com/issues/52280, which has already been fixed. I just checked your ceph version, which already includes the fix. So this should be a new case. BTW, how did this happen? Were you doing a failover or something else? Thanks - Xiubo On 8

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-06 Thread Xiubo Li
On 11/1/23 22:14, Frank Schilder wrote: Dear fellow cephers, today we observed a somewhat worrisome inconsistency on our ceph fs. A file created on one host showed up as 0 length on all other hosts: [user1@host1 h2lib]$ ls -lh total 37M -rw-rw 1 user1 user1 12K Nov 1 11:59 dll_wrapper.p

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-06 Thread Xiubo Li
On 11/1/23 23:57, Gregory Farnum wrote: We have seen issues like this a few times and they have all been kernel client bugs with CephFS’ internal “capability” file locking protocol. I’m not aware of any extant bugs like this in our code base, but kernel patches can take a long and winding path

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-06 Thread Xiubo Li
bug mds = 25 debug ms = 1 Thanks - Xiubo Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Thursday, November 2, 2023 12:12 PM To: Gregory Farnum; Xiubo Li Cc: ceph-users@ceph.io Subject: [ce

[ceph-users] Re: MDS stuck in rejoin

2023-11-07 Thread Xiubo Li
eagerly waiting for this and another one. Any idea when they might show up in distro kernels? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Xiubo Li Sent: Tuesday, August 8, 2023 2:57 AM To: Frank Schil

[ceph-users] Re: MDS stuck in rejoin

2023-11-09 Thread Xiubo Li
luded in the last Pacific point release. Yeah, this will be backport after it getting merged. But for kclient we still need another patch. Thanks - Xiubo Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiu

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-09 Thread Xiubo Li
On 11/10/23 00:18, Frank Schilder wrote: Hi Xiubo, I will try to answer questions from all your 3 e-mails here together with some new information we have. New: The problem occurs in newer python versions when using the shutil.copy function. There is also a function shutil.copy2 for which the

[ceph-users] Re: Different behaviors for ceph kernel client in limiting IOPS when data pool enters `nearfull`?

2023-11-15 Thread Xiubo Li
Hi Matt, On 11/15/23 02:40, Matt Larson wrote: On CentOS 7 systems with the CephFS kernel client, if the data pool has a `nearfull` status there is a slight reduction in write speeds (possibly 20-50% fewer IOPS). On a similar Rocky 8 system with the CephFS kernel client, if the data pool has `n

[ceph-users] Re: Different behaviors for ceph kernel client in limiting IOPS when data pool enters `nearfull`?

2023-11-16 Thread Xiubo Li
On 11/16/23 22:39, Ilya Dryomov wrote: On Thu, Nov 16, 2023 at 3:21 AM Xiubo Li wrote: Hi Matt, On 11/15/23 02:40, Matt Larson wrote: On CentOS 7 systems with the CephFS kernel client, if the data pool has a `nearfull` status there is a slight reduction in write speeds (possibly 20-50

[ceph-users] Re: Different behaviors for ceph kernel client in limiting IOPS when data pool enters `nearfull`?

2023-11-17 Thread Xiubo Li
On 11/17/23 00:41, Ilya Dryomov wrote: On Thu, Nov 16, 2023 at 5:26 PM Matt Larson wrote: Ilya, Thank you for providing these discussion threads on the kernel fixes where there was a change, and details on how this affects the clients. What is the expected behavior in the CephFS client when

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Xiubo Li
On 11/22/23 16:02, zxcs wrote: Hi Experts, we are using cephfs 16.2.* with multiple active MDS, and recently we have two nodes mounted with ceph-fuse due to their old OS. One node runs a python script with `glob.glob(path)`, and another client is doing a `cp` operation on the same pa

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-22 Thread Xiubo Li
. = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li Sent: Friday, November 10, 2023 3:14 AM To: Frank Schilder; Gregory Farnum Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent On 11/10/23 00

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-22 Thread Xiubo Li
kclient. Thanks! Will wait for further instructions. = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li Sent: Friday, November 10, 2023 3:14 AM To: Frank Schilder; Gregory Farnum Cc: ceph-users@ceph.io Subject: Re: [ceph-users

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-22 Thread Xiubo Li
19:44, Xiubo Li wrote: On 11/22/23 16:02, zxcs wrote: Hi Experts, we are using cephfs 16.2.* with multiple active MDS, and recently we have two nodes mounted with ceph-fuse due to their old OS. One node runs a python script with `glob.glob(path)`, and another client is doing a `cp` operati

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-26 Thread Xiubo Li
m S14 From: Xiubo Li Sent: Thursday, November 23, 2023 3:47 AM To: Frank Schilder; Gregory Farnum Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent I just raised one tracker to follow this: https://tracker.ceph.com/issues/63510 Thanks -

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-11-26 Thread Xiubo Li
ion environment. thanks and regards. Xiubo Li wrote on Thu, Nov 23, 2023 at 15:47: On 11/23/23 11:25, zxcs wrote: Thanks a ton, Xiubo! It does not disappear, even after we unmount the ceph directory on these two old-OS nodes. After dumping ops in flight, we can see some requests, and the earliest complains “failed to authpin, s

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-12-03 Thread Xiubo Li
updating in the tracker and I will try it. Thanks - Xiubo Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Xiubo Li Sent: Monday, November 27, 2023 3:59 AM To: Frank Schilder; Gregory Farnum Cc: ceph-users@ceph

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-12-04 Thread Xiubo Li
se include the part executed on the second host explicitly in an ssh-command. Running your scripts alone in their current form will not reproduce the issue. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________ From: Xiubo Li Se

[ceph-users] Re: mds slow request with “failed to authpin, subtree is being exported"

2023-12-04 Thread Xiubo Li
7日 13:19, Xiubo Li wrote: On 11/27/23 13:12, zxcs wrote: Currently we are using `ceph config set mds mds_bal_interval 3600` to set a fixed time (1 hour). We also have a question about how to disable balancing for multiple active MDS; that is, we want to enable multiple active MDS (to improve throughput) with no balance
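A minimal sketch of the approach usually paired with that setting: besides raising mds_bal_interval, the balancer can effectively be taken out of the picture by pinning subtrees to specific ranks (the mount point and directory names below are made-up examples):

    # Pin one directory tree to MDS rank 0 (static pinning):
    setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projectA
    # Pin another tree to rank 1:
    setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/projectB
    # The interval change quoted in this thread, to keep the balancer quiet:
    ceph config set mds mds_bal_interval 3600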

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-12-11 Thread Xiubo Li
ings up. I will update you, please keep the tracker open. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Xiubo Li Sent: Tuesday, December 5, 2023 1:58 AM To: Frank Schilder; Gregory Farnum Cc: ceph-users@ceph.io Sub

[ceph-users] Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-17 Thread Xiubo Li
On 1/17/24 15:57, Eugen Block wrote: Hi, this is not an easy topic and there is no formula that can be applied to all clusters. From my experience, it is exactly how the discussion went in the thread you mentioned, trial & error. Looking at your session ls output, this reminds of a debug sess

[ceph-users] Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

2024-01-17 Thread Xiubo Li
On 1/13/24 07:02, Özkan Göksu wrote: Hello. I have a 5-node ceph cluster and I'm constantly getting the "clients failing to respond to cache pressure" warning. I have 84 cephfs kernel clients (servers) and my users are accessing their personal subvolumes located on one pool. My users are software d
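For reference, a hedged sketch of the kind of knobs this discussion revolves around; the values are purely illustrative, not recommendations, and the option names assume a reasonably recent Ceph release:

    # Give the MDS more cache before it has to ask clients to release caps:
    ceph config set mds mds_cache_memory_limit 8589934592   # 8 GiB, example value
    # How many caps the MDS recalls from clients at a time:
    ceph config set mds mds_recall_max_caps 30000            # example value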

[ceph-users] Re: Ceph & iSCSI

2024-02-26 Thread Xiubo Li
Hi Michael, Please see the previous threads about the same question: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/GDJJL7VSDUJITPM3JV7RCVXVOIQO2CAN/ https://www.spinics.net/lists/ceph-users/msg73969.html Thanks - Xiubo On 2/27/24 11:22, Michael Worsham wrote: I was readin

[ceph-users] Re: MDS Behind on Trimming...

2024-03-27 Thread Xiubo Li
On 3/28/24 04:03, Erich Weiler wrote: Hi All, I've been battling this for a while and I'm not sure where to go from here.  I have a Ceph health warning as such: # ceph -s   cluster:     id: 58bde08a-d7ed-11ee-9098-506b4b4da440     health: HEALTH_WARN     1 MDSs report slow reques

[ceph-users] Re: MDS Behind on Trimming...

2024-03-28 Thread Xiubo Li
, caller_gid=600{600,608,999,}) currently joining batch getattr Can we tell which client the slow requests are coming from?  It says stuff like "client.99445:4189994" but I don't know how to map that to a client... Thanks for the response! -erich On 3/27/24 21:28, Xiubo Li wrot
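A hedged sketch of how that mapping is usually done (the daemon name is a placeholder): in "client.99445:4189994" the number before the colon should be the client session id and the part after it the request tid.

    # List sessions on the active MDS; match the "id" field against 99445 and
    # read the client_metadata (hostname, mount point, kernel version):
    ceph tell mds.<daemon-name> session ls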

[ceph-users] Re: MDS Behind on Trimming...

2024-04-07 Thread Xiubo Li
Hi Erich, Thanks for your logs, and it should be the same issue as https://tracker.ceph.com/issues/62052; could you test with this fix again? Please let me know if you can still see this bug, in which case it should be the lock order bug from https://tracker.ceph.com/issues/62123. Thanks

[ceph-users] Re: MDS Behind on Trimming...

2024-04-08 Thread Xiubo Li
On 4/8/24 12:32, Erich Weiler wrote: Ah, I see. Yes, we are already running version 18.2.1 on the server side (we just installed this cluster a few weeks ago from scratch). So I guess if the fix has already been backported to that version, then we still have a problem. Does that mean it coul

[ceph-users] Re: Client kernel crashes on cephfs access

2024-04-08 Thread Xiubo Li
Hi Marc, Thanks for reporting this, I generated one patch to fix it. Will send it out after testing is done. - Xiubo On 4/8/24 16:01, Marc Ruhmann wrote: Hi everyone, I would like to ask for help regarding client kernel crashes that happen on cephfs access. We have been struggling with this

[ceph-users] Re: MDS Behind on Trimming...

2024-04-09 Thread Xiubo Li
On 4/8/24 12:32, Erich Weiler wrote: Ah, I see. Yes, we are already running version 18.2.1 on the server side (we just installed this cluster a few weeks ago from scratch). So I guess if the fix has already been backported to that version, then we still have a problem. Does that mean it coul

[ceph-users] Re: MDS Behind on Trimming...

2024-04-09 Thread Xiubo Li
On 4/10/24 11:48, Erich Weiler wrote: Does that mean it could be the lock order bug (https://tracker.ceph.com/issues/62123) as Xiubo suggested? I have raised one PR to fix the lock order issue; if possible please give it a try to see whether it resolves this issue. Thank you!  Yeah, this issue i

[ceph-users] Re: MDS Behind on Trimming...

2024-04-14 Thread Xiubo Li
Hi Erich, Two things I need to make clear: 1) Since there are no debug logs, I am not very sure my fix PR will 100% fix this. 2) It will take time to get this PR merged upstream, so I couldn't tell exactly when it will be backported to downstream and then be rel

[ceph-users] Re: Question about PR merge

2024-04-17 Thread Xiubo Li
On 4/18/24 08:57, Erich Weiler wrote: Have you already shared information about this issue? Please do if not. I am working with Xiubo Li and providing debugging information - in progress! From the blocked ops output it looks very similar to the lock order issue Patrick fixed before

[ceph-users] Re: Client kernel crashes on cephfs access

2024-04-17 Thread Xiubo Li
Hi Konstantin, We have fixed it, please see https://patchwork.kernel.org/project/ceph-devel/list/?series=842682&archive=both. - Xiubo On 4/18/24 00:05, Konstantin Shalygin wrote: Hi, On 9 Apr 2024, at 04:07, Xiubo Li wrote: Thanks for reporting this, I generated one patch to fi

[ceph-users] Re: Question about PR merge

2024-04-17 Thread Xiubo Li
ug logs to confirm it. Thanks - Xiubo On 4/18/24 14:22, Nigel Williams wrote: Hi Xiubo, Is the issue we provided logs on the same as Erich or is that a third different locking issue? thanks, nigel. On Thu, 18 Apr 2024 at 12:29, Xiubo Li wrote: On 4/18/24 08:57, Erich Weiler wrote: &

[ceph-users] Re: MDS crash

2024-04-21 Thread Xiubo Li
Hi Alexey, This looks like a new issue to me. Please create a tracker for it and provide the detailed call trace there. Thanks - Xiubo On 4/19/24 05:42, alexey.gerasi...@opencascade.com wrote: Dear colleagues, hope that anybody can help us. The initial point: Ceph cluster v15.2 (installed and c
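If it helps, a minimal sketch of pulling the backtrace for such a tracker from the crash module (assuming the crash was recorded there):

    # list recent crashes, then show the full backtrace and metadata for one of them:
    ceph crash ls
    ceph crash info <crash-id>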

[ceph-users] Re: MDS Behind on Trimming...

2024-04-21 Thread Xiubo Li
overloaded? Can a single server hold multiple MDS daemons?  Right now I have three physical servers each with one MDS daemon on it. I can still try reducing to one.  And I'll keep an eye on blocked ops to see if any get to a very old age (and are thus wedged). -erich On 4/18/24 8:55 PM,

[ceph-users] Re: Question about PR merge

2024-04-22 Thread Xiubo Li
the issue we provided logs on the same as Erich or is that a third different locking issue? thanks, nigel. On Thu, 18 Apr 2024 at 12:29, Xiubo Li wrote: On 4/18/24 08:57, Erich Weiler wrote: >> Have you already shared information about this issue? Please do if not. >

[ceph-users] Re: MDS Behind on Trimming...

2024-04-29 Thread Xiubo Li
t the test packages from https://shaman.ceph.com/builds/ceph/. But this needs to trigger a build first. -erich On 4/21/24 9:39 PM, Xiubo Li wrote: Hi Erich, I raised one tracker for this: https://tracker.ceph.com/issues/65607. Currently I haven't figured out what was holding the 'dn

[ceph-users] Re: MDS 17.2.7 crashes at rejoin

2024-05-06 Thread Xiubo Li
This is a known issue, please see https://tracker.ceph.com/issues/60986. If you could reproduce it then please enable the mds debug logs and this could help debug it fast: debug_mds = 25 debug_ms = 1 Thanks - Xiubo On 5/7/24 00:26, Robert Sander wrote: Hi, a 17.2.7 cluster with two fil
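A small sketch of one way to apply the debug settings mentioned above, either centrally via the mon config store or at runtime on a single daemon (the daemon name is a placeholder):

    ceph config set mds debug_mds 25
    ceph config set mds debug_ms 1
    # or, for one running daemon only:
    ceph tell mds.<daemon-name> config set debug_mds 25
    ceph tell mds.<daemon-name> config set debug_ms 1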

[ceph-users] Re: MDS crashes shortly after starting

2024-05-06 Thread Xiubo Li
This is the same issue as https://tracker.ceph.com/issues/60986, as Robert Sander reported. On 5/6/24 05:11, E Taka wrote: Hi all, we have a serious problem with CephFS. A few days ago, the CephFS file systems became inaccessible, with the message MDS_DAMAGE: 1 mds daemon damaged The cephfs-jour

[ceph-users] Re: MDS 17.2.7 crashes at rejoin

2024-05-06 Thread Xiubo Li
Possibly, because we have only seen this in ceph 17. And if you can reproduce it, please provide the mds debug logs; with those we can quickly find the root cause. Thanks - Xiubo On 5/7/24 12:19, Robert Sander wrote: Hi, would an update to 18.2 help? Regards

[ceph-users] Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-08 Thread Xiubo Li
Hi Dejan, This is a known issue and please see https://tracker.ceph.com/issues/61009. For the workaround please see https://tracker.ceph.com/issues/61009#note-26. Thanks - Xiubo On 5/8/24 06:49, Dejan Lesjak wrote: Hello, We have cephfs with two active MDS. Currently rank 1 is repeatedly cr

[ceph-users] Re: MDS crash in interval_set: FAILED ceph_assert(p->first <= start)

2024-05-08 Thread Xiubo Li
On 5/8/24 17:36, Dejan Lesjak wrote: Hi Xiubo, On 8. 05. 24 09:53, Xiubo Li wrote: Hi Dejan, This is a known issue and please see https://tracker.ceph.com/issues/61009. For the workaround please see https://tracker.ceph.com/issues/61009#note-26. Thank you for the links. Unfortunately

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-03 Thread Xiubo Li
Hi Nicolas, This is a known issue and Venky is working on it, please see https://tracker.ceph.com/issues/63259. Thanks - Xiubo On 6/3/24 20:04, nbarb...@deltaonline.net wrote: Hello, First of all, thanks for reading my message. I set up a Ceph version 18.2.2 cluster with 4 nodes, everythin

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread Xiubo Li
. - Xiubo Kind regards, Sake Op 04-06-2024 04:04 CEST schreef Xiubo Li : Hi Nicolas, This is a known issue and Venky is working on it, please see https://tracker.ceph.com/issues/63259. Thanks - Xiubo On 6/3/24 20:04, nbarb...@deltaonline.net wrote: Hello, First of all, thanks for reading

[ceph-users] Re: CephFS perforamnce degradation in root directory

2022-08-15 Thread Xiubo Li
On 8/9/22 4:07 PM, Robert Sander wrote: Hi, we have a cluster with 7 nodes each with 10 SSD OSDs providing CephFS to a CloudStack system as primary storage. When copying a large file into the root directory of the CephFS the bandwidth drops from 500MB/s to 50MB/s after around 30 seconds. W

[ceph-users] Re: ceph kernel client RIP when quota exceeded

2022-08-16 Thread Xiubo Li
ot in that range.     Also, fix ceph_has_realms_with_quotas to return false when encountering     a reserved inode.     URL: https://tracker.ceph.com/issues/53180     Reported-by: Hu Weiwen     Signed-off-by: Jeff Layton     Reviewed-by: Luis Henriques     Reviewed-by: Xiubo Li     Signed-o

[ceph-users] Re: how to fix mds stuck at dispatched without restart ads

2022-08-31 Thread Xiubo Li
On 8/31/22 2:43 PM, zxcs wrote: Hi experts, we have a cephfs (15.2.13) cluster with kernel mounts, and when we read from 2000+ processes against one ceph path (called /path/to/A/), all of the processes hang and ls -lrth /path/to/A/ always gets stuck, but listing other directories is healthy (/path/to/B/)

[ceph-users] Re: how to fix mds stuck at dispatched without restart ads

2022-09-01 Thread Xiubo Li
t need list directory). It seems to be 100% reproducible these days. Thanks All! If possible please enable 'debug_mds = 10' and 'debug_ms = 1'. Thanks! Thanks, zx On 2022-08-31 at 15:23, Xiubo Li <mailto:xiu...@redhat.com> wrote: On 8/31/22 2:43 PM, zxcs wrote

[ceph-users] Re: mds's stay in up:standby

2022-09-07 Thread Xiubo Li
On 08/09/2022 04:24, Tobias Florek wrote: Hi! I am running a rook managed hyperconverged ceph cluster on kubernetes using ceph 17.2.3 with a single-rank single fs cephfs. I am now facing the problem that the mds's stay in up:standby.  I tried setting allow_standby_replay to false and restar

[ceph-users] Re: Ceph iSCSI rbd-target.api Failed to Load

2022-09-09 Thread Xiubo Li
On 07/09/2022 17:37, duluxoz wrote: Hi All, I've followed the instructions on the CEPH Doco website on Configuring the iSCSI Target. Everything went AOK up to the point where I try to start the rbd-target-api service, which fails (the rbd-target-gw service started OK). A `systemctl status

[ceph-users] Re: Ceph iSCSI rbd-target.api Failed to Load

2022-09-12 Thread Xiubo Li
On 10/09/2022 12:50, duluxoz wrote: Hi Guys, So, I finally got things sorted :-) Time to eat some crow-pie :-P Turns out I had two issues, both of which involved typos (don't they always?). The first was I had transposed two digits of an IP Address in the `iscsi-gateway.cfg` -> `trusted_i

[ceph-users] Re: adding mds service , unable to create keyring for mds

2022-09-14 Thread Xiubo Li
On 15/09/2022 03:09, Jerry Buburuz wrote: Hello, I am trying to add my first mds service on any node. I am unable to add a keyring to start the mds service. # $ sudo ceph auth get-or-create mds.mynode mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' Error EINVAL: key for mds.mynode

[ceph-users] Re: adding mds service , unable to create keyring for mds

2022-09-15 Thread Xiubo Li
ment/ If you want to modify it, please see the "MODIFY USER CAPABILITIES" section in the above doc for how to do it, as Eugen suggested. Thanks Xiubo I am following the instructions on https://docs.ceph.com/latest/cephfs/add-remove-mds thanks jerry Xiubo Li On 15/09/2022 03:09, Jerry Buburuz
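In practice the "MODIFY USER CAPABILITIES" route looks roughly like this (a sketch, not the exact commands from the thread; mds.mynode is the key name used above):

    # Inspect what the existing key currently allows:
    ceph auth get mds.mynode
    # Update the caps in place instead of re-creating the key:
    ceph auth caps mds.mynode mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *'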

[ceph-users] Re: tcmu-runner lock failure

2022-09-19 Thread Xiubo Li
On 19/09/2022 23:32, j.rasakunasin...@autowork.com wrote: Hi, we have a Ceph cluster with 3 controller and 6 storage nodes running. We use iscsi/tcmu-runner (16.2.9) to connect VMware to Ceph. We face an issue where we lose the connection to the iscsi gateways, so that the ESXi hosts connected to them do not work properly.

[ceph-users] Re: Getting started with cephfs-top, how to install

2022-10-18 Thread Xiubo Li
Hi Zach, On 18/10/2022 04:20, Zach Heise (SSCC) wrote: I'd like to see what CephFS clients are doing the most IO. According to this page: https://docs.ceph.com/en/quincy/cephfs/cephfs-top/ - cephfs-top is the simplest way to do this? I enabled 'ceph mgr module enable stats' today, but I'm a
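For completeness, a hedged sketch of the usual cephfs-top setup (package and client names as described in the upstream docs; adjust the install command for your distro):

    ceph mgr module enable stats
    # cephfs-top reads stats via a dedicated client, client.fstop by default:
    ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r'
    dnf install cephfs-top
    cephfs-top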

[ceph-users] Re: How to determine if a filesystem is allow_standby_replay = true

2022-10-20 Thread Xiubo Li
Hi Wesley, You can also just run: $ ceph fs get MyFS|grep flags flags    32 joinable allow_snaps allow_multimds_snaps allow_standby_replay If you can see the "allow_standby_replay" flag as above, it is enabled; otherwise it is disabled. - Xiubo On 21/10/2022 05:58, Wesley Dillingham wr
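And to toggle it, something like:

    ceph fs set MyFS allow_standby_replay true    # or false to disable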

[ceph-users] Re: MDS_CLIENT_LATE_RELEASE after setting up scheduled CephFS snapshots

2022-10-21 Thread Xiubo Li
On 21/10/2022 19:39, Rishabh Dave wrote: Hi Edward, On Wed, 19 Oct 2022 at 21:27, Edward R Huyer wrote: I recently set up scheduled snapshots on my CephFS filesystem, and ever since the cluster has been intermittently going into HEALTH_WARN with an MDS_CLIENT_LATE_RELEASE notification. Sp

[ceph-users] Re: iscsi target lun error

2022-11-09 Thread Xiubo Li
On 10/11/2022 02:21, Randy Morgan wrote: I am trying to create a second iscsi target and I keep getting an error when I create the second target:    Failed to update target 'iqn.2001-07.com.ceph:1667946365517' disk create/update failed on host.containers.internal. LUN allocation fai

[ceph-users] Re: iscsi target lun error

2022-11-20 Thread Xiubo Li
.. [Auth: ACL_ENABLED, Hosts: 0] sh-4.4# Randy On 11/9/2022 6:36 PM, Xiubo Li wrote: On 10/11/2022 02:21, Randy Morgan wrote: I am trying to create a second iscsi target and I keep getting an error when I create the second target:    Failed to update

[ceph-users] Re: filesystem became read only after Quincy upgrade

2022-11-23 Thread Xiubo Li
Hi Adrien, On 23/11/2022 19:49, Adrien Georget wrote: Hi, We upgraded this morning a Pacific Ceph cluster to the last Quincy version. The cluster was healthy before the upgrade, everything was done according to the upgrade procedure (non-cephadm) [1], all services have restarted correctly bu

[ceph-users] Re: filesystem became read only after Quincy upgrade

2022-11-23 Thread Xiubo Li
On 23/11/2022 19:49, Adrien Georget wrote: Hi, We upgraded this morning a Pacific Ceph cluster to the last Quincy version. The cluster was healthy before the upgrade, everything was done according to the upgrade procedure (non-cephadm) [1], all services have restarted correctly but the files

[ceph-users] Re: filesystem became read only after Quincy upgrade

2022-11-24 Thread Xiubo Li
Hi Adrien, Thank you for your logs. From your logs I found one bug; I have raised a new tracker [1] to follow it and a ceph PR [2] to fix it. For more detail please see my analysis in the tracker [1]. [1] https://tracker.ceph.com/issues/58082 [2] https://github.com/ceph/ceph/pull/49048

[ceph-users] Re: filesystem became read only after Quincy upgrade

2022-11-25 Thread Xiubo Li
the MDSs. - Xiubo Cheers, Adrien On 25/11/2022 at 06:13, Xiubo Li wrote: Hi Adrien, Thank you for your logs. From your logs I found one bug; I have raised a new tracker [1] to follow it and a ceph PR [2] to fix it. For more detail please see my analysis in the tracker [1]. [1

[ceph-users] Re: filesystem became read only after Quincy upgrade

2022-11-28 Thread Xiubo Li
! Cool! - Xiubo Cheers, Adrien On 26/11/2022 at 05:08, Xiubo Li wrote: On 25/11/2022 16:25, Adrien Georget wrote: Hi Xiubo, Thanks for your analysis. Is there anything I can do to put CephFS back in a healthy state? Or should I wait for the patch to fix that bug? Please try to trim the

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-12 Thread Xiubo Li
Hi Stolte, For the VMware config could you refer to https://docs.ceph.com/en/latest/rbd/iscsi-initiator-esx/ ? Which "Path Selection Policy with ALUA" are you using? ceph-iscsi can't implement real active/active, so if you use round robin (RR) I think it will behave like this. - Xiubo On 12/12/
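A quick, hedged way to confirm which Path Selection Policy each ESXi host is actually using (run in the ESXi shell):

    # the "Path Selection Policy:" field shows e.g. VMW_PSP_MRU or VMW_PSP_RR per device
    esxcli storage nmp device list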

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-13 Thread Xiubo Li

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-13 Thread Xiubo Li
minutes before a gateway restart, so there is not an outage. It has been extremely stable for us. Thanks Joe Xiubo Li 12/13/2022 4:21 AM >>> On 13/12/2022 18:57, Stolte, Felix wrote: Hi Xiubo, Thanks for pointing me in the right direction. All involved esx hosts seem to use th

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-15 Thread Xiubo Li
On 15/12/2022 02:46, Joe Comeau wrote: That's correct - we use the kernel target, not tcmu-runner. Okay. There are some differences in configuration between the kernel target and the ceph-iscsi target. Thanks, - Xiubo >>> Xiubo Li 12/13/2022 6:02 PM >>> O

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-15 Thread Xiubo Li
routinely reboot the iscsi gateways during patching and updates, and the storage migrates to and from all servers without issue. We usually wait about 10 minutes before a gateway restart, so there is not an outage. It has been extremely stable for us. Thanks Joe >>> Xiubo Li 12/13/2022 4:21 A

[ceph-users] Re: ceph-iscsi lock ping pong

2022-12-15 Thread Xiubo Li
On 13.12.2022 at 13:21, Xiubo Li wrote: On 13/12/2022 18:57, Stolte, Felix wrote: Hi Xiubo, Thanks for pointing me in the right direction. All involved esx hosts seem to use the correct policy. I am going to detach the LUN on each host one by one unt

[ceph-users] Re: specify fsname in kubernetes connection (or set default on the keyring)

2022-12-19 Thread Xiubo Li
Hi Carlos This sounds like a bug in k8s. On 20/12/2022 07:21, Carlos Mogas da Silva wrote: To answer my own question. I've found this: https://github.com/kubernetes/kubernetes/issues/104095#issuecomment-1276873578 With it, we just "command inject" the correct fs to the mount command and eve

[ceph-users] Re: Protecting Files in CephFS from accidental deletion or encryption

2022-12-19 Thread Xiubo Li
On 20/12/2022 03:15, Stefan Kooman wrote: On 12/19/22 18:36, Ramana Krisna Venkatesh Raja wrote: On Mon, Dec 19, 2022 at 12:20 PM Ramana Krisna Venkatesh Raja wrote: On Mon, Dec 19, 2022 at 11:14 AM Stefan Kooman wrote: On 12/19/22 16:46, Christoph Adomeit wrote: Hi, we are planning an

[ceph-users] Re: Ceph filesystem

2022-12-19 Thread Xiubo Li
On 19/12/2022 21:19, akshay sharma wrote: Hi All, I have three Virtual machines with a dedicated disk for ceph, ceph cluster is up as shown below user@ubuntu:~/ceph-deploy$ sudo ceph status cluster: id: 06a014a8-d166-4add-a21d-24ed52dce5c0 health: HEALTH_WARN

[ceph-users] Re: MDS: mclientcaps(revoke), pending pAsLsXsFsc issued pAsLsXsFsc

2022-12-20 Thread Xiubo Li
On 20/12/2022 18:34, Stolte, Felix wrote: Hi guys, i stumbled about these log entries in my active MDS on a pacific (16.2.10) cluster: 2022-12-20T10:06:52.124+0100 7f11ab408700 0 log_channel(cluster) log [WRN] : client.1207771517 isn't responding to mclientcaps(revoke), ino 0x10017e84452

[ceph-users] Re: iscsi target lun error

2023-01-16 Thread Xiubo Li
https://tracker.ceph.com/issues/57018 [3] https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html-single/block_device_guide/index#prerequisites_9 - On 21 Nov 22, at 6:45, Xiubo Li xiu...@redhat.com wrote: On 15/11/2022 23:44, Randy Morgan wrote: You are correct, I am using

[ceph-users] Re: ceph-iscsi-cli: cannot remove duplicated gateways.

2023-02-18 Thread Xiubo Li
Please help, thanks.

[ceph-users] Re: ceph-iscsi-cli: cannot remove duplicated gateways.

2023-02-19 Thread Xiubo Li
configuration options are as follows, defaults shown. api_user = admin api_password = admin api_port = 5001 # API IP trusted_ip_list = 172.16.200.251,172.16.200.252
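Put together, a minimal iscsi-gateway.cfg sketch using the defaults quoted above (the trusted_ip_list entries are just the example addresses from this thread, and the cluster/keyring values are assumptions):

    [config]
    cluster_name = ceph
    gateway_keyring = ceph.client.admin.keyring
    api_secure = false
    api_user = admin
    api_password = admin
    api_port = 5001
    trusted_ip_list = 172.16.200.251,172.16.200.252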

[ceph-users] Re: ceph-iscsi-cli: cannot remove duplicated gateways.

2023-02-19 Thread Xiubo Li
e commands that dump and restore an object. Could you give me an example? `rados ls -p rbd` shows tons of uuids. https://docs.ceph.com/en/latest/man/8/rados/ On Mon, Feb 20, 2023 at 9:30 AM Xiubo Li wrote: Hi So you are using the default 'rbd
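The dump/restore being asked about is roughly this (the object is gateway.conf in the rbd pool, as in this thread; the backup filename is made up):

    # dump the current gateway configuration object to a local file:
    rados -p rbd get gateway.conf gateway.conf.bak
    # ...and restore it later from that file:
    rados -p rbd put gateway.conf gateway.conf.bak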

[ceph-users] Re: ceph-iscsi-cli: cannot remove duplicated gateways.

2023-02-19 Thread Xiubo Li
Cool :-) On 20/02/2023 10:19, luckydog xf wrote: okay, I restored the correct configuration with 'sudo rados put gateway.conf local-gw -p rbd'. Now the problem is resolved. Thanks and have a nice day. On Mon, Feb 20, 2023 at 10:13 AM Xiubo Li wrote: On 20/02/2023 10:11, luckydog xf

[ceph-users] Re: kernel client osdc ops stuck and mds slow reqs

2023-02-20 Thread Xiubo Li
On 20/02/2023 22:28, Kuhring, Mathias wrote: Hey Dan, hey Ilya, I know this issue is two years old already, but we are having similar issues. Do you know if the fixes were ever backported to RHEL kernels? It was backported to RHEL 8 a long time ago, since kernel-4.18.0-154.el8. Not look

[ceph-users] Re: MDS stuck in "up:replay"

2023-02-22 Thread Xiubo Li
ng hopefully useful debug logs. Not intended to fix the problem for you.

[ceph-users] Re: Creating a role for quota management

2023-03-06 Thread Xiubo Li
Hi, Maybe you can use the CEPHFS CLIENT CAPABILITIES and only enable the 'p' permission for some users, which will allow them to SET_VXATTR. I didn't find a similar cap among the OSD CAPABILITIES. Thanks On 07/03/2023 00:33, anantha.ad...@intel.com wrote: Hello, Can you provide deta
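A minimal sketch of such a restricted client (the client name, fs name and paths are made-up examples; the 'p' flag is what permits setting the quota/layout vxattrs):

    ceph fs authorize cephfs client.quotamgr / rwp
    # that client can then manage quotas with setfattr, e.g. a 100 GiB limit:
    setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/somedir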

[ceph-users] Re: libceph: mds1 IP+PORT wrong peer at address

2023-03-12 Thread Xiubo Li

[ceph-users] Re: libceph: mds1 IP+PORT wrong peer at address

2023-03-13 Thread Xiubo Li
gards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Xiubo Li Sent: 13 March 2023 01:44:49 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] libceph: mds1 IP+PORT wrong peer at address Hi Frank, BTW, what&#

[ceph-users] Re: CephFS thrashing through the page cache

2023-03-16 Thread Xiubo Li

[ceph-users] Re: CephFS thrashing through the page cache

2023-03-16 Thread Xiubo Li

[ceph-users] Re: CephFS thrashing through the page cache

2023-03-17 Thread Xiubo Li

[ceph-users] Re: CephFS thrashing through the page cache

2023-03-17 Thread Xiubo Li
is should work for old kernels from before ceph_netfs_expand_readahead() was introduced. I will improve it next week. Thanks for reporting this. Thanks - Xiubo Thanks and Regards, Ashu Pachauri On Fri, Mar 17, 2023 at 2:14 PM Xiubo Li wrote: On 15/03/2023 17:20, Frank Schilder wrote: >

[ceph-users] Re: MDS host in OSD blacklist

2023-03-21 Thread Xiubo Li

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-22 Thread Xiubo Li

[ceph-users] Re: MDS host in OSD blacklist

2023-03-22 Thread Xiubo Li
y and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Xiubo Li Sent: 22 March 2023 07:27:08 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] MDS host in OSD blacklist Hi Frank, This should be the

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-27 Thread Xiubo Li
ar test, which will untar a kernel tarball, but I have never seen this yet. I will try this again tomorrow without the NFS client. Thanks - Xiubo Thanks for your help and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ Fro

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-27 Thread Xiubo Li
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Monday, March 27, 2023 5:22 PM To: Xiubo Li; Gregory Farnum Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: ln: failed to create hard link 'file name'

[ceph-users] Re: ln: failed to create hard link 'file name': Read-only file system

2023-03-27 Thread Xiubo Li

[ceph-users] Re: CephFS thrashing through the page cache

2023-04-04 Thread Xiubo Li
mance issue. Would be great if this becomes part of a test suite. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Ashu Pachauri Sent: 17 March 2023 09:55:25 To: Xiubo Li Cc:

[ceph-users] Re: Read and write performance on distributed filesystem

2023-04-04 Thread Xiubo Li
On 4/4/23 07:59, David Cunningham wrote: Hello, We are considering CephFS as an alternative to GlusterFS, and have some questions about performance. Is anyone able to advise us please? This would be for file systems between 100GB and 2TB in size, average file size around 5MB, and a mixture of
