[ceph-users] Creating a role for allowing users to set quota on CephFS pools
Hello, how do I create a role that allows users to set quotas on specific CephFS pools? Thank you. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: orchestrator issues on ceph 16.2.9
Hi Adrian,

Could you please open a tracker issue (https://tracker.ceph.com/) and share the traceback from the Dashboard crash?

Thank you!

Kind Regards,
Ernesto

On Sat, Mar 4, 2023 at 9:50 AM Adrian Nicolae wrote:
> Hi,
>
> I have some orchestrator issues on our cluster running 16.2.9 with
> rgw-only services.
>
> We first noticed these issues a few weeks ago when adding new hosts to
> the cluster - the orchestrator was not detecting the new drives to build
> the OSD containers for them. Debugging the mgr logs, I noticed that the
> mgr was crashing due to the dashboard module. I disabled the dashboard
> module and the new drives were detected and added to the cluster.
>
> Now we have other, similar issues: we have a failed drive. The failure
> was detected, the OSD was marked as down and the rebalancing has
> finished. I want to remove the failed OSD from the cluster, but it looks
> like the orchestrator is not working:
>
> - I launched the OSD removal with 'ceph orch osd rm 92 --force', where
> 92 is the OSD id in question.
>
> - I checked the progress, but nothing happens even after a few days:
>
> ceph orch osd rm status
> OSD  HOST    STATE    PGS  REPLACE  FORCE  ZAP    DRAIN STARTED AT
> 92   node10  started  0    False    True   False
>
> - The OSD process is stopped on that host, and from the orchestrator side
> I can see this:
>
> ceph orch ps --daemon_type osd --daemon_id 92
> NAME    HOST    PORTS  STATUS  REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID
> osd.92  node10         error   11h ago    4w   -        4096M
>
> - I have the same long refresh interval on other OSDs as well; I know it
> should be 10 minutes or so:
>
> NAME    HOST    PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> osd.93  node09         running (4w)  11h ago    4w   5573M    4096M    16.2.9   3520ead5eb19  d2f658e9e37b
>
> osd: 116 osds: 115 up (since 11d), 115 in (since 11d)
>
> 90  hdd  16.37109  1.0  16 TiB  4.0 TiB  4.0 TiB  0 B  17 GiB  12 TiB  24.66  0.97  146  up
> 92  hdd  0         0    0 B     0 B      0 B      0 B  0 B     0 B     0      0     0    down
> 94  hdd  16.37109  1.0  16 TiB  4.0 TiB  4.0 TiB  0 B  17 GiB  12 TiB  24.66  0.97  146  up
>
> - I activated debug 20 on the mgr, but I can't see any errors or other
> clues regarding the OSD removal. I also switched to the standby manager
> with 'ceph mgr fail'. The mgr switch works, but still nothing happens.
>
> - It's not only the OSD removal. I also tried to deploy new rgw
> services by applying rgw labels on 2 new hosts; we have specs for
> building rgw containers when detecting the label. Again, nothing happens.
>
> I'm planning to upgrade to 16.2.11 to see if this solves the issues, but
> I'm not very confident - I didn't see anything regarding this in the
> changelogs. Is there anything else I can try to debug this issue?
>
> Thanks.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
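For reference, a minimal sketch of the cephadm debugging steps that usually apply to a stuck orchestrator like this (a sketch under the assumption of a cephadm-managed cluster; verify the commands against your release before relying on them):

    # raise cephadm logging on the active mgr and stream it live
    ceph config set mgr mgr/cephadm/log_to_cluster_level debug
    ceph -W cephadm --watch-debug

    # show recent cephadm log entries after reproducing the hang
    ceph log last cephadm

    # force a device/daemon re-scan instead of waiting for the cache
    ceph orch device ls --refresh
    ceph orch ps --refresh

    # restore the default log level afterwards
    ceph config rm mgr mgr/cephadm/log_to_cluster_level

If the mgr module is crashing (as with the dashboard issue mentioned above), the debug stream usually shows the traceback right before the orchestrator goes quiet.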
[ceph-users] rbd on EC pool with fast and extremely slow writes/reads
Hi,

I have a problem on one of our Ceph clusters that I do not understand.

ceph 17.2.5 on 17 servers, 400 HDD OSDs, 10 and 25 Gb/s NICs.

A 3 TB rbd image is on an erasure-coded 8+3 pool with 128 PGs, xfs filesystem, 4 MB objects in the rbd image, mostly empty.

I have created a bunch of 10 GB files; most of them were written at 1.5 GB/s, but a few of them were really slow, ~10 MB/s, a factor of 100. When reading these files back, the fast-written ones are read fast, ~2-2.5 GB/s, while the slowly-written ones are also extremely slow to read - iotop shows between 1 and 30 MB/s reading speed. This does not happen at all on replicated images.

There are some OSDs with higher apply/commit latency, e.g. 200 ms, but there are no slow ops.

The tests were actually done on a proxmox VM with librbd, but the same happens with krbd, and on bare metal with a mounted krbd as well. I have tried to check all OSDs for laggy drives, but they all look about the same.

I have also copied the entire image with "rados get...", object by object. The strange thing here is that most objects were copied within 0.1-0.2 s, but quite a few took more than 1 s.

The cluster is quite busy with a base traffic of ~1-2 GB/s, so the speeds can vary due to that. But I would not expect a factor-of-100 slowdown for some writes/reads with rbd.

Any clues on what might be wrong or what else to check? I have another similar Ceph cluster where everything looks fine.

Best,
Andrej

--
prof. dr. Andrej Filipcic, E-mail: andrej.filip...@ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674    Fax: +386-1-477-3166
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
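One way to narrow something like this down is to map the slow objects back to their PGs and OSDs. A rough sketch only; the pool name, image name and object prefix below are made-up examples, and listing a large pool can take a while:

    # find the image's object name prefix (e.g. rbd_data.123456789abcdef)
    rbd info mypool/myimage | grep block_name_prefix

    # time each object read; the slow objects stand out immediately
    for obj in $(rados -p mypool_data ls | grep '^rbd_data\.123456789abcdef'); do
        /usr/bin/time -f "%es  $obj" rados -p mypool_data get "$obj" /dev/null
    done

    # map a slow object to its PG and acting OSD set
    ceph osd map mypool_data rbd_data.123456789abcdef.0000000000000042

    # compare per-OSD latencies and bench a suspect OSD
    ceph osd perf
    ceph tell osd.123 bench

If the same few OSDs keep showing up behind the slow objects, that points at specific drives rather than at the EC pool itself.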
[ceph-users] Re: s3 compatible interface
On 3/3/23 13:53, Kai Stian Olstad wrote:

On Wed, Mar 01, 2023 at 08:39:56AM -0500, Daniel Gryniewicz wrote:
We're actually writing this for RGW right now. It'll be a bit before it's productized, but it's in the works.

Just curious, what is the use cases for this feature? S3 against CephFS?

Local FS for development use, and distributed FS (initial target is GPFS) for production. There's no current plans to make it work against CephFS, although I would imagine it will work fine. But if you have a Ceph cluster, you're much better off using standard RGW on RADOS.

Daniel
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Creating a role for quota management
Hello, can you provide details on how to create a role that allows a set of users to set quotas on CephFS pools? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: s3 compatible interface
+1. If I know radosgw on top of cephfs is a thing, I may change some plans. Is that the planned route?

Thanks,
Kevin

From: Daniel Gryniewicz
Sent: Monday, March 6, 2023 6:21 AM
To: Kai Stian Olstad
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: s3 compatible interface

On 3/3/23 13:53, Kai Stian Olstad wrote:
> On Wed, Mar 01, 2023 at 08:39:56AM -0500, Daniel Gryniewicz wrote:
>> We're actually writing this for RGW right now. It'll be a bit before
>> it's productized, but it's in the works.
>
> Just curious, what is the use cases for this feature?
> S3 against CephFS?
>
Local FS for development use, and distributed FS (initial target is GPFS) for production. There's no current plans to make it work against CephFS, although I would imagine it will work fine. But if you have a Ceph cluster, you're much better off using standard RGW on RADOS.

Daniel
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: s3 compatible interface
As far as I know, we have no plans to productize (or even test) on CephFS. It should work, but CephFS isn't pure POSIX, so there may be issues.

Daniel

On 3/6/23 11:57, Fox, Kevin M wrote:
+1. If I know radosgw on top of cephfs is a thing, I may change some plans. Is that the planned route?

Thanks,
Kevin

From: Daniel Gryniewicz
Sent: Monday, March 6, 2023 6:21 AM
To: Kai Stian Olstad
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: s3 compatible interface

On 3/3/23 13:53, Kai Stian Olstad wrote:
On Wed, Mar 01, 2023 at 08:39:56AM -0500, Daniel Gryniewicz wrote:
We're actually writing this for RGW right now. It'll be a bit before it's productized, but it's in the works.

Just curious, what is the use cases for this feature? S3 against CephFS?

Local FS for development use, and distributed FS (initial target is GPFS) for production. There's no current plans to make it work against CephFS, although I would imagine it will work fine. But if you have a Ceph cluster, you're much better off using standard RGW on RADOS.

Daniel
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: rbd on EC pool with fast and extremely slow writes/reads
When I have seen behavior like this it was a dying drive. It only became obvious when I did a SMART long test and got failed reads. It still reported SMART OK though, so that was a lie.

--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology

From: Andrej Filipcic
Sent: Monday, March 6, 2023 8:51 AM
To: ceph-users
Subject: [ceph-users] rbd on EC pool with fast and extremely slow writes/reads

Hi,

I have a problem on one of our Ceph clusters that I do not understand.

ceph 17.2.5 on 17 servers, 400 HDD OSDs, 10 and 25 Gb/s NICs.

A 3 TB rbd image is on an erasure-coded 8+3 pool with 128 PGs, xfs filesystem, 4 MB objects in the rbd image, mostly empty.

I have created a bunch of 10 GB files; most of them were written at 1.5 GB/s, but a few of them were really slow, ~10 MB/s, a factor of 100. When reading these files back, the fast-written ones are read fast, ~2-2.5 GB/s, while the slowly-written ones are also extremely slow to read - iotop shows between 1 and 30 MB/s reading speed. This does not happen at all on replicated images.

There are some OSDs with higher apply/commit latency, e.g. 200 ms, but there are no slow ops.

The tests were actually done on a proxmox VM with librbd, but the same happens with krbd, and on bare metal with a mounted krbd as well. I have tried to check all OSDs for laggy drives, but they all look about the same.

I have also copied the entire image with "rados get...", object by object. The strange thing here is that most objects were copied within 0.1-0.2 s, but quite a few took more than 1 s.

The cluster is quite busy with a base traffic of ~1-2 GB/s, so the speeds can vary due to that. But I would not expect a factor-of-100 slowdown for some writes/reads with rbd.

Any clues on what might be wrong or what else to check? I have another similar Ceph cluster where everything looks fine.

Best,
Andrej

--
prof. dr. Andrej Filipcic, E-mail: andrej.filip...@ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674    Fax: +386-1-477-3166
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
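In case it is a marginal drive, a rough checklist for tying an OSD back to the physical disk and testing it (OSD id and device names here are placeholders):

    # find the physical device(s) behind a suspect OSD
    ceph device ls-by-daemon osd.123

    # on the OSD host: run a long self-test, then review the attributes
    smartctl -t long /dev/sdX
    smartctl -a /dev/sdX    # watch Reallocated_Sector_Ct, Current_Pending_Sector,
                            # UDMA_CRC_Error_Count and the self-test log

    # kernel-level read errors for the same device
    dmesg -T | grep -iE 'i/o error|blk_update_request'

As Paul notes, an overall "PASSED" from SMART does not rule a drive out; failed long-test reads or growing pending sectors are the more telling signs.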
[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5
Can I see the output of `ceph orch upgrade status` and `ceph config dump | grep image`? The "Pulling container image stop" implies somehow (as Eugen pointed out) that cephadm thinks the image to pull is named "stop" which means it is likely set as either the image to upgrade to or as one of the config options. On Sat, Mar 4, 2023 at 2:06 AM wrote: > I initially ran the upgrade fine but it failed @ around 40/100 on an osd, > so after waiting for along time i thought I'd try restarting it and then > restarting the upgrade. > I am stuck with the below debug error, I have tested docker pull from > other servers and they dont fail for the ceph images but on ceph it does. > If i even try to redeploy or add or remove mon damons for example it comes > up with the same error related to the images. > > The error that ceph is giving me is: > 2023-03-02T07:22:45.063976-0700 mgr.mgr-node.idvkbw [DBG] _run_cephadm : > args = [] > 2023-03-02T07:22:45.070342-0700 mgr.mgr-node.idvkbw [DBG] args: --image > stop --no-container-init pull > 2023-03-02T07:22:45.081086-0700 mgr.mgr-node.idvkbw [DBG] Running command: > which python3 > 2023-03-02T07:22:45.180052-0700 mgr.mgr-node.idvkbw [DBG] Running command: > /usr/bin/python3 > /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e > --image stop --no-container-init pull > 2023-03-02T07:22:46.500561-0700 mgr.mgr-node.idvkbw [DBG] code: 1 > 2023-03-02T07:22:46.500787-0700 mgr.mgr-node.idvkbw [DBG] err: Pulling > container image stop... > Non-zero exit code 1 from /usr/bin/docker pull stop > /usr/bin/docker: stdout Using default tag: latest > /usr/bin/docker: stderr Error response from daemon: pull access denied for > stop, repository does not exist or may require 'docker login': denied: > requested access to the resource is denied > ERROR: Failed command: /usr/bin/docker pull stop > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5
I've seen what appears to be the same post on Reddit previously, and attempted to assist. My suspicion is that a "stop" command was passed to ceph orch upgrade in an attempt to stop it, but with the --image flag preceding it, setting the image to "stop".

I asked the user to do an actual upgrade stop, then re-attempt specifying a different image, and the user indicated the "stop" image pull attempt continued. That part didn't seem right, to which I suggested a bug report.

https://www.reddit.com/r/ceph/comments/11g3rze/anyone_having_pull_issues_with_ceph_images/

@OP - are you the same poster as the above, or do you just have the same problem? If there are multiple users with this, it would indicate something larger than just a misplaced option/flag/command. If it is you - could you link to the bug report?

Just to make sure, you've issued "ceph orch upgrade stop", then performed another "ceph orch upgrade start" specifying a --ceph-version or --image?

I'll also echo Adam's request for a "ceph config dump | grep image". It sounds like it's still set to "stop", but I'd have expected the above to initiate an upgrade to the correct image. If not, the bug report would be helpful to continue so it could be fixed.

David

On Mon, Mar 6, 2023, at 15:02, Adam King wrote:
> Can I see the output of `ceph orch upgrade status` and `ceph config dump |
> grep image`? The "Pulling container image stop" implies somehow (as Eugen
> pointed out) that cephadm thinks the image to pull is named "stop" which
> means it is likely set as either the image to upgrade to or as one of the
> config options.
>
> On Sat, Mar 4, 2023 at 2:06 AM wrote:
>
>> I initially ran the upgrade fine but it failed @ around 40/100 on an osd,
>> so after waiting for along time i thought I'd try restarting it and then
>> restarting the upgrade.
>> I am stuck with the below debug error, I have tested docker pull from
>> other servers and they dont fail for the ceph images but on ceph it does.
>> If i even try to redeploy or add or remove mon damons for example it comes
>> up with the same error related to the images.
>>
>> The error that ceph is giving me is:
>> 2023-03-02T07:22:45.063976-0700 mgr.mgr-node.idvkbw [DBG] _run_cephadm : args = []
>> 2023-03-02T07:22:45.070342-0700 mgr.mgr-node.idvkbw [DBG] args: --image stop --no-container-init pull
>> 2023-03-02T07:22:45.081086-0700 mgr.mgr-node.idvkbw [DBG] Running command: which python3
>> 2023-03-02T07:22:45.180052-0700 mgr.mgr-node.idvkbw [DBG] Running command: /usr/bin/python3 /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e --image stop --no-container-init pull
>> 2023-03-02T07:22:46.500561-0700 mgr.mgr-node.idvkbw [DBG] code: 1
>> 2023-03-02T07:22:46.500787-0700 mgr.mgr-node.idvkbw [DBG] err: Pulling container image stop...
>> Non-zero exit code 1 from /usr/bin/docker pull stop
>> /usr/bin/docker: stdout Using default tag: latest
>> /usr/bin/docker: stderr Error response from daemon: pull access denied for stop, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
>> ERROR: Failed command: /usr/bin/docker pull stop
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
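If the target image really is stored as "stop", a recovery sequence along these lines should work; treat it as a sketch rather than a recipe, since the exact config key to clear depends on what `ceph config dump` actually shows:

    # halt the broken upgrade state
    ceph orch upgrade stop

    # confirm what cephadm thinks the target image is
    ceph orch upgrade status
    ceph config dump | grep image

    # if a container image option is literally set to "stop", clear it
    # (example below assumes the global container_image key was clobbered)
    ceph config rm global container_image

    # restart the upgrade with an explicit, correct target
    ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.5
    ceph orch upgrade status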
[ceph-users] Re: Creating a role for quota management
Hi,

Maybe you can use the CephFS client capabilities and enable only the 'p' permission for those users, which will allow them to SET_VXATTR. I didn't find a similar cap among the OSD capabilities.

Thanks

On 07/03/2023 00:33, anantha.ad...@intel.com wrote:
> Hello,
>
> Can you provide details on how to create a role that allows a set of users to set quotas on CephFS pools?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
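To make that concrete, a small sketch of what the 'p' cap looks like in practice; the filesystem name, client name and paths below are made up. CephFS quotas are set per directory as virtual xattrs, which is exactly what the 'p' flag gates:

    # create a client that may read/write the tree and set vxattrs (the 'p' flag)
    ceph fs authorize cephfs client.quota-admin /volumes rwp

    # on a mount done with that client, quotas are plain extended attributes
    setfattr -n ceph.quota.max_bytes -v 500000000000 /mnt/cephfs/volumes/project-a
    setfattr -n ceph.quota.max_files -v 1000000 /mnt/cephfs/volumes/project-a
    getfattr -n ceph.quota.max_bytes /mnt/cephfs/volumes/project-a

Quotas on the RADOS pools themselves (ceph osd pool set-quota) would instead need monitor-level permissions, which is probably not what you want to hand to end users.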