[ceph-users] Creating a role for allowing users to set quota on CephFS pools

2023-03-06 Thread ananda a

Hello,

 How do I create a role that allows users to set quotas on specific CephFS pools?

 Thank you

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: orchestrator issues on ceph 16.2.9

2023-03-06 Thread Ernesto Puerta
Hi Adrian,

Could you please open a tracker issue (https://tracker.ceph.com/) and share
the traceback from the Dashboard crash?

Thank you!

Kind Regards,
Ernesto


On Sat, Mar 4, 2023 at 9:50 AM Adrian Nicolae 
wrote:

> Hi,
>
> I have some orchestrator issues on our cluster running 16.2.9 with rgw
> only services.
>
> We first noticed these issues a few weeks ago when adding new hosts to
> the cluster -  the orch was not detecting the new drives to build the
> osd containers for them. Debugging the mgr logs, I noticed that the mgr
> was crashing due to the dashboard module.  I disabled the dashboard
> module and the new drives were detected and added to the cluster.
>
> Now we have other similar issues: we have a failed drive. The failure
> was detected, the osd was marked as down and the rebalancing has
> finished. I want to remove the failed osd from the cluster but it looks
> like the orch is not working:
>
>   - I launched the osd removal with 'ceph orch osd rm 92 --force' where
> 92 is the osd id in question
>
> - I checked the progress but nothing happens even after a few days :
>
> ceph orch osd rm status
> OSD  HOST    STATE    PGS  REPLACE  FORCE  ZAP    DRAIN STARTED AT
> 92   node10  started  0    False    True   False
>
> -  the osd process is stopped on that host and from the orch side I can
> see this :
>
> ceph orch ps --daemon_type osd --daemon_id 92
> NAME    HOST    PORTS  STATUS  REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID
> osd.92  node10         error   11h ago    4w   -        4096M
>
> - I have the same long refresh interval on other osds as well. I know it
> should be 10 minutes or so :
>
> NAME    HOST    PORTS  STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
> osd.93  node09         running (4w)  11h ago    4w   5573M    4096M    16.2.9   3520ead5eb19  d2f658e9e37b
>
>   osd: 116 osds: 115 up (since 11d), 115 in (since 11d)
>
> 90   hdd  16.37109  1.0  16 TiB  4.0 TiB  4.0 TiB  0 B  17 GiB  12 TiB  24.66  0.97  146  up
> 92   hdd  0         0    0 B     0 B      0 B      0 B  0 B     0 B     0      0     0    down
> 94   hdd  16.37109  1.0  16 TiB  4.0 TiB  4.0 TiB  0 B  17 GiB  12 TiB  24.66  0.97  146  up
>
> - I activated debug 20 on the mgr but I can't see any errors or other
> clues regarding the osd removal. I also switched to the standby manager
> with 'ceph mgr fail'. The mgr switch works but still nothing happens
>
> - It's not only the osd removal thing. I also tried to deploy new rgw
> services by applying rgw labels on 2 new hosts , we have specs for
> building rgw containers when detecting the label.  Again, nothing happens.
>
> I'm planning to upgrade to 16.2.11 to see if this solves the issues, but
> I'm not very confident; I didn't see anything regarding this in the
> changelogs. Is there anything else I can try to debug this issue?
>
> Thanks.
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
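
A minimal troubleshooting sketch for a stuck `ceph orch osd rm` like the one above (osd id 92 is from the thread; these commands are suggestions to verify against your own cluster, not a guaranteed fix):

```
# Raise cephadm logging and watch orchestrator activity live
ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph -W cephadm --watch-debug

# Recent cephadm events and the removal queue
ceph log last cephadm
ceph orch osd rm status

# If the orchestrator never progresses, the osd can be removed manually
ceph osd purge 92 --yes-i-really-mean-it
ceph auth rm osd.92
```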


[ceph-users] rbd on EC pool with fast and extremely slow writes/reads

2023-03-06 Thread Andrej Filipcic



Hi,

I have a problem on one of ceph clusters I do not understand.
ceph 17.2.5 on 17 servers, 400 HDD OSDs, 10 and 25Gb/s NICs

A 3TB rbd image is on an erasure-coded 8+3 pool with 128 PGs, xfs filesystem,
4MB objects in the rbd image, mostly empty.


I have created a bunch of 10G files; most of them were written at
~1.5GB/s, but a few of them were really slow, ~10MB/s, a factor of 100.


When reading these files back, the fast-written ones are read fast,
~2-2.5GB/s, while the slowly-written ones are also extremely slow to read;
iotop shows between 1 and 30 MB/s reading speed.


This does not happen at all on replicated images. There are some OSDs 
with higher apply/commit latency, eg 200ms, but there are no slow ops.


The tests were done actually on proxmox vm with librbd, but the same 
happens with krbd, and on bare metal with mounted krbd as well.


I have tried to check all OSDs for laggy drives, but they all look about 
the same.


I have also copied entire image with "rados get...", object by object, 
the strange thing here is that most of objects were copied within 
0.1-0.2s, but quite some took more than 1s.
The cluster is quite busy with base traffic of ~1-2GB/s, so the speeds 
can vary due to that. But I would not expect a factor of 100 slowdown 
for some writes/reads with rbds.


Any clues on what might be wrong or what else to check? I have another 
similar ceph cluster where everything looks fine.


Best,
Andrej

--
_
   prof. dr. Andrej Filipcic,   E-mail: andrej.filip...@ijs.si
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674Fax: +386-1-477-3166
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
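
The per-object "rados get" timing test described above can be automated. Here is a small, generic sketch (the command shapes and the 10x outlier factor are assumptions, not from the thread) that times arbitrary commands and flags the slow ones, e.g. fed with object names from `rados ls`:

```python
import subprocess
import time

def time_commands(cmds):
    """Run each command and return (cmd, elapsed_seconds) pairs."""
    results = []
    for cmd in cmds:
        start = time.monotonic()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        results.append((cmd, time.monotonic() - start))
    return results

def outliers(timings, factor=10.0):
    """Return entries slower than `factor` times the median elapsed time."""
    elapsed = sorted(t for _, t in timings)
    median = elapsed[len(elapsed) // 2]
    return [(c, t) for c, t in timings if t > factor * median]
```

For example, `outliers(time_commands([["rados", "-p", "mypool", "get", obj, "-"] for obj in objects]))` would list the objects that, as in the report above, take ~1s instead of 0.1-0.2s, which can then be mapped back to PGs and OSDs with `ceph osd map`.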


[ceph-users] Re: s3 compatible interface

2023-03-06 Thread Daniel Gryniewicz

On 3/3/23 13:53, Kai Stian Olstad wrote:
> On Wed, Mar 01, 2023 at 08:39:56AM -0500, Daniel Gryniewicz wrote:
>> We're actually writing this for RGW right now.  It'll be a bit before
>> it's productized, but it's in the works.
>
> Just curious, what are the use cases for this feature?
> S3 against CephFS?

Local FS for development use, and distributed FS (initial target is 
GPFS) for production.  There are no current plans to make it work against 
CephFS, although I would imagine it will work fine.  But if you have a 
Ceph cluster, you're much better off using standard RGW on RADOS.


Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Creating a role for quota management

2023-03-06 Thread anantha . adiga
Hello,

Can you provide details on how to create a role that allows a set of users to
set quotas on CephFS pools?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 compatible interface

2023-03-06 Thread Fox, Kevin M
+1. If radosgw on top of cephfs is going to be a thing, I may change some plans. Is 
that the planned route?

Thanks,
Kevin


From: Daniel Gryniewicz 
Sent: Monday, March 6, 2023 6:21 AM
To: Kai Stian Olstad
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: s3 compatible interface


On 3/3/23 13:53, Kai Stian Olstad wrote:
> On Wed, Mar 01, 2023 at 08:39:56AM -0500, Daniel Gryniewicz wrote:
>> We're actually writing this for RGW right now.  It'll be a bit before
>> it's productized, but it's in the works.
>
> Just curious, what is the use cases for this feature?
> S3 against CephFS?
>

Local FS for development use, and distributed FS (initial target is
GPFS) for production.   There's no current plans to make it work against
CephFS, although I would imagine it will work fine.  But if you have a
Ceph cluster, you're much better off using standard RGW on RADOS.

Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 compatible interface

2023-03-06 Thread Daniel Gryniewicz
As far as I know, we have no plans to productize (or even test) on 
CephFS.  It should work, but CephFS isn't pure POSIX, so there may be 
issues.


Daniel

On 3/6/23 11:57, Fox, Kevin M wrote:
> +1. If I know radosgw on top of cephfs is a thing, I may change some plans. Is
> that the planned route?
>
> Thanks,
> Kevin

From: Daniel Gryniewicz 
Sent: Monday, March 6, 2023 6:21 AM
To: Kai Stian Olstad
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: s3 compatible interface


On 3/3/23 13:53, Kai Stian Olstad wrote:
> On Wed, Mar 01, 2023 at 08:39:56AM -0500, Daniel Gryniewicz wrote:
>> We're actually writing this for RGW right now.  It'll be a bit before
>> it's productized, but it's in the works.
>
> Just curious, what are the use cases for this feature?
> S3 against CephFS?

Local FS for development use, and distributed FS (initial target is
GPFS) for production.   There's no current plans to make it work against
CephFS, although I would imagine it will work fine.  But if you have a
Ceph cluster, you're much better off using standard RGW on RADOS.

Daniel


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd on EC pool with fast and extremely slow writes/reads

2023-03-06 Thread Paul Mezzanini
When I have seen behavior like this, it was a dying drive.  It only became 
obvious when I did a SMART long test and got failed reads.  The drive still 
reported SMART OK, though, so that was a lie.
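
A sketch of that kind of check, per drive (device names are examples; map osd ids to devices first, e.g. with 'ceph device ls'):

```
smartctl -t long /dev/sdX        # start a long self-test (takes hours on HDDs)
smartctl -l selftest /dev/sdX    # read the self-test log once it finishes
smartctl -A /dev/sdX             # attributes: pending/reallocated sectors, read errors
```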



--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology

From: Andrej Filipcic 
Sent: Monday, March 6, 2023 8:51 AM
To: ceph-users
Subject: [ceph-users] rbd on EC pool with fast and extremely slow writes/reads


Hi,

I have a problem on one of ceph clusters I do not understand.
ceph 17.2.5 on 17 servers, 400 HDD OSDs, 10 and 25Gb/s NICs

A 3TB rbd image is on an erasure-coded 8+3 pool with 128 PGs, xfs filesystem,
4MB objects in the rbd image, mostly empty.

I have created a bunch of 10G files, most of them were written with
1.5GB/s, few of them were really slow, ~10MB/s, a factor of 100.

When reading these files back, the fast-written ones are read fast,
~2-2.5GB/s, the slowly-written are also extremely slow in reading, iotop
shows between 1 and 30 MB/s reading speed.

This does not happen at all on replicated images. There are some OSDs
with higher apply/commit latency, eg 200ms, but there are no slow ops.

The tests were done actually on proxmox vm with librbd, but the same
happens with krbd, and on bare metal with mounted krbd as well.

I have tried to check all OSDs for laggy drives, but they all look about
the same.

I have also copied entire image with "rados get...", object by object,
the strange thing here is that most of objects were copied within
0.1-0.2s, but quite some took more than 1s.
The cluster is quite busy with base traffic of ~1-2GB/s, so the speeds
can vary due to that. But I would not expect a factor of 100 slowdown
for some writes/reads with rbds.

Any clues on what might be wrong or what else to check? I have another
similar ceph cluster where everything looks fine.

Best,
Andrej

--
_
prof. dr. Andrej Filipcic,   E-mail: andrej.filip...@ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674Fax: +386-1-477-3166
-
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-06 Thread Adam King
Can I see the output of `ceph orch upgrade status` and `ceph config dump |
grep image`? The "Pulling container image stop" implies somehow (as Eugen
pointed out) that cephadm thinks the image to pull is named "stop" which
means it is likely set as either the image to upgrade to or as one of the
config options.

On Sat, Mar 4, 2023 at 2:06 AM  wrote:

> I initially ran the upgrade fine but it failed at around 40/100 on an osd,
> so after waiting for a long time I thought I'd try restarting it and then
> restarting the upgrade.
> I am stuck with the below debug error. I have tested docker pull from
> other servers and they don't fail for the ceph images, but on ceph it does.
> If I even try to redeploy or add or remove mon daemons, for example, it comes
> up with the same error related to the images.
>
> The error that ceph is giving me is:
> 2023-03-02T07:22:45.063976-0700 mgr.mgr-node.idvkbw [DBG] _run_cephadm :
> args = []
> 2023-03-02T07:22:45.070342-0700 mgr.mgr-node.idvkbw [DBG] args: --image
> stop --no-container-init pull
> 2023-03-02T07:22:45.081086-0700 mgr.mgr-node.idvkbw [DBG] Running command:
> which python3
> 2023-03-02T07:22:45.180052-0700 mgr.mgr-node.idvkbw [DBG] Running command:
> /usr/bin/python3
> /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e
> --image stop --no-container-init pull
> 2023-03-02T07:22:46.500561-0700 mgr.mgr-node.idvkbw [DBG] code: 1
> 2023-03-02T07:22:46.500787-0700 mgr.mgr-node.idvkbw [DBG] err: Pulling
> container image stop...
> Non-zero exit code 1 from /usr/bin/docker pull stop
> /usr/bin/docker: stdout Using default tag: latest
> /usr/bin/docker: stderr Error response from daemon: pull access denied for
> stop, repository does not exist or may require 'docker login': denied:
> requested access to the resource is denied
> ERROR: Failed command: /usr/bin/docker pull stop
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
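
If the container image option really did get set to "stop" (as the log above suggests), a recovery along these lines might work; this is a hedged sketch, so check each value before removing anything:

```
ceph orch upgrade status                  # shows the current target image, if any
ceph config dump | grep container_image   # look for an option literally set to "stop"
ceph orch upgrade stop                    # clear the in-progress upgrade
ceph config rm global container_image     # drop a bad image override, if one is set
ceph orch upgrade start --ceph-version 17.2.5
```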


[ceph-users] Re: Issue upgrading 17.2.0 to 17.2.5

2023-03-06 Thread David Orman
I've seen what appears to be the same post on Reddit, previously, and attempted 
to assist. My suspicion is a "stop" command was passed to ceph orch upgrade in 
an attempt to stop it, but with the --image flag preceding it, setting the 
image to stop. I asked the user to do an actual upgrade stop, then re-attempt 
specifying a different image, and the user indicated the "stop" image pull 
attempt continued. That part didn't seem right, to which I suggested a bug 
report.

https://www.reddit.com/r/ceph/comments/11g3rze/anyone_having_pull_issues_with_ceph_images/

@OP - are you the same poster as the above, or do you just have the same 
problem? If there's multiple users with this, it would indicate something 
larger than just a misplaced option/flag/command. If it is you - could you link 
to the bug report?

Just to make sure, you've issued:

"ceph orch upgrade stop"

Then performed another "ceph orch upgrade start" specifying a --ceph-version or 
--image?

I'll also echo Adam's request for a "ceph config dump |grep image". It sounds 
like it's still set to "stop", but I'd have expected the above to initiate an 
upgrade to the correct image. If not, the bug report would be helpful to 
continue so it could be fixed.

David

On Mon, Mar 6, 2023, at 15:02, Adam King wrote:
> Can I see the output of `ceph orch upgrade status` and `ceph config dump |
> grep image`? The "Pulling container image stop" implies somehow (as Eugen
> pointed out) that cephadm thinks the image to pull is named "stop" which
> means it is likely set as either the image to upgrade to or as one of the
> config options.
>
> On Sat, Mar 4, 2023 at 2:06 AM  wrote:
>
>> I initially ran the upgrade fine but it failed at around 40/100 on an osd,
>> so after waiting for a long time I thought I'd try restarting it and then
>> restarting the upgrade.
>> I am stuck with the below debug error. I have tested docker pull from
>> other servers and they don't fail for the ceph images, but on ceph it does.
>> If I even try to redeploy or add or remove mon daemons, for example, it comes
>> up with the same error related to the images.
>>
>> The error that ceph is giving me is:
>> 2023-03-02T07:22:45.063976-0700 mgr.mgr-node.idvkbw [DBG] _run_cephadm :
>> args = []
>> 2023-03-02T07:22:45.070342-0700 mgr.mgr-node.idvkbw [DBG] args: --image
>> stop --no-container-init pull
>> 2023-03-02T07:22:45.081086-0700 mgr.mgr-node.idvkbw [DBG] Running command:
>> which python3
>> 2023-03-02T07:22:45.180052-0700 mgr.mgr-node.idvkbw [DBG] Running command:
>> /usr/bin/python3
>> /var/lib/ceph/5058e342-dac7-11ec-ada3-01065e90228d/cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4e
>> --image stop --no-container-init pull
>> 2023-03-02T07:22:46.500561-0700 mgr.mgr-node.idvkbw [DBG] code: 1
>> 2023-03-02T07:22:46.500787-0700 mgr.mgr-node.idvkbw [DBG] err: Pulling
>> container image stop...
>> Non-zero exit code 1 from /usr/bin/docker pull stop
>> /usr/bin/docker: stdout Using default tag: latest
>> /usr/bin/docker: stderr Error response from daemon: pull access denied for
>> stop, repository does not exist or may require 'docker login': denied:
>> requested access to the resource is denied
>> ERROR: Failed command: /usr/bin/docker pull stop
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Creating a role for quota management

2023-03-06 Thread Xiubo Li

Hi

Maybe you can use the CEPHFS CLIENT CAPABILITIES and enable only the 'p' 
permission for those users, which will allow them to SET_VXATTR (quotas are 
set as vxattrs).

I didn't find a similar cap among the OSD CAPABILITIES.
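
A concrete sketch of that approach (the fs name, client id and paths are made-up examples):

```
# Grant a client the 'p' flag so it can set vxattrs (which include quotas)
ceph fs authorize cephfs client.quotamgr / rwp

# On a mount done with that identity, quotas are ordinary vxattrs
setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/project   # 100 GiB
setfattr -n ceph.quota.max_files -v 100000 /mnt/cephfs/project
getfattr -n ceph.quota.max_bytes /mnt/cephfs/project
```

Note that CephFS quotas apply to directories rather than to pools directly.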

Thanks

On 07/03/2023 00:33, anantha.ad...@intel.com wrote:

Hello,

Can you provide details on how to create a role for allowing a set of users to 
set quota on CephFS pools ?


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io