[ceph-users] Re: cephadm host maintenance

2022-07-13 Thread Adam King
Hello Steven,

Arguably, it should, but right now nothing is implemented to do so, and
you'd have to manually run "ceph mgr fail node2-cobj2-atdev1-nvan.ghxlvw"
before it would allow you to put the host in maintenance. It's non-trivial
from a technical point of view to have it do the switch automatically,
because the cephadm instance is running on that active mgr: it would have to
record somewhere that we wanted this host in maintenance, fail over the mgr
itself, and then have the new cephadm instance pick up that request and
carry it out. Possible, but not something anyone has had a chance to
implement. FWIW, I do believe there are also plans to eventually add a
playbook for a rolling reboot or something of the sort to
https://github.com/ceph/cephadm-ansible. For now, though, some sort of
intervention to trigger the mgr failover before running the maintenance
enter command is necessary.
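
In other words, until then the manual sequence is roughly the following
(<hostname> here is just a placeholder for the host being rebooted):

    ceph mgr fail node2-cobj2-atdev1-nvan.ghxlvw   # fail over the active mgr first
    ceph orch host maintenance enter <hostname>    # should now be accepted
    # ...reboot/patch the host...
    ceph orch host maintenance exit <hostname>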

Regards,
 - Adam King

On Wed, Jul 13, 2022 at 11:02 AM Steven Goodliff <
steven.goodl...@globalrelay.net> wrote:

>
> Hi,
>
>
> I'm trying to reboot a ceph cluster one instance at a time by running an
> Ansible playbook which basically runs
>
>
> cephadm shell ceph orch host maintenance enter   and then
> reboots the instance and exits the maintenance
>
>
> but I get
>
>
> ALERT: Cannot stop active Mgr daemon, Please switch active Mgrs with 'ceph
> mgr fail node2-cobj2-atdev1-nvan.ghxlvw'
>
>
> on one instance. Should cephadm handle the switch?
>
>
> thanks
>
> Steven Goodliff
> Global Relay
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm host maintenance

2022-07-13 Thread Robert Gallop
This brings up a good follow-on: rebooting in general for OS patching.

I have not been leveraging the maintenance mode function, as I found it was
really no different from just setting noout and doing the reboot.  I find
that if the box is the active manager, the failover happens quickly,
painlessly and automatically.  All the OSDs just show as missing and come
back once the box is back from the reboot…
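
In other words, per host it's really just something like this (rough
sketch):

    ceph osd set noout    # keep OSDs from being marked out during the reboot
    # patch and reboot the box, wait for its OSDs to rejoin and the cluster to settle
    ceph osd unset noout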

Am I causing issues I may not be aware of?  How is everyone handling
patching reboots?

The only place I'm careful is the active MDS nodes: since that failover does
cause a period of no I/O for the mounted clients, I generally fail that
manually so I can ensure I don't have to wait for the MDS to figure out an
instance is gone and spin up a standby….
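
Roughly, that manual step is something like this (the filesystem name is a
placeholder):

    ceph fs status        # confirm a standby MDS is available
    ceph mds fail myfs:0  # fail the active rank so the standby takes over right away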

Any tips or techniques until there is a more holistic approach?

Thanks!


On Wed, Jul 13, 2022 at 9:49 AM Adam King  wrote:

> Hello Steven,
>
> Arguably, it should, but right now nothing is implemented to do so and
> you'd have to manually run the "ceph mgr fail
> node2-cobj2-atdev1-nvan.ghxlvw" before it would allow you to put the host
> in maintenance. It's non-trivial from a technical point of view to have it
> automatically do the switch as the cephadm instance is running on that
> active mgr, so it will have to store somewhere that we wanted this host in
> maintenance, fail over the mgr itself, then have the new cephadm instance
> pick up that we wanted the host in maintenance and do so. Possible, but not
> something anyone has had a chance to implement. FWIW, I do believe there
> are also plans to eventually have a playbook for a rolling reboot or
> something of the sort added to https://github.com/ceph/cephadm-ansible.
> But
> for now, I think some sort of intervention to cause the fail over to happen
> before running the maintenance enter command is necessary.
>
> Regards,
>  - Adam King
>
> On Wed, Jul 13, 2022 at 11:02 AM Steven Goodliff <
> steven.goodl...@globalrelay.net> wrote:
>
> >
> > Hi,
> >
> >
> > I'm trying to reboot a ceph cluster one instance at a time by running an
> > Ansible playbook which basically runs
> >
> >
> > cephadm shell ceph orch host maintenance enter   and then
> > reboots the instance and exits the maintenance
> >
> >
> > but I get
> >
> >
> > ALERT: Cannot stop active Mgr daemon, Please switch active Mgrs with
> 'ceph
> > mgr fail node2-cobj2-atdev1-nvan.ghxlvw'
> >
> >
> > on one instance. Should cephadm handle the switch?
> >
> >
> > thanks
> >
> > Steven Goodliff
> > Global Relay
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Radosgw issues after upgrade to 14.2.21

2022-07-13 Thread richard.andr...@centro.net
Hello,

We recently upgraded a 3-node cluster running Luminous 12.2.13 (Ceph repos) on 
Debian 9 to Nautilus v14.2.21 (Debian stable repo) on Debian 11.  For the most 
part everything seems to be fine, with the exception of access to the bucket 
defined inside RadosGW.

Since the upgrade, users are getting 403 Access Denied when trying to list 
their objects and/or put new objects in the bucket.  We've attempted to 
re-apply the IAM policies defined on the bucket for the users, however that 
fails, even after taking ownership of the bucket with a newly created 
account's credentials via radosgw-admin.  We've also added caps (buckets *, 
users *, policy *) to the newly created "admin" account, but that didn't help 
either in re-applying the IAM policies.
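
For reference, the ownership/caps changes were roughly along these lines 
(uid and bucket name are placeholders here):

    radosgw-admin bucket link --bucket=mybucket --uid=newadmin
    radosgw-admin caps add --uid=newadmin --caps="buckets=*;users=*;policy=*"
    radosgw-admin bucket stats --bucket=mybucket    # confirms the new owner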

What are we missing?


Richard Andrews

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rbd iostat requires pool specified

2022-07-13 Thread Reed Dier
Hoping this may be trivial to point me towards: I typically keep a 
background screen running `rbd perf image iostat`, which shows all of the rbd 
images with IO and how busy each one is at any given moment.

Recently, after upgrading everything to the latest Octopus release (15.2.16), 
it no longer allows omitting the pool, which means I can't blend all rbd 
pools together into a single view.
How it used to appear:
> NAME                WR     RD    WR_BYTES    RD_BYTES     WR_LAT     RD_LAT
> rbd-ssd/app1     322/s    0/s   5.6 MiB/s       0 B/s    2.28 ms    0.00 ns
> rbd-ssd/app2     223/s    5/s   2.1 MiB/s   147 KiB/s    3.56 ms    1.12 ms
> rbd-hybrid/app3   76/s    0/s    11 MiB/s       0 B/s   16.61 ms    0.00 ns
> rbd-hybrid/app4   11/s    0/s   395 KiB/s       0 B/s   51.29 ms    0.00 ns
> rbd-hybrid/app5    3/s    0/s    74 KiB/s       0 B/s  151.54 ms    0.00 ns
> rbd-hybrid/app6    0/s    0/s    42 KiB/s       0 B/s   13.90 ms    0.00 ns
> rbd-hybrid/app7    0/s    0/s   2.4 KiB/s       0 B/s    1.70 ms    0.00 ns
> 
> NAME                WR     RD    WR_BYTES    RD_BYTES     WR_LAT     RD_LAT
> rbd-ssd/app1     483/s    0/s   7.3 MiB/s       0 B/s    2.17 ms    0.00 ns
> rbd-ssd/app2     279/s    5/s   2.5 MiB/s    69 KiB/s    3.82 ms  516.30 us
> rbd-hybrid/app3  147/s    0/s    10 MiB/s       0 B/s    8.59 ms    0.00 ns
> rbd-hybrid/app6   10/s    0/s   425 KiB/s       0 B/s   75.79 ms    0.00 ns
> rbd-hybrid/app8    0/s    0/s   2.4 KiB/s       0 B/s    1.85 ms    0.00 ns


> $ uname -r && rbd --version && rbd perf image iostat
> 5.4.0-107-generic
> ceph version 15.2.16 (d46a73d6d0a67a79558054a3a5a72cb561724974) octopus (stable)
> rbd: mgr command failed: (2) No such file or directory: [errno 2] RADOS object not found (Pool 'rbd' not found)

This is ubuntu 20.04, using packages rather than cephadm.
I do not have a pool named `rbd` so that is correct, but I have a handful of 
pools with the rbd application set.

> $ for pool in rbd-{ssd,hybrid,ec82} ; do ceph osd pool application get $pool ; done
> {
> "rbd": {}
> }
> {
> "rbd": {}
> }
> {
> "rbd": {}
> }

Looking at the help output, it doesn’t seem to imply that the `pool-spec` is 
optional, and it won’t take wildcard globs like `rbd*` for the pool name.

> $ rbd help perf image iostat
> usage: rbd perf image iostat [--pool <pool>] [--namespace <namespace>]
>                              [--iterations <iterations>] [--sort-by <sort-by>]
>                              [--format <format>] [--pretty-format]
>                              <pool-spec>
> 
> Display image IO statistics.
> 
> Positional arguments
>   <pool-spec>                pool specification
>                              (example: <pool-name>[/<namespace>])
> 
> Optional arguments
>   -p [ --pool ] arg          pool name
>   --namespace arg            namespace name
>   --iterations arg           iterations of metric collection [> 0]
>   --sort-by arg (=write_ops) sort-by IO metric (write-ops, read-ops,
>                              write-bytes, read-bytes, write-latency,
>                              read-latency) [default: write-ops]
>   --format arg               output format (plain, json, or xml) [default:
>                              plain]
>   --pretty-format            pretty formatting (json and xml)

Setting a pool name for one of my rbd pools, either as the pool-spec or via 
-p/--pool, works, but obviously only for that one pool, and not for *all* rbd 
pools as it functioned previously (which appears to have been 15.2.13).
I didn't see a PR mentioned in the 15.2.14-16 release notes that seemed to 
mention changes to rbd that would affect this, but I could have glossed over 
something.
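
In the meantime, the closest I can get is one watcher per pool, e.g. 
something like this (just a sketch, using my pool names):

    for pool in rbd-ssd rbd-hybrid rbd-ec82 ; do
        screen -dmS "iostat-$pool" rbd perf image iostat "$pool"
    done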
Appreciate any pointers.

Thanks,
Reed
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-13 Thread Zakhar Kirpichenko
Hi!

My apologies for butting in. Please confirm
that bluestore_prefer_deferred_size_hdd is a runtime option, which doesn't
require OSDs to be stopped or rebuilt?

Best regards,
Zakhar

On Tue, 12 Jul 2022 at 14:46, Dan van der Ster  wrote:

> Hi Igor,
>
> Thank you for the reply and information.
> I confirm that `ceph config set osd bluestore_prefer_deferred_size_hdd
> 65537` correctly defers writes in my clusters.
>
> Best regards,
>
> Dan
>
>
>
> On Tue, Jul 12, 2022 at 1:16 PM Igor Fedotov 
> wrote:
> >
> > Hi Dan,
> >
> > I can confirm this is a regression introduced by
> https://github.com/ceph/ceph/pull/42725.
> >
> > Indeed strict comparison is a key point in your specific case but
> generally  it looks like this piece of code needs more redesign to better
> handle fragmented allocations (and issue deferred write for every short
> enough fragment independently).
> >
> > So I'm looking for a way to improve that at the moment. Will fall back to
> > the trivial comparison fix if I fail to find a better solution.
> >
> > Meanwhile you can adjust bluestore_min_alloc_size_hdd indeed but I'd
> prefer not to raise it that high as 128K to avoid too many writes being
> deferred (and hence DB overburden).
> >
> > IMO setting the parameter to 64K+1 should be fine.
> >
> >
> > Thanks,
> >
> > Igor
> >
> > On 7/7/2022 12:43 AM, Dan van der Ster wrote:
> >
> > Hi Igor and others,
> >
> > (apologies for html, but i want to share a plot ;) )
> >
> > We're upgrading clusters to v16.2.9 from v15.2.16, and our simple "rados
> bench -p test 10 write -b 4096 -t 1" latency probe showed something is very
> wrong with deferred writes in pacific.
> > Here is an example cluster, upgraded today:
> >
> >
> >
> > The OSDs are 12TB HDDs, formatted in nautilus with the default
> bluestore_min_alloc_size_hdd = 64kB, and each have a large flash block.db.
> >
> > I found that the performance issue is because 4kB writes are no longer
> deferred from those pre-pacific hdds to flash in pacific with the default
> config !!!
> > Here are example bench writes from both releases:
> https://pastebin.com/raw/m0yL1H9Z
> >
> > I worked out that the issue is fixed if I set
> bluestore_prefer_deferred_size_hdd = 128k (up from the 64k pacific default.
> Note the default was 32k in octopus).
> >
> > I think this is related to the fixes in
> https://tracker.ceph.com/issues/52089 which landed in 16.2.6 --
> _do_alloc_write is comparing the prealloc size 0x10000 with
> bluestore_prefer_deferred_size_hdd (0x10000) and the "strictly less than"
> condition prevents deferred writes from ever happening.
> >
> > So I think this would impact anyone upgrading clusters with hdd/ssd
> mixed osds ... surely we must not be the only clusters impacted by this?!
> >
> > Should we increase the default bluestore_prefer_deferred_size_hdd up to
> 128kB or is there in fact a bug here?
> >
> > Best Regards,
> >
> > Dan
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-13 Thread Dan van der Ster
Yes, that is correct. No need to restart the OSDs.
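
For example (osd.0 is just an example daemon id; the last command needs to
run on that OSD's host):

    ceph config set osd bluestore_prefer_deferred_size_hdd 65537
    ceph config get osd bluestore_prefer_deferred_size_hdd
    ceph daemon osd.0 config get bluestore_prefer_deferred_size_hdd   # confirm a running OSD picked it up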

.. Dan


On Thu., Jul. 14, 2022, 07:04 Zakhar Kirpichenko,  wrote:

> Hi!
>
> My apologies for butting in. Please confirm
> that bluestore_prefer_deferred_size_hdd is a runtime option, which doesn't
> require OSDs to be stopped or rebuilt?
>
> Best regards,
> Zakhar
>
> On Tue, 12 Jul 2022 at 14:46, Dan van der Ster  wrote:
>
>> Hi Igor,
>>
>> Thank you for the reply and information.
>> I confirm that `ceph config set osd bluestore_prefer_deferred_size_hdd
>> 65537` correctly defers writes in my clusters.
>>
>> Best regards,
>>
>> Dan
>>
>>
>>
>> On Tue, Jul 12, 2022 at 1:16 PM Igor Fedotov 
>> wrote:
>> >
>> > Hi Dan,
>> >
>> > I can confirm this is a regression introduced by
>> https://github.com/ceph/ceph/pull/42725.
>> >
>> > Indeed strict comparison is a key point in your specific case but
>> generally  it looks like this piece of code needs more redesign to better
>> handle fragmented allocations (and issue deferred write for every short
>> enough fragment independently).
>> >
>> > So I'm looking for a way to improve that at the moment. Will fall back
>> > to the trivial comparison fix if I fail to find a better solution.
>> >
>> > Meanwhile you can adjust bluestore_min_alloc_size_hdd indeed but I'd
>> prefer not to raise it that high as 128K to avoid too many writes being
>> deferred (and hence DB overburden).
>> >
>> > IMO setting the parameter to 64K+1 should be fine.
>> >
>> >
>> > Thanks,
>> >
>> > Igor
>> >
>> > On 7/7/2022 12:43 AM, Dan van der Ster wrote:
>> >
>> > Hi Igor and others,
>> >
>> > (apologies for html, but i want to share a plot ;) )
>> >
>> > We're upgrading clusters to v16.2.9 from v15.2.16, and our simple
>> "rados bench -p test 10 write -b 4096 -t 1" latency probe showed something
>> is very wrong with deferred writes in pacific.
>> > Here is an example cluster, upgraded today:
>> >
>> >
>> >
>> > The OSDs are 12TB HDDs, formatted in nautilus with the default
>> bluestore_min_alloc_size_hdd = 64kB, and each have a large flash block.db.
>> >
>> > I found that the performance issue is because 4kB writes are no longer
>> deferred from those pre-pacific hdds to flash in pacific with the default
>> config !!!
>> > Here are example bench writes from both releases:
>> https://pastebin.com/raw/m0yL1H9Z
>> >
>> > I worked out that the issue is fixed if I set
>> bluestore_prefer_deferred_size_hdd = 128k (up from the 64k pacific default.
>> Note the default was 32k in octopus).
>> >
>> > I think this is related to the fixes in
>> https://tracker.ceph.com/issues/52089 which landed in 16.2.6 --
>> _do_alloc_write is comparing the prealloc size 0x10000 with
>> bluestore_prefer_deferred_size_hdd (0x10000) and the "strictly less than"
>> condition prevents deferred writes from ever happening.
>> >
>> > So I think this would impact anyone upgrading clusters with hdd/ssd
>> mixed osds ... surely we must not be the only clusters impacted by this?!
>> >
>> > Should we increase the default bluestore_prefer_deferred_size_hdd up to
>> 128kB or is there in fact a bug here?
>> >
>> > Best Regards,
>> >
>> > Dan
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-13 Thread Zakhar Kirpichenko
Many thanks, Dan. Much appreciated!

/Z

On Thu, 14 Jul 2022 at 08:43, Dan van der Ster  wrote:

> Yes, that is correct. No need to restart the osds.
>
> .. Dan
>
>
> On Thu., Jul. 14, 2022, 07:04 Zakhar Kirpichenko, 
> wrote:
>
>> Hi!
>>
>> My apologies for butting in. Please confirm
>> that bluestore_prefer_deferred_size_hdd is a runtime option, which doesn't
>> require OSDs to be stopped or rebuilt?
>>
>> Best regards,
>> Zakhar
>>
>> On Tue, 12 Jul 2022 at 14:46, Dan van der Ster 
>> wrote:
>>
>>> Hi Igor,
>>>
>>> Thank you for the reply and information.
>>> I confirm that `ceph config set osd bluestore_prefer_deferred_size_hdd
>>> 65537` correctly defers writes in my clusters.
>>>
>>> Best regards,
>>>
>>> Dan
>>>
>>>
>>>
>>> On Tue, Jul 12, 2022 at 1:16 PM Igor Fedotov 
>>> wrote:
>>> >
>>> > Hi Dan,
>>> >
>>> > I can confirm this is a regression introduced by
>>> https://github.com/ceph/ceph/pull/42725.
>>> >
>>> > Indeed strict comparison is a key point in your specific case but
>>> generally  it looks like this piece of code needs more redesign to better
>>> handle fragmented allocations (and issue deferred write for every short
>>> enough fragment independently).
>>> >
>>> > So I'm looking for a way to improve that at the moment. Will fall back
>>> > to the trivial comparison fix if I fail to find a better solution.
>>> >
>>> > Meanwhile you can adjust bluestore_min_alloc_size_hdd indeed but I'd
>>> prefer not to raise it that high as 128K to avoid too many writes being
>>> deferred (and hence DB overburden).
>>> >
>>> > IMO setting the parameter to 64K+1 should be fine.
>>> >
>>> >
>>> > Thanks,
>>> >
>>> > Igor
>>> >
>>> > On 7/7/2022 12:43 AM, Dan van der Ster wrote:
>>> >
>>> > Hi Igor and others,
>>> >
>>> > (apologies for html, but i want to share a plot ;) )
>>> >
>>> > We're upgrading clusters to v16.2.9 from v15.2.16, and our simple
>>> "rados bench -p test 10 write -b 4096 -t 1" latency probe showed something
>>> is very wrong with deferred writes in pacific.
>>> > Here is an example cluster, upgraded today:
>>> >
>>> >
>>> >
>>> > The OSDs are 12TB HDDs, formatted in nautilus with the default
>>> bluestore_min_alloc_size_hdd = 64kB, and each have a large flash block.db.
>>> >
>>> > I found that the performance issue is because 4kB writes are no longer
>>> deferred from those pre-pacific hdds to flash in pacific with the default
>>> config !!!
>>> > Here are example bench writes from both releases:
>>> https://pastebin.com/raw/m0yL1H9Z
>>> >
>>> > I worked out that the issue is fixed if I set
>>> bluestore_prefer_deferred_size_hdd = 128k (up from the 64k pacific default.
>>> Note the default was 32k in octopus).
>>> >
>>> > I think this is related to the fixes in
>>> https://tracker.ceph.com/issues/52089 which landed in 16.2.6 --
>>> _do_alloc_write is comparing the prealloc size 0x10000 with
>>> bluestore_prefer_deferred_size_hdd (0x10000) and the "strictly less than"
>>> condition prevents deferred writes from ever happening.
>>> >
>>> > So I think this would impact anyone upgrading clusters with hdd/ssd
>>> mixed osds ... surely we must not be the only clusters impacted by this?!
>>> >
>>> > Should we increase the default bluestore_prefer_deferred_size_hdd up
>>> to 128kB or is there in fact a bug here?
>>> >
>>> > Best Regards,
>>> >
>>> > Dan
>>> >
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS snapshots with samba shadowcopy

2022-07-13 Thread Sebastian Knust

Hi,

I am providing CephFS snapshots via Samba with the shadow_copy2 VFS 
object. I am running CentOS 7 with smbd 4.10.16 for which ceph_snapshots 
is not available AFAIK.


Snapshots are created by a cronjob above the root of my shares with
  export TZ=GMT
  mkdir /cephfs/path/.snap/`date +@GMT-%Y.%m.%d-%H.%M.%S`
i.e. the exported shares are subfolders of the folder in which I create 
snapshots.


Samba configuration is:
  [global]
  ...
  shadow:snapdir = .snap
  shadow:snapdirseverywhere = yes
  shadow:format = _@GMT-%Y.%m.%d-%H.%M.%S_some-inode-number
  ...
  [sharename]
  ...
  path = /cephfs/path_to_main_root/share
  vfs object = shadow_copy2
  ...
  [other_share_with_different_root]
  ...
  path = /cephfs/path_to_different_root/other_share
  vfs object = shadow_copy2
  shadow:format = _@GMT-%Y.%m.%d-%H.%M.%S_other-inode-number

The inode numbers in the configuration are of course the inode numbers 
of the directory containing the snapshots.
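You can look those inode numbers up with e.g.:
  stat -c %i /cephfs/path_to_main_root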


Cheers
Sebastian

On 13.07.22 02:08, Bailey Allison wrote:

Hi All,

Curious if anyone is making use of samba shadowcopy with CephFS snapshots
using the vfs object ceph_snapshots?

I've had wildly different results on an Ubuntu 20.04 LTS samba server where
the snaps just do not appear at all within shadowcopy, and a Rocky Linux
samba server where the snaps do appear within shadowcopy but when opening
them they contain absolutely no files at all.

Both the Ubuntu and Rocky samba server are sharing out kernel cephfs mount
via samba, ceph version is 17.2.1 and samba version is 4.13.7 for Ubuntu
20.04 and 4.15.5 for Rocky Linux.

I have also tried using a samba fuse mount with vfs_ceph with the same
results.

More so just curious to see if anyone on the list has had success with
making use of the ceph_snapshots vfs object and if they can share how it has
worked for them.

Included below is the share config for both Ubuntu and Rocky if anyone is
curious:

Ubuntu 20.04 LTS

[public]
 force group = nogroup
 force user = nobody
 guest ok = Yes
 path = /mnt/cephfs/public
 read only = No
 vfs objects = ceph_snapshots

Rocky Linux

[public]
 force group = nogroup
 force user = nobody
 guest ok = Yes
 path = /mnt/cephfs/public
 read only = No
 vfs objects = ceph_snapshots

Regards,

Bailey

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
Dr. Sebastian Knust  | Bielefeld University
IT Administrator | Faculty of Physics
Office: D2-110   | Universitätsstr. 25
Phone: +49 521 106 5234  | 33615 Bielefeld
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MGR permissions question

2022-07-13 Thread Robert Reihs
Hi,
we have discovered this solution for the CSI plugin permissions:
https://github.com/ceph/ceph-csi/issues/2687#issuecomment-1014360244
We are not sure of the implications of adding mgr permissions to the
(non-admin) user, and the documentation seems to be sparse on this topic.
Is it OK to give a limited user blanket mgr permissions, or can we restrict
them further?
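For reference, one restricted variant we are wondering about is scoping the
mgr cap with the rbd profile, along the lines of the upstream ceph-csi
capabilities (client and pool names below are placeholders):

    ceph auth caps client.csi-rbd \
        mon 'profile rbd' \
        osd 'profile rbd pool=kubernetes' \
        mgr 'profile rbd pool=kubernetes'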
Thanks
Best
Robert
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: size=1 min_size=0 any way to set?

2022-07-13 Thread huxia...@horebdata.cn
Just set size=1 and min_size=1 directly. Setting min_size to 0 does not make 
any sense.
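
On recent releases you may also have to explicitly allow size-1 pools first,
e.g. (the pool name is a placeholder):

    ceph config set global mon_allow_pool_size_one true
    ceph osd pool set testpool size 1 --yes-i-really-mean-it
    ceph osd pool set testpool min_size 1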




huxia...@horebdata.cn
 
From: Szabo, Istvan (Agoda)
Date: 2022-07-13 11:38
To: ceph-users@ceph.io
Subject: [ceph-users] size=1 min_size=0 any way to set?
Hi,
 
Is there a way to set this? Yes, I know it means immediate data loss, but the 
data is not important and can be reproduced easily, so I would like to set it 
temporarily; however, ceph doesn't allow it:
 
Error EINVAL: pool min_size must be between 1 and size, which is set to 1
 
Anybody knows any way?
 
Thank you
 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: size=1 min_size=0 any way to set?

2022-07-13 Thread huxia...@horebdata.cn
As far as I know, one can read and write with it.



huxia...@horebdata.cn
 
From: Szabo, Istvan (Agoda)
Date: 2022-07-13 11:49
To: huxia...@horebdata.cn
CC: ceph-users
Subject: RE: [ceph-users] size=1 min_size=0 any way to set?
But that one makes the pool read only I guess right?
 
Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---
 
From: huxia...@horebdata.cn  
Sent: Wednesday, July 13, 2022 4:48 PM
To: Szabo, Istvan (Agoda) 
Cc: ceph-users 
Subject: Re: [ceph-users] size=1 min_size=0 any way to set?
 


Just go straightforward to set size=1 and min_size=1. Setting min_size to 0 
does not make any sense.
 
 


huxia...@horebdata.cn
 
From: Szabo, Istvan (Agoda)
Date: 2022-07-13 11:38
To: ceph-users@ceph.io
Subject: [ceph-users] size=1 min_size=0 any way to set?
Hi,
 
Is there a way to set this? Yes, I know it means immediate data loss, but the 
data is not important and can be reproduced easily, so I would like to set it 
temporarily; however, ceph doesn't allow it:
 
Error EINVAL: pool min_size must be between 1 and size, which is set to 1
 
Anybody knows any way?
 
Thank you
 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-13 Thread David Orman
Does it make sense to ship the 'quick' fix in the next pacific release, to
minimize impact to users until the improved iteration can be implemented?

On Tue, Jul 12, 2022 at 6:16 AM Igor Fedotov  wrote:

> Hi Dan,
>
> I can confirm this is a regression introduced by
> https://github.com/ceph/ceph/pull/42725.
>
> Indeed strict comparison is a key point in your specific case but
> generally  it looks like this piece of code needs more redesign to
> better handle fragmented allocations (and issue deferred write for every
> short enough fragment independently).
>
> So I'm looking for a way to improve that at the moment. Will fall back to
> the trivial comparison fix if I fail to find a better solution.
>
> Meanwhile you can adjust bluestore_min_alloc_size_hdd indeed but I'd
> prefer not to raise it that high as 128K to avoid too many writes being
> deferred (and hence DB overburden).
>
> IMO setting the parameter to 64K+1 should be fine.
>
>
> Thanks,
>
> Igor
>
> On 7/7/2022 12:43 AM, Dan van der Ster wrote:
> > Hi Igor and others,
> >
> > (apologies for html, but i want to share a plot ;) )
> >
> > We're upgrading clusters to v16.2.9 from v15.2.16, and our simple
> > "rados bench -p test 10 write -b 4096 -t 1" latency probe showed
> > something is very wrong with deferred writes in pacific.
> > Here is an example cluster, upgraded today:
> >
> > [image: latency plot]
> >
> > The OSDs are 12TB HDDs, formatted in nautilus with the default
> > bluestore_min_alloc_size_hdd = 64kB, and each have a large flash
> block.db.
> >
> > I found that the performance issue is because 4kB writes are no longer
> > deferred from those pre-pacific hdds to flash in pacific with the
> > default config !!!
> > Here are example bench writes from both releases:
> > https://pastebin.com/raw/m0yL1H9Z
> >
> > I worked out that the issue is fixed if I set
> > bluestore_prefer_deferred_size_hdd = 128k (up from the 64k pacific
> > default. Note the default was 32k in octopus).
> >
> > I think this is related to the fixes in
> > https://tracker.ceph.com/issues/52089 which landed in 16.2.6 --
> > _do_alloc_write is comparing the prealloc size 0x10000 with
> > bluestore_prefer_deferred_size_hdd (0x10000) and the "strictly less
> > than" condition prevents deferred writes from ever happening.
> >
> > So I think this would impact anyone upgrading clusters with hdd/ssd
> > mixed osds ... surely we must not be the only clusters impacted by this?!
> >
> > Should we increase the default bluestore_prefer_deferred_size_hdd up
> > to 128kB or is there in fact a bug here?
> >
> > Best Regards,
> >
> > Dan
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm host maintenance

2022-07-13 Thread Steven Goodliff


Hi,


I'm trying to reboot a ceph cluster one instance at a time by running an 
Ansible playbook which basically runs


cephadm shell ceph orch host maintenance enter   and then reboots the 
instance and exits the maintenance


but I get


ALERT: Cannot stop active Mgr daemon, Please switch active Mgrs with 'ceph mgr 
fail node2-cobj2-atdev1-nvan.ghxlvw'


on one instance.  Should cephadm handle the switch?


thanks

Steven Goodliff
Global Relay
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific doesn't defer small writes for pre-pacific hdd osds

2022-07-13 Thread Igor Fedotov
Maybe. My plan is to attempt the general fix first and, if that doesn't work 
out within a short time frame, publish a 'quick' one.



On 7/13/2022 4:58 PM, David Orman wrote:
Is this something that makes sense to do the 'quick' fix on for the 
next pacific release to minimize impact to users until the improved 
iteration can be implemented?


On Tue, Jul 12, 2022 at 6:16 AM Igor Fedotov  
wrote:


Hi Dan,

I can confirm this is a regression introduced by
https://github.com/ceph/ceph/pull/42725.

Indeed strict comparison is a key point in your specific case but
generally  it looks like this piece of code needs more redesign to
better handle fragmented allocations (and issue deferred write for
every
short enough fragment independently).

So I'm looking for a way to improve that at the moment. Will fall back
to the trivial comparison fix if I fail to find a better solution.

Meanwhile you can adjust bluestore_min_alloc_size_hdd indeed but I'd
prefer not to raise it that high as 128K to avoid too many writes
being
deferred (and hence DB overburden).

IMO setting the parameter to 64K+1 should be fine.


Thanks,

Igor

On 7/7/2022 12:43 AM, Dan van der Ster wrote:
> Hi Igor and others,
>
> (apologies for html, but i want to share a plot ;) )
>
> We're upgrading clusters to v16.2.9 from v15.2.16, and our simple
> "rados bench -p test 10 write -b 4096 -t 1" latency probe showed
> something is very wrong with deferred writes in pacific.
> Here is an example cluster, upgraded today:
>
> [image: latency plot]
>
> The OSDs are 12TB HDDs, formatted in nautilus with the default
> bluestore_min_alloc_size_hdd = 64kB, and each have a large flash
block.db.
>
> I found that the performance issue is because 4kB writes are no
longer
> deferred from those pre-pacific hdds to flash in pacific with the
> default config !!!
> Here are example bench writes from both releases:
> https://pastebin.com/raw/m0yL1H9Z
>
> I worked out that the issue is fixed if I set
> bluestore_prefer_deferred_size_hdd = 128k (up from the 64k pacific
> default. Note the default was 32k in octopus).
>
> I think this is related to the fixes in
> https://tracker.ceph.com/issues/52089 which landed in 16.2.6 --
> _do_alloc_write is comparing the prealloc size 0x10000 with
> bluestore_prefer_deferred_size_hdd (0x10000) and the "strictly less
> than" condition prevents deferred writes from ever happening.
>
> So I think this would impact anyone upgrading clusters with hdd/ssd
> mixed osds ... surely we must not be the only clusters impacted
by this?!
>
> Should we increase the default
bluestore_prefer_deferred_size_hdd up
> to 128kB or is there in fact a bug here?
>
> Best Regards,
>
> Dan
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io