[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Ashley Merrick
Are you sure you're not being hit by: ceph config set osd bluestore_fsck_quick_fix_on_mount false @ https://docs.ceph.com/docs/master/releases/octopus/ Have all your OSDs successfully completed the fsck? The reason I say that is I can see "20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats"
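For context, a minimal sketch of one possible workflow around that option: defer the quick-fix conversion so upgraded OSDs restart fast, then convert each OSD offline later. The data path and <id> are placeholders, not taken from the thread:

    # skip the omap conversion at mount time so upgraded OSDs start quickly
    ceph config set osd bluestore_fsck_quick_fix_on_mount false

    # later, convert one OSD at a time while it is stopped (assumes the default data path)
    systemctl stop ceph-osd@<id>
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>
    systemctl start ceph-osd@<id>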

[ceph-users] Nautilus 14.2.7 radosgw lifecycle not removing expired objects

2020-04-08 Thread oneill . gs
Hello, I'm running a cluster with Ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable). I've encountered an issue with my cluster where objects are marked as expired but are not removed during lifecycle processing. These buckets have a mix of objects with and without
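A hedged sketch of how lifecycle state is usually inspected and forced with radosgw-admin (no bucket names from the report are assumed):

    # show per-bucket lifecycle status (UNINITIAL / PROCESSING / COMPLETE)
    radosgw-admin lc list

    # trigger lifecycle processing now instead of waiting for the scheduled work window
    radosgw-admin lc process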

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Jack
Just to confirm this does not get better:

root@backup1:~# ceph status
  cluster:
    id:     9cd41f0f-936d-4b59-8e5d-9b679dae9140
    health: HEALTH_WARN
            20 OSD(s) reporting legacy (not per-pool) BlueStore omap usage stats
            4/50952060 objects unfound (0.000%)
            nob
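For reference, a sketch of how that warning is usually examined or silenced while OSDs are converted; the mute option name is my assumption, not taken from the thread:

    # list the unfound objects and the per-OSD warnings in full
    ceph health detail

    # mute the legacy-omap warning cluster-wide while OSDs are converted gradually
    ceph config set global bluestore_warn_on_no_per_pool_omap false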

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Jack
The CPU is used by userspace, not kernelspace. Here is the perf top output, see attachment. RocksDB eats everything :/ On 4/8/20 3:14 PM, Paul Emmerich wrote: > What's the CPU busy with while spinning at 100%? > > Check "perf top" for a quick overview > > > Paul > Samples: 1M of event 'cycles:ppp',
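If RocksDB is the hot spot, one thing that can be checked per OSD is its internal counters and, cautiously, a manual compaction; a sketch with <id> as a placeholder:

    # dump bluestore/rocksdb performance counters for one OSD
    ceph daemon osd.<id> perf dump | less

    # ask that OSD to compact its rocksdb (this is itself I/O and CPU heavy)
    ceph daemon osd.<id> compact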

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Paul Emmerich
What's the CPU busy with while spinning at 100%? Check "perf top" for a quick overview Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Wed, Apr 8, 2020 at 3:09 PM
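A quick sketch of narrowing perf down to a single spinning OSD process (pid lookup shown only as an example; ceph-osd debug symbols are needed for readable stacks):

    pidof ceph-osd        # pick the pid of one busy OSD
    perf top -p <pid>     # sample just that process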

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Jack
I do:

root@backup1:~# ceph config dump | grep snap_trim_sleep
global   advanced   osd_snap_trim_sleep       60.00
global   advanced   osd_snap_trim_sleep_hdd   60.00

(cluster is fully rusty) On 4/8/20 2:53 PM, Dan van der Ster wrote: > Do you have a custom value for osd_snap_trim_sleep?
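Purely as an illustration of how such an override would be cleared or changed (the values are examples, not a recommendation for this cluster):

    # drop the custom sleeps and fall back to the built-in defaults
    ceph config rm global osd_snap_trim_sleep
    ceph config rm global osd_snap_trim_sleep_hdd

    # or set a smaller per-device-class value
    ceph config set osd osd_snap_trim_sleep_hdd 5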

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Dan van der Ster
Do you have a custom value for osd_snap_trim_sleep? On Wed, Apr 8, 2020 at 2:03 PM Jack wrote: > > I set the nosnaptrim flag during the upgrade because I saw high CPU usage and > thought it was somehow related to the upgrade process. > However, all my daemons are now running Octopus, and the issue is still

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Jack
I set the nosnaptrim flag during the upgrade because I saw high CPU usage and thought it was somehow related to the upgrade process. However, all my daemons are now running Octopus and the issue is still here, so I was wrong. On 4/8/20 1:58 PM, Wido den Hollander wrote: > > > On 4/8/20 1:38 PM, Jack wrote:
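For readers following along, the flag in question is a cluster-wide toggle:

    ceph osd set nosnaptrim      # pause snapshot trimming on all OSDs
    ceph osd unset nosnaptrim    # resume it (the point where the CPU spike appears)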

[ceph-users] Re: [Octopus] OSD overloading

2020-04-08 Thread Wido den Hollander
On 4/8/20 1:38 PM, Jack wrote: > Hello, > > I've an issue since my Nautilus -> Octopus upgrade > > My cluster has many rbd images (~3k or something) > Each of them has ~30 snapshots > Each day, I create and remove at least one snapshot per image > > Since Octopus, when I remove the "nosnaptrim"
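A minimal sketch of the kind of daily snapshot rotation described in the quoted message (pool, image and snapshot names are placeholders):

    # create today's snapshot and remove an old one for a given image
    rbd snap create mypool/myimage@daily-2020-04-08
    rbd snap rm mypool/myimage@daily-2020-03-09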

[ceph-users] [Octopus] OSD overloading

2020-04-08 Thread Jack
Hello, I've an issue since my Nautilus -> Octopus upgrade. My cluster has many rbd images (~3k or something); each of them has ~30 snapshots. Each day, I create and remove at least one snapshot per image. Since Octopus, when I remove the "nosnaptrim" flag, each OSD uses 100% of its CPU time. The whole
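When trimming is re-enabled, the backlog shows up as PGs in the snaptrim / snaptrim_wait states; a sketch of watching it drain:

    # count PGs currently trimming or queued for trimming
    ceph pg dump pgs_brief 2>/dev/null | grep -c snaptrim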

[ceph-users] Re: Fwd: Question on rbd maps

2020-04-08 Thread Ilya Dryomov
A note of caution, though. "rbd status" just lists watches on the image header object and a watch is not a reliable indicator of whether the image is mapped somewhere or not. It is true that all read-write mappings establish a watch, but it can come and go due to network partitions, OSD crashes o
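A sketch of checking watchers both ways (pool/image names are placeholders; the header object id comes from "rbd info"):

    # watchers as reported by rbd
    rbd status mypool/myimage

    # or query the header object directly
    rbd info mypool/myimage | grep block_name_prefix   # e.g. rbd_data.<id>
    rados -p mypool listwatchers rbd_header.<id>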

[ceph-users] Re: Fwd: question on rbd locks

2020-04-08 Thread Glen Baars
I experienced an issue where locks didn't get cleared automatically on RBDs. When a KVM host crashed, the locks never cleared. It was a permission issue with cephx. Maybe test with an admin user? Maybe post what permissions you have for that user with `ceph auth list`? Glen
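A hedged sketch of inspecting and, if needed, widening the client caps (client name and pool are placeholders; the rbd cap profiles include the blocklisting permission needed to break a dead client's lock):

    # show what the kvm client is currently allowed to do
    ceph auth get client.kvm

    # caps along these lines are usually sufficient for lock handling
    ceph auth caps client.kvm mon 'profile rbd' osd 'profile rbd pool=vms'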

[ceph-users] Re: Fwd: question on rbd locks

2020-04-08 Thread Ilya Dryomov
On Tue, Apr 7, 2020 at 6:49 PM Void Star Nill wrote: > > Hello All, > > Is there a way to specify that a lock (shared or exclusive) on an rbd > volume be released if the client machine becomes unreachable or > unresponsive? > > In one of our clusters, we use rbd locks on volumes to make sure provi
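In case it helps, a sketch of how stale advisory locks can be listed and removed by hand (names are placeholders):

    # show current locks on the image
    rbd lock ls mypool/myimage

    # remove a stale lock using the id and locker printed above
    rbd lock rm mypool/myimage "<lock-id>" <locker>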