Re: [ceph-users] x pgs not deep-scrubbed in time

2019-04-04 Thread Alexandru Cucu
scrubs performed, would you expect > tweaking any of those values to let the deep-scrubs finish in time? > > Thanks, > Michael > > On Wed, 3 Apr 2019 at 10:30, Alexandru Cucu wrote: >> >> Hello, >> >> You can increase *osd scrub max interval* and *osd d

Re: [ceph-users] x pgs not deep-scrubbed in time

2019-04-03 Thread Alexandru Cucu
Hello, You can increase *osd scrub max interval* and *osd deep scrub interval* if you don't need at least one scrub/deep scrub per week. I would also play with *osd max scrubs* and *osd scrub load threshold* to get more scrubbing work done, but be careful, as it can have a huge impact on performance.
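
A minimal sketch of the settings mentioned above (the two-week value is only an example; pick an interval that fits your cluster):

    # ceph.conf, [osd] section -- values are in seconds, two weeks shown as an example
    osd scrub max interval = 1209600
    osd deep scrub interval = 1209600

    # or inject at runtime on a running cluster
    ceph tell osd.* injectargs '--osd-scrub-max-interval 1209600 --osd-deep-scrub-interval 1209600'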

Re: [ceph-users] Resizing a cache tier rbd

2019-03-27 Thread Alexandru Cucu
Hello, On Wed, Mar 27, 2019 at 1:19 AM Jason Dillaman wrote: > When using cache pools (which are essentially deprecated functionality > BTW), you should always reference the base tier pool. Could you point to more details about the plan to deprecate cache tiers? AFAIK and as far as the documentat

Re: [ceph-users] Right way to delete OSD from cluster?

2019-03-01 Thread Alexandru Cucu
More on the subject can be found here: https://ceph.com/geen-categorie/difference-between-ceph-osd-reweight-and-ceph-osd-crush-reweight/ On Fri, Mar 1, 2019 at 2:22 PM Darius Kasparavičius wrote: > > Hi, > > Setting crush weight to 0 removes the osds weight from crushmap, by > modifying hosts tot
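
For reference, the two commands discussed in the linked article (the OSD id and weight are placeholders):

    ceph osd crush reweight osd.12 0.0   # changes the CRUSH weight used for placement; data is remapped permanently
    ceph osd reweight 12 0.0             # sets the 0-1 override weight; 0 behaves roughly like marking the OSD out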

Re: [ceph-users] Ceph cluster stability

2019-02-20 Thread Alexandru Cucu
Hi, I would decrease the max active recovery processes per OSD and increase the recovery sleep. osd recovery max active = 1 (default is 3) osd recovery sleep = 1 (default is 0 or 0.1) osd max backfills defaults to 1 so that should be OK if he's using the default :D Disabling scrubbing during reco
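
A hedged sketch of those runtime changes (injectargs applies them without restarting the OSDs; adjust values to your workload):

    ceph tell osd.* injectargs '--osd-recovery-max-active 1 --osd-recovery-sleep 1'
    # optionally pause scrubbing while recovery is running
    ceph osd set noscrub
    ceph osd set nodeep-scrub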

Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread Alexandru Cucu
Hello, Not exactly what you were looking for, but you could use the Prometheus plugin for ceph-mgr and get the health status from the metrics. curl -s http://ceph-mgr-node:9283/metrics | grep ^ceph_health_status On Wed, Jan 30, 2019 at 3:04 PM PHARABOT Vincent wrote: > > Hello, > > > > I h
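
A minimal sketch, assuming the prometheus module is not yet enabled and the mgr listens on the default port 9283 (the usual value mapping is noted below, but verify it against your version):

    ceph mgr module enable prometheus
    curl -s http://ceph-mgr-node:9283/metrics | grep ^ceph_health_status
    # ceph_health_status: 0 = HEALTH_OK, 1 = HEALTH_WARN, 2 = HEALTH_ERR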

Re: [ceph-users] Scheduling deep-scrub operations

2018-12-14 Thread Alexandru Cucu
Hi, Unfortunately there is no way of doing this from the Ceph configuration, but you could create some cron jobs to add and remove the nodeep-scrub flag. The only problem is that your cluster status will show HEALTH_WARN, but I think you could set/unset the flags per pool to avoid this. On Fr
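
A sketch of the two approaches (the cron times are only an example; the per-pool flags assume a reasonably recent Ceph release):

    # crontab on an admin node: allow deep scrubs only outside business hours
    0 8  * * *  ceph osd set nodeep-scrub
    0 20 * * *  ceph osd unset nodeep-scrub

    # per-pool alternative that avoids the cluster-wide HEALTH_WARN
    ceph osd pool set <pool-name> nodeep-scrub 1
    ceph osd pool set <pool-name> nodeep-scrub 0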

Re: [ceph-users] Large omap objects - how to fix ?

2018-10-31 Thread Alexandru Cucu
> *** NOTICE: operation will not remove old bucket index objects *** > *** these will need to be removed manually *** > > > All the best, > Flo > > > On 10/26/18 at 3:56 PM, Alexandru Cucu wrote: > > Hi, > > Sorry to hijack this thread. I
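
A hedged sketch of one way to clean up the old index objects left behind after a reshard, assuming you know the old bucket instance id (double-check the ids before purging anything):

    # look up the current bucket instance id; the old id appears in earlier metadata / reshard output
    radosgw-admin metadata get bucket:<bucket-name>
    # purge the index objects belonging to the old instance id
    radosgw-admin bi purge --bucket=<bucket-name> --bucket-id=<old-bucket-instance-id>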

Re: [ceph-users] OSD node reinstallation

2018-10-30 Thread Alexandru Cucu
Don't forget about the cephx keyring if you are using cephx ;) It usually sits in: /var/lib/ceph/bootstrap-osd/ceph.keyring --- Alex On Tue, Oct 30, 2018 at 4:48 AM David Turner wrote: > > Set noout, reinstall the OS without wiping the OSDs (including any journal > partitions and maintaining a
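
A rough outline of the procedure being discussed, assuming the OSD data partitions are left untouched during the reinstall (ceph-disk on Jewel-era clusters, ceph-volume on Luminous and later):

    ceph osd set noout
    # save the bootstrap-osd keyring before wiping the system disk
    cp /var/lib/ceph/bootstrap-osd/ceph.keyring /safe/location/
    # ... reinstall the OS and the ceph packages, restore ceph.conf ...
    cp /safe/location/ceph.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
    # bring the existing OSDs back up
    ceph-disk activate-all            # or: ceph-volume lvm activate --all
    ceph osd unset noout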

Re: [ceph-users] Large omap objects - how to fix ?

2018-10-26 Thread Alexandru Cucu
Hi, Sorry to hijack this thread. I have a similar issue, also with 12.2.8, recently upgraded from Jewel. In my case all buckets are within limits: # radosgw-admin bucket limit check | jq '.[].buckets[].fill_status' | uniq "OK" # radosgw-admin bucket limit check | jq '.[].buckets[].objec
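
If the bucket limit check looks fine, one possible next step (a sketch, assuming the default RGW index pool name) is to count the omap keys on the index objects the warning points at:

    # bucket index objects are named .dir.<bucket-instance-id>[.<shard>]
    rados -p default.rgw.buckets.index ls | head
    rados -p default.rgw.buckets.index listomapkeys .dir.<bucket-instance-id> | wc -l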

Re: [ceph-users] Migrate/convert replicated pool to EC?

2018-10-26 Thread Alexandru Cucu
Hi, Have a look at this article: https://ceph.com/geen-categorie/ceph-pool-migration/ --- Alex Cucu On Thu, Oct 25, 2018 at 7:31 PM Matthew Vernon wrote: > > Hi, > > I thought I'd seen that it was possible to migrate a replicated pool to > being erasure-coded (but not the converse); but I'm fai
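
The linked article covers several methods; the simplest is roughly the sketch below, with the caveat that rados cppool has known limitations (snapshots are not preserved, and omap-heavy pools such as RBD or RGW index pools may not copy correctly), so check the article and current docs first. Pool names and the EC profile are placeholders:

    ceph osd pool create newpool 64 64 erasure myprofile
    rados cppool oldpool newpool
    ceph osd pool rename oldpool oldpool.old
    ceph osd pool rename newpool oldpool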

[ceph-users] Invalid bucket in reshard list

2018-10-05 Thread Alexandru Cucu
Hello, I'm running a Luminous 12.2.7 cluster. I wanted to reshard the index of an RGW bucket and accidentally typed the name wrong. Now in "radosgw-admin reshard list" I have a task for a bucket that does not exist. I can't process or cancel it: # radosgw-admin reshard process ERROR: failed

Re: [ceph-users] Remotely tell an OSD to stop ?

2018-09-21 Thread Alexandru Cucu
Hi, You won't be able to stop them, but if the OSDs are still running I would just mark them as out, wait for all data to be moved off them, and then it should be safe to power off the host. --- Alex On Fri, Sep 21, 2018 at 11:50 AM Nicolas Huillard wrote: > > Hi all, > > One of my server crashe
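
A sketch of that drain-before-poweroff approach (the OSD ids are placeholders):

    # mark each OSD on the affected host out
    ceph osd out 10
    ceph osd out 11
    # wait until all PGs are active+clean again before powering the host off
    watch ceph -s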

Re: [ceph-users] Ceph RGW Index Sharding In Jewel

2018-08-24 Thread Alexandru Cucu
You should probably have a look at ceph-ansible as it has a "take-over-existing-cluster" playbook. I think versions older than 2.0 support Ceph versions older than Jewel. --- Alex Cucu On Fri, Aug 24, 2018 at 4:31 AM Russell Holloway wrote: > > Thanks. Unfortunately even my version of hammer is
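
A rough sketch, assuming a ceph-ansible release that still ships the playbook under that name (the branch and inventory layout are placeholders; pick the version matching your Ceph release):

    git clone https://github.com/ceph/ceph-ansible.git
    cd ceph-ansible
    git checkout <branch-matching-your-ceph-release>
    # describe the existing cluster in the inventory and group_vars, then:
    ansible-playbook -i <inventory> take-over-existing-cluster.yml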

Re: [ceph-users] [Jewel 10.2.11] OSD Segmentation fault

2018-08-13 Thread Alexandru Cucu
On 3 August 2018 at 12:03:17 CEST, Alexandru Cucu wrote: > >Hello, > > > > Hello Alex, > > >Another OSD started randomly crashing with segmentation fault. Haven't > >managed to add the last 3 OSDs back to the cluster as the daemons keep > >crashing. >

Re: [ceph-users] ceph issue tracker tells that posting issues is forbidden

2018-08-06 Thread Alexandru Cucu
Hello, Any news? Still can't open new issues. Thanks, Alex On Sun, Aug 5, 2018 at 1:50 PM Виталий Филиппов wrote: > > Thanks for the reply! Ok I understand :-) > > But the page still shows 403 by now... > > On 5 August 2018 at 6:42:33 GMT+03:00, Gregory Farnum > wrote: >> >> On Sun, Aug 5, 2018

Re: [ceph-users] [Jewel 10.2.11] OSD Segmentation fault

2018-08-03 Thread Alexandru Cucu
()+0xf5) [0x7f12c0374c05] 21: (()+0x3c8847) [0x7f12c3cb4847] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. --- Any help would be appreciated. Thanks, Alex Cucu On Mon, Jul 30, 2018 at 4:55 PM Alexandru Cucu wrote: > > Hello Ceph users, > > We hav

[ceph-users] [Jewel 10.2.11] OSD Segmentation fault

2018-07-30 Thread Alexandru Cucu
Hello Ceph users, We have updated our cluster from 10.2.7 to 10.2.11. A few hours after the update, one OSD crashed. When trying to add the OSD back to the cluster, two other OSDs started crashing with segmentation faults. We had to mark all 3 OSDs as down as we had stuck PGs and blocked operations and th

Re: [ceph-users] Stop scrubbing

2018-06-06 Thread Alexandru Cucu
Hi, The only way I know is pretty brutal: list all the PGs with a scrubbing process, get the primary OSD and mark it as down. The scrubbing process will stop. Make sure you set the noout, norebalance and norecovery flags so you don't add even more load to your cluster. On Tue, Jun 5, 2018 at 11:4
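
A sketch of that sequence (flags are set one at a time; the OSD id comes from the acting primary of the scrubbing PG):

    ceph osd set noout
    ceph osd set norebalance
    ceph osd set norecovery
    # find PGs that are currently scrubbing and note their acting primary
    ceph pg dump pgs_brief 2>/dev/null | grep scrub
    # mark the acting primary down; it rejoins immediately and the scrub stops
    ceph osd down <primary-osd-id>
    # when finished, clear the flags again
    ceph osd unset norecovery
    ceph osd unset norebalance
    ceph osd unset noout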

[ceph-users] PG replication issues

2018-02-12 Thread Alexandru Cucu
Hello, Warning, this is a long story! There's a TL;DR close to the end. We are replacing some of our spinning drives with SSDs. We have 14 OSD nodes with 12 drives each. We are replacing 4 drives from each node with SSDs. The cluster is running Ceph Jewel (10.2.7). The affected pool had min_size