[ceph-users] Re: Ceph Health error right after starting balancer
Looks like you didn't tell the whole story, please post the *full* output of ceph -s and ceph osd df tree.

Wild guess: you need to increase "mon max pg per osd"

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Oct 31, 2019 at 8:17 PM Thomas <74cmo...@gmail.com> wrote:
>
> This is the output of OSD.270 that remains with slow requests blocked
> even after restarting.
> What's the interpretation of it?
>
> root@ld5507:~# ceph daemon osd.270 dump_blocked_ops
> {
>     "ops": [
>         {
>             "description": "osd_pg_create(e293649 59.b:267033 59.2c:267033)",
>             "initiated_at": "2019-10-31 19:22:13.563017",
>             "age": 2785.269856041,
>             "duration": 2785.269905628,
>             "type_data": {
>                 "flag_point": "started",
>                 "events": [
>                     { "time": "2019-10-31 19:22:13.563017", "event": "initiated" },
>                     { "time": "2019-10-31 19:22:13.563017", "event": "header_read" },
>                     { "time": "2019-10-31 19:22:13.563011", "event": "throttled" },
>                     { "time": "2019-10-31 19:22:13.563024", "event": "all_read" },
>                     { "time": "2019-10-31 20:07:43.881441", "event": "dispatched" },
>                     { "time": "2019-10-31 20:07:43.881472", "event": "wait for new map" },
>                     { "time": "2019-10-31 20:07:44.665714", "event": "started" }
>                 ]
>             }
>         },
>         {
>             "description": "osd_pg_create(e293650 59.b:267033 59.2c:267033)",
>             "initiated_at": "2019-10-31 19:23:16.150040",
>             "age": 2722.682833165,
>             "duration": 2722.683007228,
>             "type_data": {
>                 "flag_point": "delayed",
>                 "events": [
>                     { "time": "2019-10-31 19:23:16.150040", "event": "initiated" },
>                     { "time": "2019-10-31 19:23:16.150040", "event": "header_read" },
>                     { "time": "2019-10-31 19:23:16.150035", "event": "throttled" },
>                     { "time": "2019-10-31 19:23:16.150055", "event": "all_read" },
>                     { "time": "2019-10-31 20:07:43.882197", "event": "dispatched" },
>                     { "time": "2019-10-31 20:07:43.882198", "event": "wait for new map" }
>                 ]
>             }
>         },
>         {
>             "description": "osd_pg_create(e293651 59.b:267033 59.2c:267033)",
>             "initiated_at": "2019-10-31 19:23:17.779034",
>             "age": 2721.0538393319998,
>             "duration": 2721.0541152350002,
>             "type_data": {
>                 "flag_point": "delayed",
>                 "events": [
>                     { "time": "2019-10-31 19:23:17.779034", "event": "initiated" },
>                     { "time": "2019-10-31 19:23:17.779034", "event": "header_read" },
>                     { "time": "2019-10-31 19:23:17.779027", "event": "throttled" },
>                     { "time": "2019-10-31 19:23:17.779044", "event": "all_read" },
>                     { "time": "2019-10-31 20:07:43.882326", "event": "dispatched" },
>                     { "time": "2019-10-31 20:07:43.882328", "event": "wait for new map" }
>                 ]
>             }
>         },
>         {
>             "description": "osd_pg_create
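For anyone chasing the same symptom: looping over the admin sockets on an OSD host shows at a glance which local OSDs are holding blocked ops. This is a minimal sketch, assuming the default socket location /var/run/ceph/ and a standard package install:

    # run on each OSD host; counts blocked ops per local OSD via its admin socket
    for sock in /var/run/ceph/ceph-osd.*.asok; do
        echo -n "${sock}: "
        ceph daemon "${sock}" dump_blocked_ops | grep -c '"description"'
    done

Any OSD reporting a non-zero count can then be inspected in full with ceph daemon osd.<id> dump_blocked_ops, as shown above.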
[ceph-users] Re: V/v Multiple pool for data in Ceph object
Ok, thanks.

Br,
--
Dương Tuấn Dũng
Email: dungdt.aicgr...@gmail.com
ĐT: 0986153686

On Wed, Oct 30, 2019 at 1:51 PM Konstantin Shalygin wrote:
> On 10/29/19 3:45 PM, tuan dung wrote:
> I have a cluster running Ceph object storage, version 14.2.1. I want to create 2 bucket-data pools for security purposes:
> + one bucket-data pool for public client access from the internet (name *zone1.rgw.buckets.data-pub*)
> + one bucket-data pool for private client access from the local network (name *zone1.rgw.buckets.data-pub*)
> Each bucket-data pool has its own access key: a public access key (for the public pool) and a private access key (for the private pool).
> Can you give me a recommendation or a best practice that you've used for this? What needs to be done?
> Or give me your best solution for securing a Ceph object cluster with both public and private client access?
>
> You need to add an extra placement target. This setup is pretty useless IMHO because you will still be going through one rgw zone.
>
> k
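For reference, the usual way to implement the "extra placement" suggestion is a second placement target that maps to its own data pool. A rough sketch, assuming the default zonegroup/zone names and an illustrative pool name for the private target (all names here are placeholders, not taken from the thread):

    # add a second placement target to the zonegroup and back it with its own data pool in the zone
    radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id private-placement
    radosgw-admin zone placement add --rgw-zone default --placement-id private-placement \
        --data-pool zone1.rgw.buckets.data-private \
        --index-pool zone1.rgw.buckets.index \
        --data-extra-pool zone1.rgw.buckets.non-ec
    # if a realm/period is in use, commit the change
    radosgw-admin period update --commit

New buckets can then select this target at creation time (S3 LocationConstraint "default:private-placement"), or a user's default_placement can be changed in their user metadata so all of that user's new buckets land on the private pool.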
[ceph-users] Re: subtrees have overcommitted (target_size_bytes / target_size_ratio)
Is there anybody who can explain the overcommitment calcuation? Thanks Mon, 28 Oct 2019 11:24:54 +0100 Lars Täuber ==> ceph-users : > Is there a way to get rid of this warnings with activated autoscaler besides > adding new osds? > > Yet I couldn't get a satisfactory answer to the question why this all happens. > > ceph osd pool autoscale-status : > POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET > RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE > cephfs_data 122.2T1.5165.4T 1.1085 > 0.8500 1.01024 on > > versus > > ceph df : > RAW STORAGE: > CLASS SIZEAVAIL USEDRAW USED %RAW USED > hdd 165 TiB 41 TiB 124 TiB 124 TiB 74.95 > > POOLS: > POOLID STORED OBJECTS USED%USED > MAX AVAIL > cephfs_data 1 75 TiB 49.31M 122 TiB 87.16 > 12 TiB > > > It seems that the overcommitment is wrongly calculated. Isn't the RATE > already used to calculate the SIZE? > > It seems USED(df) = SIZE(autoscale-status) > Isn't the RATE already taken into account here? > > Could someone please explain the numbers to me? > > > Thanks! > Lars > > Fri, 25 Oct 2019 07:42:58 +0200 > Lars Täuber ==> Nathan Fish : > > Hi Nathan, > > > > Thu, 24 Oct 2019 10:59:55 -0400 > > Nathan Fish ==> Lars Täuber : > > > Ah, I see! The BIAS reflects the number of placement groups it should > > > create. Since cephfs metadata pools are usually very small, but have > > > many objects and high IO, the autoscaler gives them 4x the number of > > > placement groups that it would normally give for that amount of data. > > > > > ah ok, I understand. > > > > > So, your cephfs_data is set to a ratio of 0.9, and cephfs_metadata to > > > 0.3? Are the two pools using entirely different device classes, so > > > they are not sharing space? > > > > Yes, the metadata is on SSDs and the data on HDDs. > > > > > Anyway, I see that your overcommit is only "1.031x". So if you set > > > cephfs_data to 0.85, it should go away. > > > > This is not the case. I set the target_ratio to 0.7 and get this: > > > > POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET > > RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE > > cephfs_metadata 15736M3.0 2454G 0.0188 > > 0.3000 4.0 256 on > > cephfs_data 122.2T1.5165.4T 1.1085 > > 0.7000 1.01024 on > > > > The ratio seems to have nothing to do with the target_ratio but the SIZE > > and the RAW_CAPACITY. > > Because the pool is still getting more data the SIZE increases and > > therefore the RATIO increases. > > The RATIO seems to be calculated by this formula > > RATIO = SIZE * RATE / RAW_CAPACITY. > > > > This is what I don't understand. The data in the cephfs_data pool seems to > > need more space than the raw capacity of the cluster provides. Hence the > > situation is called "overcommitment". > > > > But why is this only the case when the autoscaler is active? > > > > Thanks > > Lars > > > > > > > > On Thu, Oct 24, 2019 at 10:09 AM Lars Täuber wrote: > > > > > > > > Thanks Nathan for your answer, > > > > > > > > but I set the the Target Ratio to 0.9. It is the cephfs_data pool that > > > > makes the troubles. > > > > > > > > The 4.0 is the BIAS from the cephfs_metadata pool. This "BIAS" is not > > > > explained on the page linked below. So I don't know its meaning. > > > > > > > > How can be a pool overcommited when it is the only pool on a set of > > > > OSDs? > > > > > > > > Best regards, > > > > Lars > > > > > > > > Thu, 24 Oct 2019 09:39:51 -0400 > > > > Nathan Fish ==> Lars Täuber : > > > > > > > > > The formatting is mangled on my phone, but if I am reading it > > > > > correctly, > > > > > you have set Target Ratio to 4.0. 
This means you have told the > > > > > balancer > > > > > that this pool will occupy 4x the space of your whole cluster, and to > > > > > optimize accordingly. This is naturally a problem. Setting it to 0 > > > > > will > > > > > clear the setting and allow the autobalancer to work. > > > > > > > > > > On Thu., Oct. 24, 2019, 5:18 a.m. Lars Täuber, > > > > > wrote: > > > > > > > > > > > This question is answered here: > > > > > > https://ceph.io/rados/new-in-nautilus-pg-merging-and-autotuning/ > > > > > > > > > > > > But it tells me that there is more data stored in the pool than the > > > > > > raw > > > > > > capacity provides (taking the replication factor RATE into account) > > > > > > hence > > > > > > the RATIO being above 1.0 . > > > > > > > > > > > > How comes this is the case? - Data is stored outside of the pool? > > > > > > How comes this is only the case when the autoscaler is active? > > > > > > > > > > > > Thanks > > > > > > Lars > > > >
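Plugging the numbers from the autoscale-status output above into that formula confirms where the reported RATIO comes from:

    # RATIO = SIZE * RATE / RAW_CAPACITY, with the cephfs_data values from the table above
    python3 -c 'print(122.2 * 1.5 / 165.4)'   # -> 1.108..., i.e. the reported RATIO of 1.1085 up to display rounding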
[ceph-users] Re: subtrees have overcommitted (target_size_bytes / target_size_ratio)
This was fixed a few weeks back. It should be resolved in 14.2.5. https://tracker.ceph.com/issues/41567 https://github.com/ceph/ceph/pull/31100 sage On Fri, 1 Nov 2019, Lars Täuber wrote: > Is there anybody who can explain the overcommitment calcuation? > > Thanks > > > Mon, 28 Oct 2019 11:24:54 +0100 > Lars Täuber ==> ceph-users : > > Is there a way to get rid of this warnings with activated autoscaler > > besides adding new osds? > > > > Yet I couldn't get a satisfactory answer to the question why this all > > happens. > > > > ceph osd pool autoscale-status : > > POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET > > RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE > > cephfs_data 122.2T1.5165.4T 1.1085 > > 0.8500 1.01024 on > > > > versus > > > > ceph df : > > RAW STORAGE: > > CLASS SIZEAVAIL USEDRAW USED %RAW USED > > hdd 165 TiB 41 TiB 124 TiB 124 TiB 74.95 > > > > POOLS: > > POOLID STORED OBJECTS USED%USED > > MAX AVAIL > > cephfs_data 1 75 TiB 49.31M 122 TiB 87.16 > >12 TiB > > > > > > It seems that the overcommitment is wrongly calculated. Isn't the RATE > > already used to calculate the SIZE? > > > > It seems USED(df) = SIZE(autoscale-status) > > Isn't the RATE already taken into account here? > > > > Could someone please explain the numbers to me? > > > > > > Thanks! > > Lars > > > > Fri, 25 Oct 2019 07:42:58 +0200 > > Lars Täuber ==> Nathan Fish : > > > Hi Nathan, > > > > > > Thu, 24 Oct 2019 10:59:55 -0400 > > > Nathan Fish ==> Lars Täuber : > > > > Ah, I see! The BIAS reflects the number of placement groups it should > > > > create. Since cephfs metadata pools are usually very small, but have > > > > many objects and high IO, the autoscaler gives them 4x the number of > > > > placement groups that it would normally give for that amount of data. > > > > > > > ah ok, I understand. > > > > > > > So, your cephfs_data is set to a ratio of 0.9, and cephfs_metadata to > > > > 0.3? Are the two pools using entirely different device classes, so > > > > they are not sharing space? > > > > > > Yes, the metadata is on SSDs and the data on HDDs. > > > > > > > Anyway, I see that your overcommit is only "1.031x". So if you set > > > > cephfs_data to 0.85, it should go away. > > > > > > This is not the case. I set the target_ratio to 0.7 and get this: > > > > > > POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET > > > RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE > > > cephfs_metadata 15736M3.0 2454G 0.0188 > > > 0.3000 4.0 256 on > > > cephfs_data 122.2T1.5165.4T 1.1085 > > > 0.7000 1.01024 on > > > > > > The ratio seems to have nothing to do with the target_ratio but the SIZE > > > and the RAW_CAPACITY. > > > Because the pool is still getting more data the SIZE increases and > > > therefore the RATIO increases. > > > The RATIO seems to be calculated by this formula > > > RATIO = SIZE * RATE / RAW_CAPACITY. > > > > > > This is what I don't understand. The data in the cephfs_data pool seems > > > to need more space than the raw capacity of the cluster provides. Hence > > > the situation is called "overcommitment". > > > > > > But why is this only the case when the autoscaler is active? > > > > > > Thanks > > > Lars > > > > > > > > > > > On Thu, Oct 24, 2019 at 10:09 AM Lars Täuber wrote: > > > > > > > > > > > > > > Thanks Nathan for your answer, > > > > > > > > > > but I set the the Target Ratio to 0.9. It is the cephfs_data pool > > > > > that makes the troubles. > > > > > > > > > > The 4.0 is the BIAS from the cephfs_metadata pool. This "BIAS" is not > > > > > explained on the page linked below. 
So I don't know its meaning. > > > > > > > > > > How can be a pool overcommited when it is the only pool on a set of > > > > > OSDs? > > > > > > > > > > Best regards, > > > > > Lars > > > > > > > > > > Thu, 24 Oct 2019 09:39:51 -0400 > > > > > Nathan Fish ==> Lars Täuber : > > > > > > > > > > > The formatting is mangled on my phone, but if I am reading it > > > > > > correctly, > > > > > > you have set Target Ratio to 4.0. This means you have told the > > > > > > balancer > > > > > > that this pool will occupy 4x the space of your whole cluster, and > > > > > > to > > > > > > optimize accordingly. This is naturally a problem. Setting it to 0 > > > > > > will > > > > > > clear the setting and allow the autobalancer to work. > > > > > > > > > > > > On Thu., Oct. 24, 2019, 5:18 a.m. Lars Täuber, > > > > > > wrote: > > > > > > > > > > > > > This question is answered here: > > > > > > > https://ceph.io/rados/new-in-nautilus-pg-merging-and-autotunin
[ceph-users] Re: subtrees have overcommitted (target_size_bytes / target_size_ratio)
Thanks a lot! Lars Fri, 1 Nov 2019 13:03:25 + (UTC) Sage Weil ==> Lars Täuber : > This was fixed a few weeks back. It should be resolved in 14.2.5. > > https://tracker.ceph.com/issues/41567 > https://github.com/ceph/ceph/pull/31100 > > sage > > > On Fri, 1 Nov 2019, Lars Täuber wrote: > > > Is there anybody who can explain the overcommitment calcuation? > > > > Thanks > > > > > > Mon, 28 Oct 2019 11:24:54 +0100 > > Lars Täuber ==> ceph-users : > > > Is there a way to get rid of this warnings with activated autoscaler > > > besides adding new osds? > > > > > > Yet I couldn't get a satisfactory answer to the question why this all > > > happens. > > > > > > ceph osd pool autoscale-status : > > > POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET > > > RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE > > > cephfs_data 122.2T1.5165.4T 1.1085 > > > 0.8500 1.01024 on > > > > > > versus > > > > > > ceph df : > > > RAW STORAGE: > > > CLASS SIZEAVAIL USEDRAW USED %RAW USED > > > hdd 165 TiB 41 TiB 124 TiB 124 TiB 74.95 > > > > > > POOLS: > > > POOLID STORED OBJECTS USED%USED > > > MAX AVAIL > > > cephfs_data 1 75 TiB 49.31M 122 TiB 87.16 > > > 12 TiB > > > > > > > > > It seems that the overcommitment is wrongly calculated. Isn't the RATE > > > already used to calculate the SIZE? > > > > > > It seems USED(df) = SIZE(autoscale-status) > > > Isn't the RATE already taken into account here? > > > > > > Could someone please explain the numbers to me? > > > > > > > > > Thanks! > > > Lars > > > > > > Fri, 25 Oct 2019 07:42:58 +0200 > > > Lars Täuber ==> Nathan Fish : > > > > Hi Nathan, > > > > > > > > Thu, 24 Oct 2019 10:59:55 -0400 > > > > Nathan Fish ==> Lars Täuber : > > > > > > > > > Ah, I see! The BIAS reflects the number of placement groups it should > > > > > create. Since cephfs metadata pools are usually very small, but have > > > > > many objects and high IO, the autoscaler gives them 4x the number of > > > > > placement groups that it would normally give for that amount of data. > > > > > > > > > ah ok, I understand. > > > > > > > > > So, your cephfs_data is set to a ratio of 0.9, and cephfs_metadata to > > > > > 0.3? Are the two pools using entirely different device classes, so > > > > > they are not sharing space? > > > > > > > > Yes, the metadata is on SSDs and the data on HDDs. > > > > > > > > > Anyway, I see that your overcommit is only "1.031x". So if you set > > > > > cephfs_data to 0.85, it should go away. > > > > > > > > This is not the case. I set the target_ratio to 0.7 and get this: > > > > > > > > POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO > > > > TARGET RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE > > > > cephfs_metadata 15736M3.0 2454G 0.0188 > > > > 0.3000 4.0 256 on > > > > cephfs_data 122.2T1.5165.4T 1.1085 > > > > 0.7000 1.01024 on > > > > > > > > The ratio seems to have nothing to do with the target_ratio but the > > > > SIZE and the RAW_CAPACITY. > > > > Because the pool is still getting more data the SIZE increases and > > > > therefore the RATIO increases. > > > > The RATIO seems to be calculated by this formula > > > > RATIO = SIZE * RATE / RAW_CAPACITY. > > > > > > > > This is what I don't understand. The data in the cephfs_data pool seems > > > > to need more space than the raw capacity of the cluster provides. Hence > > > > the situation is called "overcommitment". > > > > > > > > But why is this only the case when the autoscaler is active? 
> > > > > > > > Thanks > > > > Lars > > > > > > > > > > > > > > On Thu, Oct 24, 2019 at 10:09 AM Lars Täuber wrote: > > > > > > > > > > > > > > > > > Thanks Nathan for your answer, > > > > > > > > > > > > but I set the the Target Ratio to 0.9. It is the cephfs_data pool > > > > > > that makes the troubles. > > > > > > > > > > > > The 4.0 is the BIAS from the cephfs_metadata pool. This "BIAS" is > > > > > > not explained on the page linked below. So I don't know its meaning. > > > > > > > > > > > > How can be a pool overcommited when it is the only pool on a set of > > > > > > OSDs? > > > > > > > > > > > > Best regards, > > > > > > Lars > > > > > > > > > > > > Thu, 24 Oct 2019 09:39:51 -0400 > > > > > > Nathan Fish ==> Lars Täuber > > > > > > : > > > > > > > The formatting is mangled on my phone, but if I am reading it > > > > > > > correctly, > > > > > > > you have set Target Ratio to 4.0. This means you have told the > > > > > > > balancer > > > > > > > that this pool will occupy 4x the space of your whole cluster, > > > > > > > and to > > > > > > > opti
[ceph-users] Re: Ceph Health error right after starting balancer
Hi Paul,

the situation has changed in the meantime. However, I can reproduce a similar behaviour. This means:
- I disable the balancer (ceph balancer off)
- and then start reweighting a specific OSD (ceph osd reweight 134 1.0)

The cluster immediately reports slow requests.

root@ld3955:~# ceph health detail
HEALTH_WARN 434 slow requests are blocked > 32 sec; mon ld5505 is low on available space
REQUEST_SLOW 434 slow requests are blocked > 32 sec
    131 ops are blocked > 131.072 sec
    270 ops are blocked > 65.536 sec
    33 ops are blocked > 32.768 sec
    osd.66 has blocked requests > 32.768 sec
    osds 19,65,67,426 have blocked requests > 65.536 sec
    osds 0,2,3,5,6,7,8,16,24,28,29,30,32,34,35,36,37,38,39,40,41,59,60,61,62,63,64,68,69,70,71,72,73,74,75,173,174,178,180,181,184,185,186,187,188,268,269,270,271,368,369,370,420,421,423,424,429,431,432,433,434,435,436 have blocked requests > 131.072 sec
MON_DISK_LOW mon ld5505 is low on available space
    mon.ld5505 has 24% avail

root@ld3955:~# ceph -s
  cluster:
    id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
    health: HEALTH_WARN
            453 slow requests are blocked > 32 sec
            mon ld5505 is low on available space

  services:
    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 23h)
    mgr: ld5505(active, since 22h), standbys: ld5506, ld5507, ld5508
    mds: cephfs:1 {0=ld4465=up:active} 1 up:standby
    osd: 442 osds: 442 up, 442 in; 5 remapped pgs

  data:
    pools:   6 pools, 8312 pgs
    objects: 63.92M objects, 244 TiB
    usage:   731 TiB used, 800 TiB / 1.5 PiB avail
    pgs:     37702/191053577 objects misplaced (0.020%)
             8249 active+clean
             29   active+clean+scrubbing+deep
             29   active+clean+scrubbing
             3    active+remapped+backfill_wait
             2    active+remapped+backfilling

  io:
    client:   1.4 KiB/s rd, 87 MiB/s wr, 1 op/s rd, 22 op/s wr
    recovery: 34 MiB/s, 8 objects/s

In this example I have reweighted an HDD, but many of the OSDs that have blocked requests (0,2,3,5,6,7,8) are SSDs.
root@ld3955:~# ceph osd df tree ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME -17 1363.19983 - 1.3 PiB 725 TiB 724 TiB 31 MiB 1.4 TiB 637 TiB 53.25 1.12 - root hdd_strgbox -43 349.43994 - 349 TiB 171 TiB 171 TiB 6.3 MiB 300 GiB 178 TiB 48.91 1.02 - host ld4257-hdd_strgbox 371 hdd 7.28000 1.0 7.3 TiB 3.5 TiB 3.5 TiB 180 KiB 5.9 GiB 3.8 TiB 47.84 1.00 118 up osd.371 372 hdd 7.28000 1.0 7.3 TiB 3.5 TiB 3.5 TiB 128 KiB 6.2 GiB 3.7 TiB 48.65 1.02 120 up osd.372 373 hdd 7.28000 1.0 7.3 TiB 3.5 TiB 3.5 TiB 8 KiB 6.1 GiB 3.7 TiB 48.71 1.02 120 up osd.373 374 hdd 7.28000 1.0 7.3 TiB 3.4 TiB 3.4 TiB 68 KiB 6.0 GiB 3.8 TiB 47.34 0.99 117 up osd.374 375 hdd 7.28000 1.0 7.3 TiB 3.9 TiB 3.9 TiB 208 KiB 6.7 GiB 3.4 TiB 53.49 1.12 132 up osd.375 376 hdd 7.28000 1.0 7.3 TiB 3.1 TiB 3.1 TiB 72 KiB 5.7 GiB 4.2 TiB 42.11 0.88 104 up osd.376 377 hdd 7.28000 1.0 7.3 TiB 3.1 TiB 3.1 TiB 120 KiB 5.7 GiB 4.2 TiB 42.13 0.88 104 up osd.377 378 hdd 7.28000 1.0 7.3 TiB 3.4 TiB 3.4 TiB 176 KiB 6.0 GiB 3.8 TiB 47.37 0.99 117 up osd.378 379 hdd 7.28000 1.0 7.3 TiB 3.6 TiB 3.6 TiB 32 KiB 6.2 GiB 3.7 TiB 49.47 1.04 122 up osd.379 380 hdd 7.28000 1.0 7.3 TiB 3.3 TiB 3.3 TiB 168 KiB 5.8 GiB 4.0 TiB 45.33 0.95 112 up osd.380 381 hdd 7.28000 1.0 7.3 TiB 3.7 TiB 3.7 TiB 284 KiB 6.4 GiB 3.6 TiB 50.30 1.05 124 up osd.381 382 hdd 7.28000 1.0 7.3 TiB 3.4 TiB 3.4 TiB 12 KiB 5.8 GiB 3.9 TiB 46.92 0.98 116 up osd.382 383 hdd 7.28000 1.0 7.3 TiB 3.6 TiB 3.6 TiB 172 KiB 6.2 GiB 3.7 TiB 49.75 1.04 123 up osd.383 384 hdd 7.28000 1.0 7.3 TiB 3.8 TiB 3.8 TiB 60 KiB 7.3 GiB 3.5 TiB 51.88 1.09 128 up osd.384 385 hdd 7.28000 1.0 7.3 TiB 3.4 TiB 3.4 TiB 76 KiB 6.5 GiB 3.8 TiB 47.10 0.99 116 up osd.385 386 hdd 7.28000 1.0 7.3 TiB 3.9 TiB 3.9 TiB 84 KiB 7.2 GiB 3.4 TiB 53.83 1.13 133 up osd.386 387 hdd 7.28000 1.0 7.3 TiB 3.6 TiB 3.6 TiB 200 KiB 6.3 GiB 3.7 TiB 49.36 1.03 122 up osd.387 388 hdd 7.28000 1.0 7.3 TiB 3.4 TiB 3.4 TiB 72 KiB 5.8 GiB 3.9 TiB 46.62 0.98 115 up osd.388 389 hdd 7.28000 1.0 7.3 TiB 3.8 TiB 3.8 TiB 276 KiB 6.6 GiB 3.5 TiB 52.24 1.09 128 up osd.389 390 hdd 7.28000 1.0 7.3 TiB 3.1 TiB 3.1 TiB 72 KiB 5.3 GiB 4.2 TiB 42.24 0.88 104 up osd.390 391 hdd 7.28000 1.0 7.3 TiB 3.4 TiB 3.4 TiB 148 KiB 5.8 GiB 3.9 TiB 46.57 0.98 115 up osd.391 392 hdd
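Not something discussed in this thread, but a common mitigation when reweight-induced data movement floods a cluster with slow requests is to throttle backfill and recovery while it runs; a hedged sketch:

    # limit concurrent backfills and recovery ops per OSD while the data movement runs
    ceph config set osd osd_max_backfills 1
    ceph config set osd osd_recovery_max_active 1
    # remove the overrides once the cluster has settled
    ceph config rm osd osd_max_backfills
    ceph config rm osd osd_recovery_max_active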
[ceph-users] mgr daemons becoming unresponsive
Dear Cephers,

this is a 14.2.4 cluster with device health metrics enabled - since about a day, all mgr daemons go "silent" on me after a few hours, i.e. "ceph -s" shows:

  cluster:
    id:     269cf2b2-7e7c-4ceb-bd1b-a33d915ceee9
    health: HEALTH_WARN
            no active mgr
            1/3 mons down, quorum mon001,mon002

  services:
    mon: 3 daemons, quorum mon001,mon002 (age 57m), out of quorum: mon003
    mgr: no daemons active (since 56m)
    ...

(the third mon has a planned outage and will come back in a few days)

Checking the logs of the mgr daemons, I find some "reset" messages at the time when it goes "silent", first for the first mgr:

2019-11-01 21:34:40.286 7f2df6a6b700  0 log_channel(cluster) log [DBG] : pgmap v1798: 1585 pgs: 1585 active+clean; 1.1 TiB data, 2.3 TiB used, 136 TiB / 138 TiB avail
2019-11-01 21:34:41.458 7f2e0d59b700  0 client.0 ms_handle_reset on v2:10.160.16.1:6800/401248
2019-11-01 21:34:42.287 7f2df6a6b700  0 log_channel(cluster) log [DBG] : pgmap v1799: 1585 pgs: 1585 active+clean; 1.1 TiB data, 2.3 TiB used, 136 TiB / 138 TiB avail

and a bit later, on the standby mgr:

2019-11-01 22:18:14.892 7f7bcc8ae700  0 log_channel(cluster) log [DBG] : pgmap v1798: 1585 pgs: 166 active+clean+snaptrim, 858 active+clean+snaptrim_wait, 561 active+clean; 1.1 TiB data, 2.3 TiB used, 136 TiB / 138 TiB avail
2019-11-01 22:18:16.022 7f7be9e72700  0 client.0 ms_handle_reset on v2:10.160.16.2:6800/352196
2019-11-01 22:18:16.893 7f7bcc8ae700  0 log_channel(cluster) log [DBG] : pgmap v1799: 1585 pgs: 166 active+clean+snaptrim, 858 active+clean+snaptrim_wait, 561 active+clean; 1.1 TiB data, 2.3 TiB used, 136 TiB / 138 TiB avail

Interestingly, the dashboard still works, but presents outdated information, and for example zero I/O going on.
I believe this started to happen mainly after the third mon went into the known downtime, but I am not fully sure if this was the trigger, since the cluster is still growing.
It may also have been the addition of 24 more OSDs.
I also find other messages in the mgr logs which seem problematic, but I am not sure they are related: -- 2019-11-01 21:17:09.849 7f2df4266700 0 mgr[devicehealth] Error reading OMAP: [errno 22] Failed to operate read op for oid Traceback (most recent call last): File "/usr/share/ceph/mgr/devicehealth/module.py", line 396, in put_device_metrics ioctx.operate_read_op(op, devid) File "rados.pyx", line 516, in rados.requires.wrapper.validate_func (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUIL D/ceph-14.2.4/build/src/pybind/rados/pyrex/rados.c:4721) File "rados.pyx", line 3474, in rados.Ioctx.operate_read_op (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUILD/ceph-14.2.4/build/src/pybind/rados/pyrex/rados.c:36554) InvalidArgumentError: [errno 22] Failed to operate read op for oid -- or: -- 2019-11-01 21:33:53.977 7f7bd38bc700 0 mgr[devicehealth] Fail to parse JSON result from daemon osd.51 () 2019-11-01 21:33:53.978 7f7bd38bc700 0 mgr[devicehealth] Fail to parse JSON result from daemon osd.52 () 2019-11-01 21:33:53.979 7f7bd38bc700 0 mgr[devicehealth] Fail to parse JSON result from daemon osd.53 () -- The reason why I am cautious about the health metrics is that I observed a crash when trying to query them: -- 2019-11-01 20:21:23.661 7fa46314a700 0 log_channel(audit) log [DBG] : from='client.174136 -' entity='client.admin' cmd=[{"prefix": "device get-health-metrics", "devid": "osd.11", "target": ["mgr", ""]}]: dispatch 2019-11-01 20:21:23.661 7fa46394b700 0 mgr[devicehealth] handle_command 2019-11-01 20:21:23.663 7fa46394b700 -1 *** Caught signal (Segmentation fault) ** in thread 7fa46394b700 thread_name:mgr-fin ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable) 1: (()+0xf5f0) [0x7fa488cee5f0] 2: (PyEval_EvalFrameEx()+0x1a9) [0x7fa48aeb50f9] 3: (PyEval_EvalFrameEx()+0x67bd) [0x7fa48aebb70d] 4: (PyEval_EvalFrameEx()+0x67bd) [0x7fa48aebb70d] 5: (PyEval_EvalFrameEx()+0x67bd) [0x7fa48aebb70d] 6: (PyEval_EvalCodeEx()+0x7ed) [0x7fa48aebe08d] 7: (()+0x709c8) [0x7fa48ae479c8] 8: (PyObject_Call()+0x43) [0x7fa48ae22ab3] 9: (()+0x5aaa5) [0x7fa48ae31aa5] 10: (PyObject_Call()+0x43) [0x7fa48ae22ab3] 11: (()+0x4bb95) [0x7fa48ae22b95] 12: (PyObject_CallMethod()+0xbb) [0x7fa48ae22ecb] 13: (ActivePyModule::handle_command(std::map >, std::vector >, std::vector > >, std::less, std::allocator >, std::vector >, std::vector > > > > > const&, ceph::buffer::v14_2_0::list const&, std::basic_stringstream, std::allocator >*, std::basic_stringstream, std::allocator >*)+0x20e) [0x55c3
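A hedged way to see what the devicehealth module has actually recorded (the pool name device_health_metrics is the Nautilus default and is assumed here):

    # devices the mgr knows about, and the objects/omap the devicehealth module writes metrics to
    ceph device ls
    rados -p device_health_metrics ls
    rados -p device_health_metrics listomapkeys <device-id>   # <device-id>: one of the object names listed above (placeholder)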
[ceph-users] Weird blocked OP issue.
We had an OSD host with 13 OSDs fail today and we have a weird blocked OP message that I can't understand. There are no OSDs with blocked ops, just `mon` (multiple times), and some of the rgw instances.

  cluster:
    id:     570bcdbb-9fdf-406f-9079-b0181025f8d0
    health: HEALTH_WARN
            1 large omap objects
            Degraded data redundancy: 2083023/195702437 objects degraded (1.064%), 880 pgs degraded, 880 pgs undersized
            1609 pgs not deep-scrubbed in time
            4 slow ops, oldest one blocked for 506699 sec, daemons [mon,sun-gcs02-rgw01,mon,sun-gcs02-rgw02,mon,sun-gcs02-rgw03] have slow ops.

  services:
    mon: 3 daemons, quorum sun-gcs02-rgw01,sun-gcs02-rgw02,sun-gcs02-rgw03 (age 6m)
    mgr: sun-gcs02-rgw02(active, since 5d), standbys: sun-gcs02-rgw03, sun-gcs02-rgw04
    osd: 767 osds: 754 up (since 10m), 754 in (since 104m); 880 remapped pgs
    rgw: 16 daemons active (sun-gcs02-rgw01.rgw0, sun-gcs02-rgw01.rgw1, sun-gcs02-rgw01.rgw2, sun-gcs02-rgw01.rgw3, sun-gcs02-rgw02.rgw0, sun-gcs02-rgw02.rgw1, sun-gcs02-rgw02.rgw2, sun-gcs02-rgw02.rgw3, sun-gcs02-rgw03.rgw0, sun-gcs02-rgw03.rgw1, sun-gcs02-rgw03.rgw2, sun-gcs02-rgw03.rgw3, sun-gcs02-rgw04.rgw0, sun-gcs02-rgw04.rgw1, sun-gcs02-rgw04.rgw2, sun-gcs02-rgw04.rgw3)

  data:
    pools:   7 pools, 8240 pgs
    objects: 19.57M objects, 52 TiB
    usage:   88 TiB used, 6.1 PiB / 6.2 PiB avail
    pgs:     2083023/195702437 objects degraded (1.064%)
             43492/195702437 objects misplaced (0.022%)
             7360 active+clean
             868  active+undersized+degraded+remapped+backfill_wait
             12   active+undersized+degraded+remapped+backfilling

  io:
    client:   150 MiB/s rd, 642 op/s rd, 0 op/s wr
    recovery: 626 MiB/s, 223 objects/s

$ ceph versions
{
    "mon": {
        "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 754
    },
    "mds": {},
    "rgw": {
        "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)": 16
    },
    "overall": {
        "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)": 754,
        "ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)": 22
    }
}

I restarted one of the monitors and it dropped out of the list only showing 2 blocked ops, but then showed up again a little while later.

Any ideas on where to look?

Thanks,
Robert LeBlanc

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
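When the daemons named in a slow-ops warning are mons rather than OSDs, the mon admin socket can show what the stuck operations actually are. A minimal sketch, using one of the mon names from the status above (run on the host where that monitor lives):

    ceph daemon mon.sun-gcs02-rgw01 ops                  # in-flight ops, including the ones counted as slow
    ceph daemon mon.sun-gcs02-rgw01 dump_historic_ops    # recently completed ops with their event timelines, if supported by your release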
[ceph-users] Re: mgr daemons becoming unresponsive
Dear Cephers, interestingly, after: ceph device monitoring off the mgrs seem to be stable now - the active one still went silent a few minutes later, but the standby took over and was stable, and restarting the broken one, it's now stable since an hour, too, so probably, a restart of the mgr is needed after disabling device monitoring to get things stable again. So it seems to be caused by a problem with the device health metrics. In case this is a red herring and mgrs become instable again in the next days, I'll let you know. Cheers, Oliver Am 01.11.19 um 23:09 schrieb Oliver Freyermuth: > Dear Cephers, > > this is a 14.2.4 cluster with device health metrics enabled - since about a > day, all mgr daemons go "silent" on me after a few hours, i.e. "ceph -s" > shows: > > cluster: > id: 269cf2b2-7e7c-4ceb-bd1b-a33d915ceee9 > health: HEALTH_WARN > no active mgr > 1/3 mons down, quorum mon001,mon002 > > services: > mon:3 daemons, quorum mon001,mon002 (age 57m), out of quorum: > mon003 > mgr:no daemons active (since 56m) > ... > (the third mon has a planned outage and will come back in a few days) > > Checking the logs of the mgr daemons, I find some "reset" messages at the > time when it goes "silent", first for the first mgr: > > 2019-11-01 21:34:40.286 7f2df6a6b700 0 log_channel(cluster) log [DBG] : > pgmap v1798: 1585 pgs: 1585 active+clean; 1.1 TiB data, 2.3 TiB used, 136 TiB > / 138 TiB avail > 2019-11-01 21:34:41.458 7f2e0d59b700 0 client.0 ms_handle_reset on > v2:10.160.16.1:6800/401248 > 2019-11-01 21:34:42.287 7f2df6a6b700 0 log_channel(cluster) log [DBG] : > pgmap v1799: 1585 pgs: 1585 active+clean; 1.1 TiB data, 2.3 TiB used, 136 TiB > / 138 TiB avail > > and a bit later, on the standby mgr: > > 2019-11-01 22:18:14.892 7f7bcc8ae700 0 log_channel(cluster) log [DBG] : > pgmap v1798: 1585 pgs: 166 active+clean+snaptrim, 858 > active+clean+snaptrim_wait, 561 active+clean; 1.1 TiB data, 2.3 TiB used, 136 > TiB / 138 TiB avail > 2019-11-01 22:18:16.022 7f7be9e72700 0 client.0 ms_handle_reset on > v2:10.160.16.2:6800/352196 > 2019-11-01 22:18:16.893 7f7bcc8ae700 0 log_channel(cluster) log [DBG] : > pgmap v1799: 1585 pgs: 166 active+clean+snaptrim, 858 > active+clean+snaptrim_wait, 561 active+clean; 1.1 TiB data, 2.3 TiB used, 136 > TiB / 138 TiB avail > > Interestingly, the dashboard still works, but presents outdated information, > and for example zero I/O going on. > I believe this started to happen mainly after the third mon went into the > known downtime, but I am not fully sure if this was the trigger, since the > cluster is still growing. > It may also have been the addition of 24 more OSDs. 
> > > I also find other messages in the mgr logs which seem problematic, but I am > not sure they are related: > -- > 2019-11-01 21:17:09.849 7f2df4266700 0 mgr[devicehealth] Error reading OMAP: > [errno 22] Failed to operate read op for oid > Traceback (most recent call last): > File "/usr/share/ceph/mgr/devicehealth/module.py", line 396, in > put_device_metrics > ioctx.operate_read_op(op, devid) > File "rados.pyx", line 516, in rados.requires.wrapper.validate_func > (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUIL > D/ceph-14.2.4/build/src/pybind/rados/pyrex/rados.c:4721) > File "rados.pyx", line 3474, in rados.Ioctx.operate_read_op > (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUILD/ceph-14.2.4/build/src/pybind/rados/pyrex/rados.c:36554) > InvalidArgumentError: [errno 22] Failed to operate read op for oid > -- > or: > -- > 2019-11-01 21:33:53.977 7f7bd38bc700 0 mgr[devicehealth] Fail to parse JSON > result from daemon osd.51 () > 2019-11-01 21:33:53.978 7f7bd38bc700 0 mgr[devicehealth] Fail to parse JSON > result from daemon osd.52 () > 2019-11-01 21:33:53.979 7f7bd38bc700 0 mgr[devicehealth] Fail to parse JSON > result from daemon osd.53 () > -- > > The reason why I am cautious about the health metrics is that I observed a > crash when trying to query them: > -- > 2019-11-01 20:21:23.661 7fa46314a700 0 log_channel(audit) log [DBG] : > from='client.174136 -' entity='client.admin' cmd=[{"prefix": "device > get-health-metrics", "devid": "osd.11", "target": ["mgr", ""]}]: dispatch > 2019-11-01 20:21:23.661 7fa46394b700 0 mgr[devicehealth] handle_command > 2019-11-01 20:21:23.663 7fa46394b700 -1 *** Caught signal (Segmentation > fault) ** > in thread 7fa46394b700 thread_name:mgr-fin > > ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus > (stable) > 1: (()+0xf5f0) [0x7fa48
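For completeness, a hedged sketch of the sequence described above - disable the device health polling, then bounce the active mgr so a standby takes over (the mgr name is a placeholder):

    ceph device monitoring off
    ceph mgr fail mon001        # or restart the active mgr directly: systemctl restart ceph-mgr@mon001
    ceph -s                     # confirm a mgr becomes active again and stays responsive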
[ceph-users] Re: mgr daemons becoming unresponsive
On Sat, 2 Nov 2019, Oliver Freyermuth wrote: > Dear Cephers, > > interestingly, after: > ceph device monitoring off > the mgrs seem to be stable now - the active one still went silent a few > minutes later, > but the standby took over and was stable, and restarting the broken one, it's > now stable since an hour, too, > so probably, a restart of the mgr is needed after disabling device monitoring > to get things stable again. > > So it seems to be caused by a problem with the device health metrics. In case > this is a red herring and mgrs become instable again in the next days, > I'll let you know. If this seems to stabilize things, and you can tolerate inducing the failure again, reproducing the problem with mgr logs cranked up (debug_mgr = 20, debug_ms = 1) would probably give us a good idea of why the mgr is hanging. Let us know! Thanks, sage > > Cheers, > Oliver > > Am 01.11.19 um 23:09 schrieb Oliver Freyermuth: > > Dear Cephers, > > > > this is a 14.2.4 cluster with device health metrics enabled - since about a > > day, all mgr daemons go "silent" on me after a few hours, i.e. "ceph -s" > > shows: > > > > cluster: > > id: 269cf2b2-7e7c-4ceb-bd1b-a33d915ceee9 > > health: HEALTH_WARN > > no active mgr > > 1/3 mons down, quorum mon001,mon002 > > > > services: > > mon:3 daemons, quorum mon001,mon002 (age 57m), out of quorum: > > mon003 > > mgr:no daemons active (since 56m) > > ... > > (the third mon has a planned outage and will come back in a few days) > > > > Checking the logs of the mgr daemons, I find some "reset" messages at the > > time when it goes "silent", first for the first mgr: > > > > 2019-11-01 21:34:40.286 7f2df6a6b700 0 log_channel(cluster) log [DBG] : > > pgmap v1798: 1585 pgs: 1585 active+clean; 1.1 TiB data, 2.3 TiB used, 136 > > TiB / 138 TiB avail > > 2019-11-01 21:34:41.458 7f2e0d59b700 0 client.0 ms_handle_reset on > > v2:10.160.16.1:6800/401248 > > 2019-11-01 21:34:42.287 7f2df6a6b700 0 log_channel(cluster) log [DBG] : > > pgmap v1799: 1585 pgs: 1585 active+clean; 1.1 TiB data, 2.3 TiB used, 136 > > TiB / 138 TiB avail > > > > and a bit later, on the standby mgr: > > > > 2019-11-01 22:18:14.892 7f7bcc8ae700 0 log_channel(cluster) log [DBG] : > > pgmap v1798: 1585 pgs: 166 active+clean+snaptrim, 858 > > active+clean+snaptrim_wait, 561 active+clean; 1.1 TiB data, 2.3 TiB used, > > 136 TiB / 138 TiB avail > > 2019-11-01 22:18:16.022 7f7be9e72700 0 client.0 ms_handle_reset on > > v2:10.160.16.2:6800/352196 > > 2019-11-01 22:18:16.893 7f7bcc8ae700 0 log_channel(cluster) log [DBG] : > > pgmap v1799: 1585 pgs: 166 active+clean+snaptrim, 858 > > active+clean+snaptrim_wait, 561 active+clean; 1.1 TiB data, 2.3 TiB used, > > 136 TiB / 138 TiB avail > > > > Interestingly, the dashboard still works, but presents outdated > > information, and for example zero I/O going on. > > I believe this started to happen mainly after the third mon went into the > > known downtime, but I am not fully sure if this was the trigger, since the > > cluster is still growing. > > It may also have been the addition of 24 more OSDs. 
> > > > > > I also find other messages in the mgr logs which seem problematic, but I am > > not sure they are related: > > -- > > 2019-11-01 21:17:09.849 7f2df4266700 0 mgr[devicehealth] Error reading > > OMAP: [errno 22] Failed to operate read op for oid > > Traceback (most recent call last): > > File "/usr/share/ceph/mgr/devicehealth/module.py", line 396, in > > put_device_metrics > > ioctx.operate_read_op(op, devid) > > File "rados.pyx", line 516, in rados.requires.wrapper.validate_func > > (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUIL > > D/ceph-14.2.4/build/src/pybind/rados/pyrex/rados.c:4721) > > File "rados.pyx", line 3474, in rados.Ioctx.operate_read_op > > (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUILD/ceph-14.2.4/build/src/pybind/rados/pyrex/rados.c:36554) > > InvalidArgumentError: [errno 22] Failed to operate read op for oid > > -- > > or: > > -- > > 2019-11-01 21:33:53.977 7f7bd38bc700 0 mgr[devicehealth] Fail to parse > > JSON result from daemon osd.51 () > > 2019-11-01 21:33:53.978 7f7bd38bc700 0 mgr[devicehealth] Fail to parse > > JSON result from daemon osd.52 () > > 2019-11-01 21:33:53.979 7f7bd38bc700 0 mgr[devicehealth] Fail to parse > > JSON result from daemon osd.53 () > > -- > > > > The reason why I am cautious about the health metrics is that I observed a > > crash when trying to query them: > > -- > > 2019-11-01 20:21:23.661 7fa46314a700 0
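A hedged way to apply the suggested debug levels without editing ceph.conf, assuming the Nautilus central config store:

    # raise mgr verbosity, reproduce the hang, then collect /var/log/ceph/ceph-mgr.*.log
    ceph config set mgr debug_mgr 20
    ceph config set mgr debug_ms 1
    # afterwards, drop the overrides again
    ceph config rm mgr debug_mgr
    ceph config rm mgr debug_ms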
[ceph-users] Re: Weird blocked OP issue.
On Fri, Nov 1, 2019 at 6:10 PM Robert LeBlanc wrote: > > We had an OSD host with 13 OSDs fail today and we have a weird blocked > OP message that I can't understand. There are no OSDs with blocked > ops, just `mon` (multiple times), and some of the rgw instances. > > cluster: >id: 570bcdbb-9fdf-406f-9079-b0181025f8d0 >health: HEALTH_WARN >1 large omap objects >Degraded data redundancy: 2083023/195702437 objects > degraded (1.064%), 880 pgs degraded, 880 pgs undersized >1609 pgs not deep-scrubbed in time >4 slow ops, oldest one blocked for 506699 sec, daemons > [mon,sun-gcs02-rgw01,mon,sun-gcs02-rgw02,mon,sun-gcs02-rgw03] have > slow ops. > > services: >mon: 3 daemons, quorum > sun-gcs02-rgw01,sun-gcs02-rgw02,sun-gcs02-rgw03 (age 6m) >mgr: sun-gcs02-rgw02(active, since 5d), standbys: sun-gcs02-rgw03, > sun-gcs02-rgw04 >osd: 767 osds: 754 up (since 10m), 754 in (since 104m); 880 remapped pgs >rgw: 16 daemons active (sun-gcs02-rgw01.rgw0, sun-gcs02-rgw01.rgw1, > sun-gcs02-rgw01.rgw2, sun-gcs02-rgw01.rgw3, sun-gcs02-rgw02.rgw0, > sun-gcs02-rgw02.rgw1, sun-gcs02-rgw02.rgw2, sun-gcs02-rgw02.rgw3, > sun-gcs02-rgw03.rgw0, sun-gcs02-rgw03.rgw1, sun-gcs02-rgw03.rgw2, s > un-gcs02-rgw03.rgw3, sun-gcs02-rgw04.rgw0, sun-gcs02-rgw04.rgw1, > sun-gcs02-rgw04.rgw2, sun-gcs02-rgw04.rgw3) > > data: >pools: 7 pools, 8240 pgs >objects: 19.57M objects, 52 TiB >usage: 88 TiB used, 6.1 PiB / 6.2 PiB avail >pgs: 2083023/195702437 objects degraded (1.064%) > 43492/195702437 objects misplaced (0.022%) > 7360 active+clean > 868 active+undersized+degraded+remapped+backfill_wait > 12 active+undersized+degraded+remapped+backfilling > > io: >client: 150 MiB/s rd, 642 op/s rd, 0 op/s wr >recovery: 626 MiB/s, 223 objects/s > > $ ceph versions > { >"mon": { >"ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) > nautilus (stable)": 3 >}, >"mgr": { >"ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) > nautilus (stable)": 3 >}, >"osd": { >"ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) > nautilus (stable)": 754 >}, >"mds": {}, >"rgw": { >"ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) > nautilus (stable)": 16 >}, >"overall": { >"ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) > nautilus (stable)": 754, >"ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) > nautilus (stable)": 22 >} > } > > I restarted one of the monitors and it dropped out of the list only > showing 2 blocked ops, but then showed up again a little while later. > > Any ideas on where to look? For posterity's sake, it looks like I got things happy again. The rgw data pool is 8+2 EC, but was set for min_size=10. I thought I had configured that min_size=9, but it was recovering PGs, so I didn't think about it at the time. Then one OSD started crashing with something about strays and would be restarted and crash again. Then incomplete PGs showed up. I dropped the min_size to 8 to get things recovered and marked osd.119 out to empty it off. Once the cluster recovered and all PGs were healthy, I set min_size=9. I then noticed that what I thought were rgw instances being blocked where actually the names of the monitors (the hosts are named after the rgws, but mon, mgr and rgw are all containers on the boxes). I thought, well let me try to roll the first monitor again and see if that unblocks the op, sure enough it looks like it unblocked this time and has not showed up again in 10 minutes. 
After letting osd.119 sit empty for about 10 minutes, I set it back in and it doesn't seem to be crashing anymore, so I wonder if it had some bad db entry. It's almost halfway back in and so far so good.

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
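On the min_size point: for an 8+2 erasure-coded pool the usual floor is k+1 = 9, and it can be checked or set per pool. A rough sketch with a placeholder pool name (substitute the actual rgw data pool):

    ceph osd pool get <rgw-data-pool> min_size
    ceph osd pool set <rgw-data-pool> min_size 9    # k+1 for a k=8, m=2 EC profile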