[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Mark Johnson
Thanks for your swift reply.  Below is the requested information.

I understand the bit about not being able to reduce the pg count as we've come 
across this issue once before.  This is the reason I've been hesitant to make 
any changes there without being 100% certain of getting it right and the impact 
of these changes.  That, and the more I read about how to calculate this, the 
more confused I get.  As for the reweight, is that just a matter of "ceph osd 
reweight osd.3 1" once the other issues are sorted out (or perhaps start with a 
less dramatic change and work up)?

Also, presuming I need to change the pg/pgp num, would you be suggesting on 
pool 2 based on the below info (the pool with a few large files) or on pool 20 
(the pool with the most data but an average of about 250KB file size)?  I'm 
just completely confused as to what's caused this issue in the first place and 
how to go about fixing it.  On top of that, am I going to be able to increase 
the pg/pgp count with the cluster in a state of health_warn?  Just some posts 
I've read seem to indicate that the health state needs to be ok before this 
sort of thing can be changed (but I could be misunderstanding what I'm 
reading).

Anyway, here's the info:

# ceph df
GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED
28219G 11227G   15558G 55.13
POOLS:
    NAME                      ID USED   %USED MAX AVAIL  OBJECTS
    rbd                       0       0     0      690G        0
    KUBERNETES                1    122G 15.11      690G    34188
    KUBERNETES_METADATA       2  49310k     0      690G     1426
    default.rgw.control       11      0     0      690G        8
    default.rgw.data.root     12 20076k     0      690G    54412
    default.rgw.gc            13      0     0      690G       32
    default.rgw.log           14      0     0      690G      127
    default.rgw.users.uid     15   4942     0      690G       15
    default.rgw.users.keys    16    126     0      690G        4
    default.rgw.users.swift   17    252     0      690G        8
    default.rgw.buckets.index 18      0     0      690G    27206
    .rgw.root                 19   1588     0      690G        4
    default.rgw.buckets.data  20  7402G 91.47      690G 30931617
    default.rgw.users.email   21      0     0      690G        0


# ceph osd pool ls detail
pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash rjenkins 
pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 'KUBERNETES' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 100 pgp_num 100 last_change 17 flags hashpspool 
crash_replay_interval 45 stripe_width 0
pool 2 'KUBERNETES_METADATA' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 100 pgp_num 100 last_change 16 flags hashpspool 
stripe_width 0
pool 11 'default.rgw.control' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 4 pgp_num 4 last_change 68 flags hashpspool 
stripe_width 0
pool 12 'default.rgw.data.root' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 4 pgp_num 4 last_change 69 flags hashpspool 
stripe_width 0
pool 13 'default.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 4 pgp_num 4 last_change 70 flags hashpspool 
stripe_width 0
pool 14 'default.rgw.log' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 4 pgp_num 4 last_change 71 flags hashpspool 
stripe_width 0
pool 15 'default.rgw.users.uid' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 4 pgp_num 4 last_change 72 flags hashpspool 
stripe_width 0
pool 16 'default.rgw.users.keys' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 4 pgp_num 4 last_change 73 flags hashpspool 
stripe_width 0
pool 17 'default.rgw.users.swift' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 4 pgp_num 4 last_change 74 flags hashpspool 
stripe_width 0
pool 18 'default.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 
0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 75 flags hashpspool 
stripe_width 0
pool 19 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 4 pgp_num 4 last_change 76 flags hashpspool stripe_width 0
pool 20 'default.rgw.buckets.data' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 64 pgp_num 64 last_change 442 flags hashpspool 
stripe_width 0
pool 21 'default.rgw.users.email' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 16 pgp_num 16 last_change 260 flags hashpspool 
stripe_width 0

On Th

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
The MGR was stopped by me because it took too much memory.
For pg status, I added some OSDs in this cluster, and it

Frank Schilder  于2020年10月29日周四 下午3:27写道:

> Your problem is the overall cluster health. The MONs store cluster history
> information that will be trimmed once it reaches HEALTH_OK. Restarting the
> MONs only makes things worse right now. The health status is a mess, no
> MGR, a bunch of PGs inactive, etc. This is what you need to resolve. How
> did your cluster end up like this?
>
> It looks like all OSDs are up and in. You need to find out
>
> - why there are inactive PGs
> - why there are incomplete PGs
>
> This usually happens when OSDs go missing.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Zhenshi Zhou 
> Sent: 29 October 2020 07:37:19
> To: ceph-users
> Subject: [ceph-users] monitor sst files continue growing
>
> Hi all,
>
> My cluster is in a bad state. SST files in /var/lib/ceph/mon/xxx/store.db
> keep growing, and the mons are reported as using a lot of disk space.
>
> I set "mon compact on start = true" and restarted one of the monitors, but
> it has been compacting for a long time and seems to have no end.
>
> [image.png]
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
After adding OSDs into the cluster, the recovery and backfill progress has not
finished yet.

Zhenshi Zhou  于2020年10月29日周四 下午3:29写道:

> MGR is stopped by me cause it took too much memories.
> For pg status, I added some OSDs in this cluster, and it
>
> Frank Schilder  于2020年10月29日周四 下午3:27写道:
>
>> Your problem is the overall cluster health. The MONs store cluster
>> history information that will be trimmed once it reaches HEALTH_OK.
>> Restarting the MONs only makes things worse right now. The health status is
>> a mess, no MGR, a bunch of PGs inactive, etc. This is what you need to
>> resolve. How did your cluster end up like this?
>>
>> It looks like all OSDs are up and in. You need to find out
>>
>> - why there are inactive PGs
>> - why there are incomplete PGs
>>
>> This usually happens when OSDs go missing.
>>
>> Best regards,
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> 
>> From: Zhenshi Zhou 
>> Sent: 29 October 2020 07:37:19
>> To: ceph-users
>> Subject: [ceph-users] monitor sst files continue growing
>>
>> Hi all,
>>
>> My cluster is in wrong state. SST files in /var/lib/ceph/mon/xxx/store.db
>> continue growing. It claims mon are using a lot of disk space.
>>
>> I set "mon compact on start = true" and restart one of the monitors. But
>> it started and campacting for a long time, seems it has no end.
>>
>> [image.png]
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Monitor persistently out-of-quorum

2020-10-29 Thread David Caro
On 10/28 17:26, Ki Wong wrote:
> Hello,
> 
> I am at my wit's end.
> 
> So I made a mistake in the configuration of my router and one
> of the monitors (out of 3) dropped out of the quorum, and nothing
> I’ve done has allowed it to rejoin. That includes reinstalling the
> monitor with ceph-ansible.
> 
> The connectivity issue is fixed. I’ve tested it using “nc” and
> the host can connect to both port 3300 and 6789 of the other
> monitors. But the wayward monitor continues to stay out of quorum.

Just to make sure, have you tried from mon1->mon3, mon2->mon3, mon3->mon1 and
mon3->mon2?

> 
> What is wrong? I see a bunch of “EBUSY” errors in the log, with
> the message:
> 
>   e1 handle_auth_request haven't formed initial quorum, EBUSY
> 
> How do I fix this? Any help would be greatly appreciated.
> 
> Many thanks,
> 
> -kc
> 
> 
> With debug_mon at 1/10, I got these log snippets:
> 
> 2020-10-28 15:40:05.961 7fb79253a700  4 mon.mgmt03@2(probing) e1 
> probe_timeout 0x564050353ec0
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 bootstrap
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> sync_reset_requester
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> unregister_cluster_logger - not registered
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> cancel_probe_timeout (none scheduled)
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 monmap e1: 3 
> mons at 
> {mgmt01=[v2:10.0.1.1:3300/0,v1:10.0.1.1:6789/0],mgmt02=[v2:10.1.1.1:3300/0,v1:10.1.1.1:6789/0],mgmt03=[v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0]}
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 _reset
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing).auth v0 
> _set_mon_num_rank num 0 rank 0
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> cancel_probe_timeout (none scheduled)
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> timecheck_finish
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> scrub_event_cancel
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 scrub_reset
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> cancel_probe_timeout (none scheduled)
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> reset_probe_timeout 0x564050347ce0 after 2 seconds
> 2020-10-28 15:40:05.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 probing 
> other monitors
> 2020-10-28 15:40:07.961 7fb79253a700  4 mon.mgmt03@2(probing) e1 
> probe_timeout 0x564050347ce0
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 bootstrap
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> sync_reset_requester
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> unregister_cluster_logger - not registered
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> cancel_probe_timeout (none scheduled)
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 monmap e1: 3 
> mons at 
> {mgmt01=[v2:10.0.1.1:3300/0,v1:10.0.1.1:6789/0],mgmt02=[v2:10.1.1.1:3300/0,v1:10.1.1.1:6789/0],mgmt03=[v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0]}
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 _reset
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing).auth v0 
> _set_mon_num_rank num 0 rank 0
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> cancel_probe_timeout (none scheduled)
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> timecheck_finish
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> scrub_event_cancel
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 scrub_reset
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> cancel_probe_timeout (none scheduled)
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> reset_probe_timeout 0x564050360660 after 2 seconds
> 2020-10-28 15:40:07.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 probing 
> other monitors
> 2020-10-28 15:40:09.107 7fb79253a700 -1 mon.mgmt03@2(probing) e1 
> get_health_metrics reporting 7 slow ops, oldest is log(1 entries from seq 1 
> at 2020-10-27 23:03:41.586915)
> 2020-10-28 15:40:09.961 7fb79253a700  4 mon.mgmt03@2(probing) e1 
> probe_timeout 0x564050360660
> 2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 bootstrap
> 2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> sync_reset_requester
> 2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> unregister_cluster_logger - not registered
> 2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 
> cancel_probe_timeout (none scheduled)
> 2020-10-28 15:40:09.961 7fb79253a700 10 mon.mgmt03@2(probing) e1 monmap e1: 3 
> mons at 
> {mgmt01=[v2:10.0.1.1:3300/0,v1:10.0.1.1:6789/0],mgmt02=[v2:10.1.1.1:3300/0,v1:10.1.1.1:6789/0],mgmt03=[v2:10.2.1.1:3300/0,v1:10.2.1.1:6789/0]}
> 2020-10-28 15:40:09.961 7f

[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Mark Johnson
Thanks again Frank.  That gives me something to digest (and try to understand).

One question regarding maintenance mode, these are production systems that are 
required to be available all the time.  What, exactly, will happen if I issue 
this command for maintenance mode?

Thanks,
Mark


On Thu, 2020-10-29 at 07:51 +, Frank Schilder wrote:

Cephfs pools are uncritical, because ceph fs splits very large files into 
chunks of objectsize. The RGW pool is the problem, because RGW does not as far 
as I know. A few 1TB uploads and you have a problem.


The calculation is confusing, because the term PG is used in two different 
meanings, unfortunately. The pool PG count and OSD PG count are different 
things. A PG is a virtual raid set distributed over some OSDs. The number of 
PGs in a pool is the count of such raid sets. The PG count for an OSD is in 
fact the PG membership count - something completely different. It says in how 
many PGs an OSD is a member of. To create 100PGs with replication 3 you need 
3x100=300 PG memberships. If you have 3 OSDs, these will have 100 PG 
memberships each. This is shown as PGs in the utilisation columns. If these 
terms were used with a bit more precision, it would be less confusing.


If the data distribution will remain more or less the same in the near future, 
changing the PG count as follows should help:


Assuming that you have 20 OSDs (OSD 1 seems to be gone), increasing the PG 
count for pool 20 from 64 to 512 will require 2x(512-64)=896 additional PG 
memberships. Distributed over 20 OSDs, this is on average 44.8 memberships per 
OSD. This will leave PG memberships available for the future and should sort 
out your distribution problem.


If you want to follow this route, you can do the following:


- ceph osd set noout # maintenance mode

- ceph osd set norebalance # prevent immediate start of rebalancing

- increase pg_num and pgp_num of pool 20 to 512

- increase the reweight of osd.3 to, say 0.8

- wait for peering to finish and any recovery to complete

- ceph osd unset noout # leave maintenance mode

- if everything OK (all PGs active, no degraded objects, no recovery) do ceph 
osd unset norebalance

- once the rebalancing is finished, reweight the OSDs manually, the built-in 
reweight commands are a bit limited


is that just a matter of "ceph osd reweight osd.3 1"

Yes, this will do. However, increase probably in less aggressive steps. You 
will need some rebalancing, because you run a bit low on available space.


As a final note, running with size 2 min size 1 is a serious data redundancy 
risk. You should get another server and upgrade to 3(2).


Best regards,

=

Frank Schilder

AIT Risø Campus

Bygning 109, rum S14




From: Mark Johnson <ma...@iovox.com>
Sent: 29 October 2020 08:19:01
To: ceph-users@ceph.io; Frank Schilder
Subject: Re: pgs stuck backfill_toofull


Thanks for your swift reply.  Below is the requested information.


I understand the bit about not being able to reduce the pg count as we've come 
across this issue once before.  This is the reason I've been hesitant to make 
any changes there without being 100% certain of getting it right and the impact 
of these changes.  That, and the more I read about how to calculate this, the 
more confused I get.  As for the reweight, is that just a matter of "ceph osd 
reweight osd.3 1" once the other issues are sorted out (or perhaps start with a 
less dramatic change and work up)?


Also, presuming I need to change the pg/pgp num, would you be suggesting on 
pool 2 based on the below info (the pool with a few large files) or on pool 20 
(the pool with the most data but an average of about 250KB file size)?  I'm 
just completely confused as to what's caused this issue in the first place and 
how to go about fixing it.  On top of that, am I going to be able to increase 
the pg/pgp count with the cluster in a state of health_warn?  Just some posts 
I've read seem to indicate that the health state needs to be ok before this 
sort of thing can be changed (but I could be misunderstanding what I'm 
reading).


Anyway, here's the info:


# ceph df

GLOBAL:

SIZE   AVAIL  RAW USED %RAW USED

28219G 11227G   15558G 55.13

POOLS:

NAME                      ID USED   %USED MAX AVAIL  OBJECTS
rbd                       0       0     0      690G        0
KUBERNETES                1    122G 15.11      690G    34188
KUBERNETES_METADATA       2  49310k     0      690G     1426
default.rgw.control       11      0     0      690G        8
default.rgw.data.root     12 20076k     0      690G    54412
default.rgw.gc            13      0     0      690G       32
default.rgw.log

[ceph-users] Re: frequent Monitor down

2020-10-29 Thread Marc Roos
Really? First time I read this here, afaik you can get a split brain 
like this.

 

-Original Message-
Sent: Thursday, October 29, 2020 12:16 AM
To: Eugen Block
Cc: ceph-users
Subject: [ceph-users] Re: frequent Monitor down

Eugen, I've got four physical servers and I've installed mon on all of 
them. I've discussed it with Wido and a few other chaps from ceph and 
there is no issue in doing it. The quorum issues would happen if you 
have 2 mons. If you've got more than 2 you should be fine.

Andrei

- Original Message -
> From: "Eugen Block" 
> To: "Andrei Mikhailovsky" 
> Cc: "ceph-users" 
> Sent: Wednesday, 28 October, 2020 20:19:15
> Subject: Re: [ceph-users] Re: frequent Monitor down

> Why do you have 4 MONs in the first place? That way a quorum is 
> difficult to achieve, could it be related to that?
> 
> Zitat von Andrei Mikhailovsky :
> 
>> Yes, I have, Eugen, I see no obvious reason / error / etc. I see a 
>> lot of entries relating to Compressing as well as monitor going down.
>>
>> Andrei
>>
>>
>>
>> - Original Message -
>>> From: "Eugen Block" 
>>> To: "ceph-users" 
>>> Sent: Wednesday, 28 October, 2020 11:51:20
>>> Subject: [ceph-users] Re: frequent Monitor down
>>
>>> Have you looked into syslog and mon logs?
>>>
>>>
>>> Zitat von Andrei Mikhailovsky :
>>>
 Hello everyone,

 I am having regular messages that the Monitors are going down and 
up:

2020-10-27T09:50:49.032431+ mon.arh-ibstorage2-ib (mon.1) 2248 : cluster [WRN] Health check failed: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
2020-10-27T09:50:49.123511+ mon.arh-ibstorage2-ib (mon.1) 2250 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s) set; 43 pgs not deep-scrubbed in time; 12 pgs not scrubbed in time
2020-10-27T09:50:52.735457+ mon.arh-ibstorage1-ib (mon.0) 31287 : cluster [INF] Health check cleared: MON_DOWN (was: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib)
2020-10-27T12:35:20.556458+ mon.arh-ibstorage2-ib (mon.1) 2260 : cluster [WRN] Health check failed: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
2020-10-27T12:35:20.643282+ mon.arh-ibstorage2-ib (mon.1) 2262 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s) set; 47 pgs not deep-scrubbed in time; 14 pgs not scrubbed in time


 This happens on a daily basis several times a day.

 Could you please let me know how to fix this annoying problem?

 I am running ceph version 15.2.4
 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable) on 
 Ubuntu 18.04 LTS with latest updates.

 Thanks

 Andrei
 ___
 ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send 
 an email to ceph-users-le...@ceph.io
>>>
>>>
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an 
email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
I reset the pg_num after adding OSDs, and it made some PGs inactive (in the
activating state).

Frank Schilder  于2020年10月29日周四 下午3:56写道:

> This does not explain incomplete and inactive PGs. Are you hitting
> https://tracker.ceph.com/issues/46847 (see also thread "Ceph does not
> recover from OSD restart"? In that case, temporarily stopping and
> restarting all new OSDs might help.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Zhenshi Zhou 
> Sent: 29 October 2020 08:30:25
> To: Frank Schilder
> Cc: ceph-users
> Subject: Re: [ceph-users] monitor sst files continue growing
>
> After add OSDs into the cluster, the recovery and backfill progress has
> not finished yet
>
> Zhenshi Zhou mailto:deader...@gmail.com>>
> 于2020年10月29日周四 下午3:29写道:
> MGR is stopped by me cause it took too much memories.
> For pg status, I added some OSDs in this cluster, and it
>
> Frank Schilder mailto:fr...@dtu.dk>> 于2020年10月29日周四
> 下午3:27写道:
> Your problem is the overall cluster health. The MONs store cluster history
> information that will be trimmed once it reaches HEALTH_OK. Restarting the
> MONs only makes things worse right now. The health status is a mess, no
> MGR, a bunch of PGs inactive, etc. This is what you need to resolve. How
> did your cluster end up like this?
>
> It looks like all OSDs are up and in. You need to find out
>
> - why there are inactive PGs
> - why there are incomplete PGs
>
> This usually happens when OSDs go missing.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Zhenshi Zhou mailto:deader...@gmail.com>>
> Sent: 29 October 2020 07:37:19
> To: ceph-users
> Subject: [ceph-users] monitor sst files continue growing
>
> Hi all,
>
> My cluster is in wrong state. SST files in /var/lib/ceph/mon/xxx/store.db
> continue growing. It claims mon are using a lot of disk space.
>
> I set "mon compact on start = true" and restart one of the monitors. But
> it started and campacting for a long time, seems it has no end.
>
> [image.png]
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Frank Schilder
Hi Mark,

it looks like you have some very large PGs. Also, you run with a quite low PG 
count, in particular, for the large pool. Please post the output of "ceph df" 
and "ceph osd pool ls detail" to see how much data is in each pool and some 
pool info. I guess you need to increase the PG count of the large pool to split 
PGs up and also reduce the impact of imbalance. When I look at this:

 3 1.37790  0.45013  1410G  1079G   259G 76.49 1.39  21
 4 1.37790  0.95001  1410G  1086G   253G 76.98 1.40  44

I would conclude that the PGs are too large, the reweight of 0.45 without much 
utilization effect indicates that. This weight will need to be rectified as 
well at some time.

You should be able to run with 100-200 PGs per OSD. Please be aware that PG 
planning requires caution as you cannot reduce the PG count of a pool in your 
version. You need to know how much data is in the pools right now and what the 
future plan is.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Mark Johnson 
Sent: 29 October 2020 06:55:55
To: ceph-users@ceph.io
Subject: [ceph-users] pgs stuck backfill_toofull

I've been struggling with this one for a few days now.  We had an OSD report as 
near full a few days ago.  Had this happen a couple of times before and a 
reweight-by-utilization has sorted it out in the past.  Tried the same again 
but this time we ended up with a couple of pgs in a state of backfill_toofull 
and a handful of misplaced objects as a result.

Tried doing the reweight a few more times and it's been moving data around.  We 
did have another osd trigger the near full alert but running the reweight a 
couple more times seems to have moved some of that data around a bit better.  
However, the original near_full osd doesn't seem to have changed much and the 
backfill_toofull pgs are still there.  I'd keep doing the 
reweight-by-utilization but I'm not sure if I'm heading down the right path and 
if it will eventually sort it out.

We have 14 pools, but the vast majority of data resides in just one of those 
pools (pool 20).  The pgs in the backfill state are in pool 2 (as far as I can 
tell).  That particular pool is used for some cephfs stuff and has a handful of 
large files in there (not sure if this is significant to the problem).

All up, our utilization is showing as 55.13% but some of our OSDs are showing 
as 76% in use with this one problem sitting at 85.02%.  Right now, I'm just not 
sure what the proper corrective action is.  The last couple of reweights I've 
run have been a bit more targeted in that I've set it to only function on two 
OSDs at a time.  If I run a test-reweight targeting only one osd, it does say 
it will reweight OSD 9 (the one at 85.02%).  I gather this will move data away 
from this OSD and potentially get it below the threshold.  However, at one 
point in the past couple of days, it's shown as no OSDs in a near full state, 
yet the two pgs in backfill_toofull didn't change.  So, that's why I'm not sure 
continually reweighting is going to solve this issue.
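For reference, the targeted dry run I mean looks roughly like this (Jewel syntax, if I have it
right: overload threshold in percent, maximum weight change, maximum number of OSDs to touch;
the test variant only reports what it would do):

# ceph osd test-reweight-by-utilization 120 0.05 1
# ceph osd reweight-by-utilization 120 0.05 1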

I'm a long way from knowledgeable on Ceph so I'm not really sure what 
information is useful here.  Here's a bit of info on what I'm seeing.  Can 
provide anything else that might help.


Basically, we have a three node cluster but only two have OSDs.  The third is 
there simply to enable a quorum to be established.  The OSDs are evenly spread 
across these two nodes and the configuration of each is identical.  We are 
running Jewel and are not in a position to upgrade at this stage.




# ceph --version
ceph version 10.2.11 (e4b061b47f07f583c92a050d9e84b1813a35671e)


# ceph health detail
HEALTH_WARN 2 pgs backfill_toofull; 2 pgs stuck unclean; recovery 33/62099566 
objects misplaced (0.000%); 1 near full osd(s)
pg 2.52 is stuck unclean for 201822.031280, current state 
active+remapped+backfill_toofull, last acting [17,3]
pg 2.18 is stuck unclean for 202114.617682, current state 
active+remapped+backfill_toofull, last acting [18,2]
pg 2.18 is active+remapped+backfill_toofull, acting [18,2]
pg 2.52 is active+remapped+backfill_toofull, acting [17,3]
recovery 33/62099566 objects misplaced (0.000%)
osd.9 is near full at 85%


# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USEAVAIL  %USE  VAR  PGS
 2 1.37790  1.0  1410G   842G   496G 59.75 1.08  33
 3 1.37790  0.45013  1410G  1079G   259G 76.49 1.39  21
 4 1.37790  0.95001  1410G  1086G   253G 76.98 1.40  44
 5 1.37790  1.0  1410G   617G   722G 43.74 0.79  43
 6 1.37790  0.65009  1410G   616G   722G 43.69 0.79  39
 7 1.37790  0.95001  1410G   495G   844G 35.10 0.64  40
 8 1.37790  1.0  1410G   732G   606G 51.93 0.94  52
 9 1.37790  0.70007  1410G  1199G   139G 85.02 1.54  37
10 1.37790  1.0  1410G   611G   727G 43.35 0.79  41
11 1.37790  0.75006  1410G   495G   843G 35.11 0.64  32
 0 1.37790  1.0  1410G   731G   608G 51.82 0.94  43
12 1.37790  1.0  1410G   851G   487G 60.36 1.

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Frank Schilder
Your problem is the overall cluster health. The MONs store cluster history 
information that will be trimmed once it reaches HEALTH_OK. Restarting the MONs 
only makes things worse right now. The health status is a mess, no MGR, a bunch 
of PGs inactive, etc. This is what you need to resolve. How did your cluster 
end up like this?

It looks like all OSDs are up and in. You need to find out

- why there are inactive PGs
- why there are incomplete PGs

This usually happens when OSDs go missing.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Zhenshi Zhou 
Sent: 29 October 2020 07:37:19
To: ceph-users
Subject: [ceph-users] monitor sst files continue growing

Hi all,

My cluster is in a bad state. SST files in /var/lib/ceph/mon/xxx/store.db
keep growing, and the mons are reported as using a lot of disk space.

I set "mon compact on start = true" and restarted one of the monitors, but
it has been compacting for a long time and seems to have no end.

[image.png]
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Frank Schilder
Cephfs pools are uncritical, because CephFS splits very large files into 
chunks of object size. The RGW pool is the problem, because RGW, as far as I 
know, does not. A few 1TB uploads and you have a problem.

The calculation is confusing, because the term PG is used in two different 
meanings, unfortunately. The pool PG count and OSD PG count are different 
things. A PG is a virtual raid set distributed over some OSDs. The number of 
PGs in a pool is the count of such raid sets. The PG count for an OSD is in 
fact the PG membership count - something completely different. It says in how 
many PGs an OSD is a member of. To create 100PGs with replication 3 you need 
3x100=300 PG memberships. If you have 3 OSDs, these will have 100 PG 
memberships each. This is shown as PGs in the utilisation columns. If these 
terms were used with a bit more precision, it would be less confusing.

If the data distribution will remain more or less the same in the near future, 
changing the PG count as follows should help:

Assuming that you have 20 OSDs (OSD 1 seems to be gone), increasing the PG 
count for pool 20 from 64 to 512 will require 2x(512-64)=896 additional PG 
memberships. Distributed over 20 OSDs, this is on average 44.8 memberships per 
OSD. This will leave PG memberships available for the future and should sort 
out your distribution problem.

If you want to follow this route, you can do the following:

- ceph osd set noout # maintenance mode
- ceph osd set norebalance # prevent immediate start of rebalancing
- increase pg_num and pgp_num of pool 20 to 512
- increase the reweight of osd.3 to, say 0.8
- wait for peering to finish and any recovery to complete
- ceph osd unset noout # leave maintenance mode
- if everything OK (all PGs active, no degraded objects, no recovery) do ceph 
osd unset norebalance
- once the rebalancing is finished, reweight the OSDs manually, the built-in 
reweight commands are a bit limited
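
For example, assuming the pool keeps its current name, the pg_num/pgp_num step would be the
following (if the mons refuse such a large jump in one go, increase in smaller steps):

ceph osd pool set default.rgw.buckets.data pg_num 512
ceph osd pool set default.rgw.buckets.data pgp_num 512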

> is that just a matter of "ceph osd reweight osd.3 1"
Yes, this will do. However, increase probably in less aggressive steps. You 
will need some rebalancing, because you run a bit low on available space.
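
A less aggressive sequence could look like this (only an illustration, check health between steps):

ceph osd reweight osd.3 0.6
ceph -s        # wait until recovery from this step has settled
ceph osd df    # check how the utilisation of osd.3 develops
ceph osd reweight osd.3 0.8
ceph osd reweight osd.3 1.0   # only once utilisation allows it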

As a final note, running with size 2 min size 1 is a serious data redundancy 
risk. You should get another server and upgrade to 3(2).

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Mark Johnson 
Sent: 29 October 2020 08:19:01
To: ceph-users@ceph.io; Frank Schilder
Subject: Re: pgs stuck backfill_toofull

Thanks for your swift reply.  Below is the requested information.

I understand the bit about not being able to reduce the pg count as we've come 
across this issue once before.  This is the reason I've been hesitant to make 
any changes there without being 100% certain of getting it right and the impact 
of these changes.  That, and the more I read about how to calculate this, the 
more confused I get.  As for the reweight, is that just a matter of "ceph osd 
reweight osd.3 1" once the other issues are sorted out (or perhaps start with a 
less dramatic change and work up)?

Also, presuming I need to change the pg/pgp num, would you be suggesting on 
pool 2 based on the below info (the pool with a few large files) or on pool 20 
(the pool with the most data but an average of about 250KB file size)?  I'm 
just completely confused as to what's caused this issue in the first place and 
how to go about fixing it.  On top of that, am I going to be able to increase 
the pg/pgp count with the cluster in a state of health_warn?  Just some posts 
I've read seem to indicate that the health state needs to be ok before this 
sort of thing can be changed (but I could be misunderstanding what I'm 
reading).

Anyway, here's the info:

# ceph df
GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED
28219G 11227G   15558G 55.13
POOLS:
NAME                      ID USED   %USED MAX AVAIL  OBJECTS
rbd                       0       0     0      690G        0
KUBERNETES                1    122G 15.11      690G    34188
KUBERNETES_METADATA       2  49310k     0      690G     1426
default.rgw.control       11      0     0      690G        8
default.rgw.data.root     12 20076k     0      690G    54412
default.rgw.gc            13      0     0      690G       32
default.rgw.log           14      0     0      690G      127
default.rgw.users.uid     15   4942     0      690G       15
default.rgw.users.keys    16    126     0      690G        4
default.rgw.users.swift   17    252     0      690G        8
default.rgw.buckets.index 18      0     0      690G    27206
.rgw.root                 19   1588     0

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Frank Schilder
This does not explain incomplete and inactive PGs. Are you hitting 
https://tracker.ceph.com/issues/46847 (see also thread "Ceph does not recover 
from OSD restart")? In that case, temporarily stopping and restarting all new 
OSDs might help.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Zhenshi Zhou 
Sent: 29 October 2020 08:30:25
To: Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] monitor sst files continue growing

After adding OSDs into the cluster, the recovery and backfill progress has not 
finished yet.

Zhenshi Zhou mailto:deader...@gmail.com>> 于2020年10月29日周四 
下午3:29写道:
The MGR was stopped by me because it took too much memory.
For pg status, I added some OSDs in this cluster, and it

Frank Schilder mailto:fr...@dtu.dk>> 于2020年10月29日周四 下午3:27写道:
Your problem is the overall cluster health. The MONs store cluster history 
information that will be trimmed once it reaches HEALTH_OK. Restarting the MONs 
only makes things worse right now. The health status is a mess, no MGR, a bunch 
of PGs inactive, etc. This is what you need to resolve. How did your cluster 
end up like this?

It looks like all OSDs are up and in. You need to find out

- why there are inactive PGs
- why there are incomplete PGs

This usually happens when OSDs go missing.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Zhenshi Zhou mailto:deader...@gmail.com>>
Sent: 29 October 2020 07:37:19
To: ceph-users
Subject: [ceph-users] monitor sst files continue growing

Hi all,

My cluster is in a bad state. SST files in /var/lib/ceph/mon/xxx/store.db
keep growing, and the mons are reported as using a lot of disk space.

I set "mon compact on start = true" and restarted one of the monitors, but
it has been compacting for a long time and seems to have no end.

[image.png]
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Frank Schilder
It will prevent OSDs from being marked out if you shut them down or the . 
Changing PG counts does not require a shut down of OSDs, but sometimes OSDs get 
overloaded by peering traffic and the MONs can lose contact for a while. 
Setting noout will prevent flapping and also reduce the administrative traffic 
a bit. It's just a precaution.

If this is a production system, you need to rethink your size 2 min size 1 
config. This is the major problem for keeping the service available under 
maintenance.

Please take your time and read the docs on all the commands I sent you. The 
cluster status is not critical as far as I can see.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Mark Johnson 
Sent: 29 October 2020 08:58:15
To: ceph-users@ceph.io; Frank Schilder
Subject: Re: pgs stuck backfill_toofull

Thanks again Frank.  That gives me something to digest (and try to understand).

One question regarding maintenance mode, these are production systems that are 
required to be available all the time.  What, exactly, will happen if I issue 
this command for maintenance mode?

Thanks,
Mark


On Thu, 2020-10-29 at 07:51 +, Frank Schilder wrote:

Cephfs pools are uncritical, because ceph fs splits very large files into 
chunks of objectsize. The RGW pool is the problem, because RGW does not as far 
as I know. A few 1TB uploads and you have a problem.


The calculation is confusing, because the term PG is used in two different 
meanings, unfortunately. The pool PG count and OSD PG count are different 
things. A PG is a virtual raid set distributed over some OSDs. The number of 
PGs in a pool is the count of such raid sets. The PG count for an OSD is in 
fact the PG membership count - something completely different. It says in how 
many PGs an OSD is a member of. To create 100PGs with replication 3 you need 
3x100=300 PG memberships. If you have 3 OSDs, these will have 100 PG 
memberships each. This is shown as PGs in the utilisation columns. If these 
terms were used with a bit more precision, it would be less confusing.


If the data distribution will remain more or less the same in the near future, 
changing the PG count as follows should help:


Assuming that you have 20 OSDs (OSD 1 seems to be gone), increasing the PG 
count for pool 20 from 64 to 512 will require 2x(512-64)=896 additional PG 
memberships. Distributed over 20 OSDs, this is on average 44.8 memberships per 
OSD. This will leave PG memberships available for the future and should sort 
out your distribution problem.


If you want to follow this route, you can do the following:


- ceph osd set noout # maintenance mode

- ceph osd set norebalance # prevent immediate start of rebalancing

- increase pg_num and pgp_num of pool 20 to 512

- increase the reweight of osd.3 to, say 0.8

- wait for peering to finish and any recovery to complete

- ceph osd unset noout # leave maintenance mode

- if everything OK (all PGs active, no degraded objects, no recovery) do ceph 
osd unset norebalance

- once the rebalancing is finished, reweight the OSDs manually, the built-in 
reweight commands are a bit limited


is that just a matter of "ceph osd reweight osd.3 1"

Yes, this will do. However, increase probably in less aggressive steps. You 
will need some rebalancing, because you run a bit low on available space.


As a final note, running with size 2 min size 1 is a serious data redundancy 
risk. You should get another server and upgrade to 3(2).


Best regards,

=

Frank Schilder

AIT Risø Campus

Bygning 109, rum S14




From: Mark Johnson <ma...@iovox.com>
Sent: 29 October 2020 08:19:01
To: ceph-users@ceph.io; Frank Schilder
Subject: Re: pgs stuck backfill_toofull


Thanks for your swift reply.  Below is the requested information.


I understand the bit about not being able to reduce the pg count as we've come 
across this issue once before.  This is the reason I've been hesitant to make 
any changes there without being 100% certain of getting it right and the impact 
of these changes.  That, and the more I read about how to calculate this, the 
more confused I get.  As for the reweight, is that just a matter of "ceph osd 
reweight osd.3 1" once the other issues are sorted out (or perhaps start with a 
less dramatic change and work up)?


Also, presuming I need to change the pg/pgp num, would you be suggesting on 
pool 2 based on the below info (the pool with a few large files) or on pool 20 
(the pool with the most data but an average of about 250KB file size)?  I'm 
just completely confused as to what's caused this issue in the first place and 
how to go about fixing it.  On top of that, am I going to be able to increase 
the pg/pgp count with the cluster in a state of health_warn?  Just some posts 
I've read seem to indicate that the health state needs

[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Frank Schilder
He he.

> It will prevent OSDs from being marked out if you shut them down or the .

... down or the MONs lose heartbeats due to high network load during peering.

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: 29 October 2020 09:05:27
To: Mark Johnson; ceph-users@ceph.io
Subject: [ceph-users] Re: pgs stuck backfill_toofull

It will prevent OSDs from being marked out if you shut them down or the . 
Changing PG counts does not require a shut down of OSDs, but sometimes OSDs get 
overloaded by peering traffic and the MONs can lose contact for a while. 
Setting noout will prevent flapping and also reduce the administrative traffic 
a bit. It's just a precaution.

If this is a production system, you need to rethink your size 2 min size 1 
config. This is the major problem for keeping the service available under 
maintenance.

Please take your time and read the docs on all the commands I sent you. The 
cluster status is not critical as far as I can see.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Mark Johnson 
Sent: 29 October 2020 08:58:15
To: ceph-users@ceph.io; Frank Schilder
Subject: Re: pgs stuck backfill_toofull

Thanks again Frank.  That gives me something to digest (and try to understand).

One question regarding maintenance mode, these are production systems that are 
required to be available all the time.  What, exactly, will happen if I issue 
this command for maintenance mode?

Thanks,
Mark


On Thu, 2020-10-29 at 07:51 +, Frank Schilder wrote:

Cephfs pools are uncritical, because ceph fs splits very large files into 
chunks of objectsize. The RGW pool is the problem, because RGW does not as far 
as I know. A few 1TB uploads and you have a problem.


The calculation is confusing, because the term PG is used in two different 
meanings, unfortunately. The pool PG count and OSD PG count are different 
things. A PG is a virtual raid set distributed over some OSDs. The number of 
PGs in a pool is the count of such raid sets. The PG count for an OSD is in 
fact the PG membership count - something completely different. It says in how 
many PGs an OSD is a member of. To create 100PGs with replication 3 you need 
3x100=300 PG memberships. If you have 3 OSDs, these will have 100 PG 
memberships each. This is shown as PGs in the utilisation columns. If these 
terms were used with a bit more precision, it would be less confusing.


If the data distribution will remain more or less the same in the near future, 
changing the PG count as follows should help:


Assuming that you have 20 OSDs (OSD 1 seems to be gone), increasing the PG 
count for pool 20 from 64 to 512 will require 2x(512-64)=896 additional PG 
memberships. Distributed over 20 OSDs, this is on average 44.8 memberships per 
OSD. This will leave PG memberships available for the future and should sort 
out your distribution problem.


If you want to follow this route, you can do the following:


- ceph osd set noout # maintenance mode

- ceph osd set norebalance # prevent immediate start of rebalancing

- increase pg_num and pgp_num of pool 20 to 512

- increase the reweight of osd.3 to, say 0.8

- wait for peering to finish and any recovery to complete

- ceph osd unset noout # leave maintenance mode

- if everything OK (all PGs active, no degraded objects, no recovery) do ceph 
osd unset norebalance

- once the rebalancing is finished, reweight the OSDs manually, the built-in 
reweight commands are a bit limited


is that just a matter of "ceph osd reweight osd.3 1"

Yes, this will do. However, increase probably in less aggressive steps. You 
will need some rebalancing, because you run a bit low on available space.


As a final note, running with size 2 min size 1 is a serious data redundancy 
risk. You should get another server and upgrade to 3(2).


Best regards,

=

Frank Schilder

AIT Risø Campus

Bygning 109, rum S14




From: Mark Johnson <ma...@iovox.com>
Sent: 29 October 2020 08:19:01
To: ceph-users@ceph.io; Frank Schilder
Subject: Re: pgs stuck backfill_toofull


Thanks for your swift reply.  Below is the requested information.


I understand the bit about not being able to reduce the pg count as we've come 
across this issue once before.  This is the reason I've been hesitant to make 
any changes there without being 100% certain of getting it right and the impact 
of these changes.  That, and the more I read about how to calculate this, the 
more confused I get.  As for the reweight, is that just a matter of "ceph osd 
reweight osd.3 1" once the other issues are sorted out (or perhaps start with a 
less dramatic change and work up)?


Also, presuming I need to change the pg/pgp num, would you be suggesting on 
pool 2 based on the below in

[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
Hi,

I was so anxious a few hours ago because the SST files were growing so fast
that I didn't think the space on the mon servers could hold them.

Let me tell it from the beginning. I have a cluster with OSDs deployed on
SATA disks (7200 rpm), 10 TB per OSD, and I use an EC pool for more space.
I added new OSDs into the cluster last week and it had recovered well so far.
After that, while the cluster was still recovering, I increased the pg_num.
Besides that, the clients were still writing data to the cluster all the time.

And the cluster became unhealthy last night. Some OSDs were down and one mon
was down. Then I found the mon servers' root directories were running out of
free space. The SST files in /var/lib/ceph/mon/ceph-xxx/store.db/
were growing rapidly.
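
Following the suggestion below, pausing data movement while the PGs peer and collecting the
requested status output would look roughly like this (note that the OSD flag is spelled
norecover on the command line):

ceph osd set norebalance
ceph osd set norecover
ceph health detail
ceph osd pool stats
ceph osd pool ls detail
# once all PGs are active again:
ceph osd unset norecover
ceph osd unset norebalance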


Frank Schilder  于2020年10月29日周四 下午7:15写道:

> I think you really need to sit down and explain the full story. Dropping
> one-liners with new information will not work via e-mail.
>
> I have never heard of the problem you are facing, so you did something
> that possibly no-one else has done before. Unless we know the full history
> from the last time the cluster was health_ok until now, it will almost
> certainly not be possible to figure out what is going on via e-mail.
>
> Usually, setting "norebalance" and "norecovery" should stop any recovery
> IO and allow the PGs to peer. If they do not become active, something is
> wrong and the information we got so far does not give a clue what this
> could be.
>
> Please post the output of "ceph health detail", "ceph osd pool stats" and
> "ceph osd pool ls detail" and a log of actions and results since last
> health_ok status here, maybe it gives a clue what is going on.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Zhenshi Zhou 
> Sent: 29 October 2020 09:44:14
> To: Frank Schilder
> Cc: ceph-users
> Subject: Re: [ceph-users] monitor sst files continue growing
>
> I reset the pg_num after adding osd, it made some pg inactive(in
> activating state)
>
> Frank Schilder mailto:fr...@dtu.dk>> 于2020年10月29日周四
> 下午3:56写道:
> This does not explain incomplete and inactive PGs. Are you hitting
> https://tracker.ceph.com/issues/46847 (see also thread "Ceph does not
> recover from OSD restart"? In that case, temporarily stopping and
> restarting all new OSDs might help.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Zhenshi Zhou mailto:deader...@gmail.com>>
> Sent: 29 October 2020 08:30:25
> To: Frank Schilder
> Cc: ceph-users
> Subject: Re: [ceph-users] monitor sst files continue growing
>
> After add OSDs into the cluster, the recovery and backfill progress has
> not finished yet
>
> Zhenshi Zhou mailto:deader...@gmail.com> deader...@gmail.com>> 于2020年10月29日周四 下午3:29写道:
> MGR is stopped by me cause it took too much memories.
> For pg status, I added some OSDs in this cluster, and it
>
> Frank Schilder mailto:fr...@dtu.dk> >> 于2020年10月29日周四 下午3:27写道:
> Your problem is the overall cluster health. The MONs store cluster history
> information that will be trimmed once it reaches HEALTH_OK. Restarting the
> MONs only makes things worse right now. The health status is a mess, no
> MGR, a bunch of PGs inactive, etc. This is what you need to resolve. How
> did your cluster end up like this?
>
> It looks like all OSDs are up and in. You need to find out
>
> - why there are inactive PGs
> - why there are incomplete PGs
>
> This usually happens when OSDs go missing.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Zhenshi Zhou mailto:deader...@gmail.com
> >>>
> Sent: 29 October 2020 07:37:19
> To: ceph-users
> Subject: [ceph-users] monitor sst files continue growing
>
> Hi all,
>
> My cluster is in wrong state. SST files in /var/lib/ceph/mon/xxx/store.db
> continue growing. It claims mon are using a lot of disk space.
>
> I set "mon compact on start = true" and restart one of the monitors. But
> it started and campacting for a long time, seems it has no end.
>
> [image.png]
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
I then followed someone's guidance, added 'mon compact on start = true' to the
config and restarted one mon. That mon did not rejoin the cluster until I added
two more mons, deployed on virtual machines with SSDs, into the cluster.
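
For reference, a minimal sketch of the two ways this is usually done (mon name is a
placeholder; ceph config set assumes a Nautilus or newer cluster):

ceph config set mon mon_compact_on_start true   # instead of editing ceph.conf
ceph tell mon.<id> compact                      # compact a running mon without restarting it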

And now the cluster is fine except the pg status.
[image: image.png]
[image: image.png]

Zhenshi Zhou  于2020年10月29日周四 下午8:29写道:

> Hi,
>
> I was so anxious a few hours ago cause the sst files were growing so fast
> and I don't think
> the space on mon servers could afford it.
>
> Let me talk it from the beginning. I have a cluster with OSD deployed on
> SATA(7200rpm).
> 10T each OSD and I used ec pool for more space.I added new OSDs into the
> cluster last
> week and it has recovered well so far. After that, while the cluster is
> still recovering, I increased the pg_num.
> Besides that, the clients still write data to the server all the time.
>
> And the cluster became unhealthy last night. Some osds were down and one
> mon was down.
> Then I found the mon servers' root directories were lack of free space.
> The sst files in /var/lib/ceph/mon/ceph-xxx/store.db/
> were growing rapidly.
>
>
> Frank Schilder  于2020年10月29日周四 下午7:15写道:
>
>> I think you really need to sit down and explain the full story. Dropping
>> one-liners with new information will not work via e-mail.
>>
>> I have never heard of the problem you are facing, so you did something
>> that possibly no-one else has done before. Unless we know the full history
>> from the last time the cluster was health_ok until now, it will almost
>> certainly not be possible to figure out what is going on via e-mail.
>>
>> Usually, setting "norebalance" and "norecovery" should stop any recovery
>> IO and allow the PGs to peer. If they do not become active, something is
>> wrong and the information we got so far does not give a clue what this
>> could be.
>>
>> Please post the output of "ceph health detail", "ceph osd pool stats" and
>> "ceph osd pool ls detail" and a log of actions and results since last
>> health_ok status here, maybe it gives a clue what is going on.
>>
>> Best regards,
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> 
>> From: Zhenshi Zhou 
>> Sent: 29 October 2020 09:44:14
>> To: Frank Schilder
>> Cc: ceph-users
>> Subject: Re: [ceph-users] monitor sst files continue growing
>>
>> I reset the pg_num after adding osd, it made some pg inactive(in
>> activating state)
>>
>> Frank Schilder mailto:fr...@dtu.dk>> 于2020年10月29日周四
>> 下午3:56写道:
>> This does not explain incomplete and inactive PGs. Are you hitting
>> https://tracker.ceph.com/issues/46847 (see also thread "Ceph does not
>> recover from OSD restart"? In that case, temporarily stopping and
>> restarting all new OSDs might help.
>>
>> Best regards,
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> 
>> From: Zhenshi Zhou mailto:deader...@gmail.com>>
>> Sent: 29 October 2020 08:30:25
>> To: Frank Schilder
>> Cc: ceph-users
>> Subject: Re: [ceph-users] monitor sst files continue growing
>>
>> After add OSDs into the cluster, the recovery and backfill progress has
>> not finished yet
>>
>> Zhenshi Zhou mailto:deader...@gmail.com>> deader...@gmail.com>> 于2020年10月29日周四
>> 下午3:29写道:
>> MGR is stopped by me cause it took too much memories.
>> For pg status, I added some OSDs in this cluster, and it
>>
>> Frank Schilder mailto:fr...@dtu.dk>> >> 于2020年10月29日周四 下午3:27写道:
>> Your problem is the overall cluster health. The MONs store cluster
>> history information that will be trimmed once it reaches HEALTH_OK.
>> Restarting the MONs only makes things worse right now. The health status is
>> a mess, no MGR, a bunch of PGs inactive, etc. This is what you need to
>> resolve. How did your cluster end up like this?
>>
>> It looks like all OSDs are up and in. You need to find out
>>
>> - why there are inactive PGs
>> - why there are incomplete PGs
>>
>> This usually happens when OSDs go missing.
>>
>> Best regards,
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> 
>> From: Zhenshi Zhou mailto:deader...@gmail.com
>> >>>
>> Sent: 29 October 2020 07:37:19
>> To: ceph-users
>> Subject: [ceph-users] monitor sst files continue growing
>>
>> Hi all,
>>
>> My cluster is in wrong state. SST files in /var/lib/ceph/mon/xxx/store.db
>> continue growing. It claims mon are using a lot of disk space.
>>
>> I set "mon compact on start = true" and restart one of the monitors. But
>> it started and campacting for a long time, seems it has no end.
>>
>> [image.png]
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Monitor persistently out-of-quorum

2020-10-29 Thread Stefan Kooman
On 2020-10-29 01:26, Ki Wong wrote:
> Hello,
> 
> I am at my wit's end.
> 
> So I made a mistake in the configuration of my router and one
> of the monitors (out of 3) dropped out of the quorum and nothing
> I’ve done has allowed it to rejoin. That includes reinstalling the
> monitor with ceph-ansible.

What Ceph version?
What kernel version (on the monitors)?


Just to check some things:

make sure the mon keyring on _all_ monitors is identical, permissions are
correct (ceph can read the file), and the mon can read/write its monstore.

Have you enabled msgr v1 and v2?
Do you use DNS to detect the monitors [1].

ceph daemon mon.$id mon_status <- what does this give on the
out-of-quorum monitor?
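
For example, on the out-of-quorum host (mon name assumed to be mgmt03, matching the log excerpt):

ceph daemon mon.mgmt03 mon_status   # state, election epoch, which peers it sees
ceph mon dump                       # confirm the monmap addresses and the v1/v2 ports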

See the troubleshooting documentation [2] for more information.

Gr. Stefan

[1]: https://docs.ceph.com/en/latest/rados/configuration/mon-lookup-dns/
[2]:
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: dashboard object gateway not working

2020-10-29 Thread Siegfried Höllrigl


On the machines with the radosgateways, there is also an haproxy running
(which does the https->http conversion).

I have tried it both ways already.

 on Port 443 (resolves to the external IP)

and

 on the internal Port (with a hosts entry to the internal IP; on 
the machine, where the ceph-mgr is running)


-

But I think the connection from ceph-mgr to the radosgateway is fine, because
when I change e.g. the port, I get a completely different error message.


Br,

Am 27.10.2020 um 14:49 schrieb Eugen Block:

Hi,


I am only unsure about the "get-rgw-api-admin-resource" value.
But I think this should be "admin".


yes, "admin" should do it (I have a virtual lab environment without 
much traffic).


I think there would be another failure than "Sorry, we could not find 
what you were looking for", if something with the access or secret 
key would be wrong, correct ?


I'm not sure if there would be a different message. Does your DNS work 
properly? Have you checked the client rgw section and does the 
rgw_dns_name match the dashboard rgw-api-host?
Is 'ceph mgr services' matching your expectations? The RGWs are up and 
running?




Zitat von Siegfried Höllrigl :


I have checked these values more than once.

I am only unsure about the "get-rgw-api-admin-resource" value.

But I think this should be "admin".


I think there would be another failure than "Sorry, we could not find 
what you were looking for",


if something with the access or secret key would be wrong, correct ?

Br,


Am 27.10.2020 um 12:43 schrieb Eugen Block:
I think you might need to set rgw-api-host, -port etc. for the 
dashboard:


ceph dashboard get-rgw-api-access-key
ceph dashboard get-rgw-api-admin-resource
ceph dashboard get-rgw-api-host
ceph dashboard get-rgw-api-port
ceph dashboard get-rgw-api-secret-key
ceph dashboard get-rgw-api-ssl-verify
ceph dashboard get-rgw-api-user-id

To set a value you run

ceph dashboard set-rgw-api-secret-key 
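
For example, something along these lines (the values below are only
placeholders for your environment):

ceph dashboard set-rgw-api-host 192.168.0.10
ceph dashboard set-rgw-api-port 8080
ceph dashboard set-rgw-api-ssl-verify False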

Does this help?

Regards,
Eugen

Zitat von Siegfried Höllrigl :

We are running Ceph 14.2.12 and would like to manage our object 
gateways via the dashboard.


The rados gateways are running on different (virtual) machines than 
the mon servers (where mon, mgr and mds are running).


The dashboard seems to be running fine.

But when we click on "Object Gateway" we get the following error
message:


Sorry, we could not find what you were looking for

500 - Internal Server Error
The server encountered an unexpected condition which prevented it 
from fulfilling the request.

10/27/20 12:30:32 PM

The logfiles (of mgr and radosgw) do not show anything helpful.

Maybe we have missed something that needs to be
enabled/installed/activated?!


Any Ideas ?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
 Siegfried Höllrigl
Technik

 




Xidras GmbH
Stockern 47
3744 Stockern
Austria

Tel: +43 (0)2983 201 30505
Fax: +43 (0)2983 201 30505 9
Email:   siegfried.hoellr...@xidras.com
Web: http://www.xidras.com

FN 317036 f | Landesgericht Krems | ATU64485024

 



VERTRAULICHE INFORMATIONEN!
Diese eMail enthält vertrauliche Informationen und ist nur für den 
berechtigten
Empfänger bestimmt. Wenn diese eMail nicht für Sie bestimmt ist, 
bitten wir Sie,
diese eMail an uns zurückzusenden und anschließend auf Ihrem Computer 
und
Mail-Server zu löschen. Solche eMails und Anlagen dürfen Sie weder 
nutzen,

noch verarbeiten oder Dritten zugänglich machen, gleich in welcher Form.
Wir danken für Ihre Kooperation!

CONFIDENTIAL!
This email contains confidential information and is intended for the 
authorised
recipient only. If you are not an authorised recipient, please return 
the email
to us and then delete it from your computer and mail-server. You may 
neither
use nor edit any such emails including attachments, nor make them 
accessible

to third parties in any manner whatsoever.
Thank you for your cooperation
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


--
 
Siegfried Höllrigl

Technik




Xidras GmbH
Stockern 47
3744 Stockern
Austria

Tel: +43 (0)2983 201 30505
Fax: +43 (0)2983 201 30505 9
Email:   siegfried.hoellr...@xidras.com
Web: http://www.xidras.com

FN 317036 f | Landesgericht Krems | ATU64485024

__

[ceph-users] Cloud Sync Module

2020-10-29 Thread Sailaja Yedugundla
I am trying to configure the cloud sync module in my Ceph cluster to implement
backup to AWS S3. I could not figure out how to configure it using the
available documentation. Can someone help me implement this?

Thanks,
Sailaja
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Alex Gracie
We hit this issue over the weekend on our HDD backed EC Nautilus cluster while 
removing a single OSD. We also did not have any luck using compaction. The 
mon-logs filled up our entire root disk on the mon servers and we were running 
on a single monitor for hours while we tried to finish recovery and reclaim 
space. The past couple weeks we also noticed "pg not scubbed in time" errors 
but are unsure if they are related. I'm still the exact cause of this(other 
than the general misplaced/degraded objects) and what kind of growth is 
acceptable for these store.db files. 

In order to get our downed mons restarted, we ended up backing up and copying
the /var/lib/ceph/mon/* contents to a remote host, setting up an sshfs mount to 
that new host with large NVME and SSDs, ensuring the mount paths were owned by 
ceph, then clearing up enough space on the monitor host to start the service. 
This allowed our store.db directory to grow freely until the misplaced/degraded 
objects could recover and monitors all rejoined eventually.
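
Roughly what that looked like, in case it helps someone else (host names,
paths and mount options here are illustrative, not the exact commands we ran):

systemctl stop ceph-mon@$(hostname -s)
rsync -a /var/lib/ceph/mon/ big-host:/nvme/mon-backup/
sshfs -o allow_other,uid=$(id -u ceph),gid=$(id -g ceph) \
    big-host:/nvme/mon-backup /var/lib/ceph/mon
systemctl start ceph-mon@$(hostname -s)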
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Frank Schilder
I think you really need to sit down and explain the full story. Dropping 
one-liners with new information will not work via e-mail.

I have never heard of the problem you are facing, so you did something that 
possibly no-one else has done before. Unless we know the full history from the 
last time the cluster was health_ok until now, it will almost certainly not be 
possible to figure out what is going on via e-mail.

Usually, setting "norebalance" and "norecovery" should stop any recovery IO and 
allow the PGs to peer. If they do not become active, something is wrong and the 
information we got so far does not give a clue what this could be.
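
For reference, those are cluster-wide flags; a minimal sketch of setting and
later clearing them (note that the CLI spells the second one "norecover"):

ceph osd set norebalance
ceph osd set norecover
# ... once the PGs have peered and the situation is understood ...
ceph osd unset norecover
ceph osd unset norebalance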

Please post the output of "ceph health detail", "ceph osd pool stats" and "ceph 
osd pool ls detail" and a log of actions and results since last health_ok 
status here, maybe it gives a clue what is going on.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Zhenshi Zhou 
Sent: 29 October 2020 09:44:14
To: Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] monitor sst files continue growing

I reset the pg_num after adding OSDs; it made some PGs inactive (in the
activating state).

Frank Schilder <fr...@dtu.dk> wrote on Thu, Oct 29, 2020, at 3:56 PM:
This does not explain incomplete and inactive PGs. Are you hitting 
https://tracker.ceph.com/issues/46847 (see also thread "Ceph does not recover 
from OSD restart"? In that case, temporarily stopping and restarting all new 
OSDs might help.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Zhenshi Zhou <deader...@gmail.com>
Sent: 29 October 2020 08:30:25
To: Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] monitor sst files continue growing

After adding OSDs to the cluster, the recovery and backfill progress has not
finished yet.

Zhenshi Zhou <deader...@gmail.com> wrote on Thu, Oct 29, 2020, at 3:29 PM:
I stopped the MGR myself because it took too much memory.
As for the PG status, I added some OSDs to this cluster, and it

Frank Schilder <fr...@dtu.dk> wrote on Thu, Oct 29, 2020, at 3:27 PM:
Your problem is the overall cluster health. The MONs store cluster history 
information that will be trimmed once it reaches HEALTH_OK. Restarting the MONs 
only makes things worse right now. The health status is a mess, no MGR, a bunch 
of PGs inactive, etc. This is what you need to resolve. How did your cluster 
end up like this?

It looks like all OSDs are up and in. You need to find out

- why there are inactive PGs
- why there are incomplete PGs

This usually happens when OSDs go missing.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Zhenshi Zhou <deader...@gmail.com>
Sent: 29 October 2020 07:37:19
To: ceph-users
Subject: [ceph-users] monitor sst files continue growing

Hi all,

My cluster is in a bad state. SST files in /var/lib/ceph/mon/xxx/store.db
continue growing. The cluster claims the mons are using a lot of disk space.

I set "mon compact on start = true" and restarted one of the monitors. But
it started compacting and has been at it for a long time; it seems to have no end.
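
For reference, what I tried corresponds to roughly this; I believe a one-off
compaction can also be triggered online:

# ceph.conf, [mon] section
mon compact on start = true

# or, without a restart:
ceph tell mon.<id> compact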

[image.png]
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Very high read IO during backfilling

2020-10-29 Thread Eugen Block

Hi,

you could lower the recovery settings to the default and see if that helps:

osd_max_backfills = 1
osd_recovery_max_active = 3
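
If they were set through the config database, something like this should bring
them back down (a sketch; adjust the target if you set them per OSD or in
ceph.conf instead):

ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 3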


Regards,
Eugen

Zitat von Kamil Szczygieł :


Hi,

We're running Octopus and we've 3 control plane nodes (12 core, 64  
GB memory each) that are running mon, mds and mgr and also 4 data  
nodes (12 core, 256 GB memory, 13x10TB HDDs each). We've increased  
number of PGs inside our pool, which resulted in all OSDs going  
crazy and reading the average of 900 M/s constantly (based on iotop).


This has resulted in slow ops and very low recovery speed. Any tips  
on how to handle this kind of situation? We've  
osd_recovery_sleep_hdd set to 0.2, osd_recovery_max_active set to 5  
and osd_max_backfills set to 4. Some OSDs are reporting slow ops  
constantly and iowait on machines is at 70-80% constantly.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to reset Log Levels

2020-10-29 Thread Patrick Donnelly
On Thu, Oct 29, 2020 at 9:26 AM Ml Ml  wrote:
>
> Hello,
> i played around with some log level i can´t remember and my logs are
> now getting bigger than my DVD-Movie collection.
> E.g.: journalctl -b -u
> ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@mon.ceph03.service >
> out.file is 1,1GB big.
>
> I did already try:
> ceph tell mon.ceph03 config set debug_mon 0/10
> ceph tell mon.ceph03 config set debug_osd 0/10
> ceph tell mon.ceph03 config set debug_mgr 0/10
> ceph tell mon.ceph03 config set "mon_health_to_clog" false
> ceph tell mon.ceph03 config set "mon_health_log_update_period" 30
> ceph tell mon.ceph03 config set "debug_mgr" "0/0"
>
> which made it better, but i really cant remember it all and would like
> to have the default values.
>
> Is there a way to reset those Log Values?

ceph config dump

^ Find changes

ceph config rm ...
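
For example (the option names below are just the ones from your list; use
whatever "ceph config dump" actually shows):

ceph config dump | grep -E 'debug_|mon_health'
ceph config rm mon.ceph03 debug_mon
ceph config rm mon.ceph03 debug_mgr

Note that values changed only at runtime with "ceph tell ... config set" are
not stored in the config database and revert to the defaults when the daemon
restarts.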

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Not all OSDs in rack marked as down when the rack fails

2020-10-29 Thread Wido den Hollander

Hi,

I'm investigating an issue where 4 to 5 OSDs in a rack aren't marked as 
down when the network is cut to that rack.


Situation:

- Nautilus cluster
- 3 racks
- 120 OSDs, 40 per rack

We performed a test where we turned off the network Top-of-Rack for each 
rack. This worked as expected with two racks, but with the third 
something weird happened.


From the 40 OSDs which were supposed to be marked as down only 36 were 
marked as down.


In the end it took 15 minutes for all 40 OSDs to be marked as down.

$ ceph config set mon mon_osd_reporter_subtree_level rack

That setting is set to make sure that we only accept reports from other 
racks.
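
For completeness, these are the related knobs I would double-check on the mons
(a sketch; the last two should still be at their defaults here):

ceph config get mon mon_osd_reporter_subtree_level
ceph config get mon mon_osd_min_down_reporters   # default 2
ceph config get mon osd_heartbeat_grace          # default 20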


What we saw in the logs for example:

2020-10-29T03:49:44.409-0400 7fbda185e700 10 
mon.CEPH2-MON1-206-U39@0(leader).osd e107102  osd.51 has 54 reporters, 
239.856038 grace (20.00 + 219.856 + 7.43801e-23), max_failed_since 
2020-10-29T03:47:22.374857-0400


But osd.51 was still not marked as down after 54 reporters have reported 
that it is actually down.


I checked, no ping or other traffic possible to osd.51. Host is unreachable.

Another osd was marked as down, but it took a couple of minutes as well:

2020-10-29T03:50:54.455-0400 7fbda185e700 10 
mon.CEPH2-MON1-206-U39@0(leader).osd e107102  osd.37 has 48 reporters, 
221.378970 grace (20.00 + 201.379 + 6.34437e-23), max_failed_since 
2020-10-29T03:47:12.761584-0400
2020-10-29T03:50:54.455-0400 7fbda185e700  1 
mon.CEPH2-MON1-206-U39@0(leader).osd e107102  we have enough reporters 
to mark osd.37 down


In the end osd.51 was marked as down, but only after the MON decided to 
do so:


2020-10-29T03:53:44.631-0400 7fbda185e700  0 log_channel(cluster) log 
[INF] : osd.51 marked down after no beacon for 903.943390 seconds
2020-10-29T03:53:44.631-0400 7fbda185e700 -1 
mon.CEPH2-MON1-206-U39@0(leader).osd e107104 no beacon from osd.51 since 
2020-10-29T03:38:40.689062-0400, 903.943390 seconds ago.  marking down


I haven't seen this happen before in any cluster. It's also strange that 
this only happens in this rack, the other two racks work fine.


IDCLASS  WEIGHT  TYPE NAME
  -1 1545.35999  root default 

-206  515.12000  rack 206 


  -7   27.94499  host CEPH2-206-U16
...
-207  515.12000  rack 207 


 -17   27.94499  host CEPH2-207-U16
...
-208  515.12000  rack 208 


 -31   27.94499  host CEPH2-208-U16
...

That's how the CRUSHMap looks like. Straight forward and 3x replication 
over 3 racks.


This issue only occurs in rack *207*.

Has anybody seen this before or knows where to start?

Wido
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Not all OSDs in rack marked as down when the rack fails

2020-10-29 Thread Dan van der Ster
Hi Wido,

Could it be one of these?

mon osd min up ratio
mon osd min in ratio

36/120 is 0.3 so it might be one of those magic ratios at play.
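
A quick way to check what the cluster is actually using (if I remember the
defaults correctly, 0.3 for the up ratio and 0.75 for the in ratio):

ceph config get mon mon_osd_min_up_ratio
ceph config get mon mon_osd_min_in_ratio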

Cheers,

Dan


On Thu, 29 Oct 2020, 18:05 Wido den Hollander,  wrote:

> Hi,
>
> I'm investigating an issue where 4 to 5 OSDs in a rack aren't marked as
> down when the network is cut to that rack.
>
> Situation:
>
> - Nautilus cluster
> - 3 racks
> - 120 OSDs, 40 per rack
>
> We performed a test where we turned off the network Top-of-Rack for each
> rack. This worked as expected with two racks, but with the third
> something weird happened.
>
>  From the 40 OSDs which were supposed to be marked as down only 36 were
> marked as down.
>
> In the end it took 15 minutes for all 40 OSDs to be marked as down.
>
> $ ceph config set mon mon_osd_reporter_subtree_level rack
>
> That setting is set to make sure that we only accept reports from other
> racks.
>
> What we saw in the logs for example:
>
> 2020-10-29T03:49:44.409-0400 7fbda185e700 10
> mon.CEPH2-MON1-206-U39@0(leader).osd e107102  osd.51 has 54 reporters,
> 239.856038 grace (20.00 + 219.856 + 7.43801e-23), max_failed_since
> 2020-10-29T03:47:22.374857-0400
>
> But osd.51 was still not marked as down after 54 reporters have reported
> that it is actually down.
>
> I checked, no ping or other traffic possible to osd.51. Host is
> unreachable.
>
> Another osd was marked as down, but it took a couple of minutes as well:
>
> 2020-10-29T03:50:54.455-0400 7fbda185e700 10
> mon.CEPH2-MON1-206-U39@0(leader).osd e107102  osd.37 has 48 reporters,
> 221.378970 grace (20.00 + 201.379 + 6.34437e-23), max_failed_since
> 2020-10-29T03:47:12.761584-0400
> 2020-10-29T03:50:54.455-0400 7fbda185e700  1
> mon.CEPH2-MON1-206-U39@0(leader).osd e107102  we have enough reporters
> to mark osd.37 down
>
> In the end osd.51 was marked as down, but only after the MON decided to
> do so:
>
> 2020-10-29T03:53:44.631-0400 7fbda185e700  0 log_channel(cluster) log
> [INF] : osd.51 marked down after no beacon for 903.943390 seconds
> 2020-10-29T03:53:44.631-0400 7fbda185e700 -1
> mon.CEPH2-MON1-206-U39@0(leader).osd e107104 no beacon from osd.51 since
> 2020-10-29T03:38:40.689062-0400, 903.943390 seconds ago.  marking down
>
> I haven't seen this happen before in any cluster. It's also strange that
> this only happens in this rack, the other two racks work fine.
>
> IDCLASS  WEIGHT  TYPE NAME
>-1 1545.35999  root default
>
> -206  515.12000  rack 206
>
>-7   27.94499  host CEPH2-206-U16
> ...
> -207  515.12000  rack 207
>
>   -17   27.94499  host CEPH2-207-U16
> ...
> -208  515.12000  rack 208
>
>   -31   27.94499  host CEPH2-208-U16
> ...
>
> That's how the CRUSHMap looks like. Straight forward and 3x replication
> over 3 racks.
>
> This issue only occurs in rack *207*.
>
> Has anybody seen this before or knows where to start?
>
> Wido
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Monitor persistently out-of-quorum

2020-10-29 Thread Ki Wong
Thanks, David.

I just double checked and they can all connect to one another,
on both v1 and v2 ports.

-kc

> On Oct 29, 2020, at 12:41 AM, David Caro  wrote:
> 
> On 10/28 17:26, Ki Wong wrote:
>> Hello,
>> 
>> I am at my wit's end.
>> 
>> So I made a mistake in the configuration of my router and one
>> of the monitors (out of 3) dropped out of the quorum and nothing
>> I’ve done allow it to rejoin. That includes reinstalling the
>> monitor with ceph-ansible.
>> 
>> The connectivity issue is fixed. I’ve tested it using “nc” and
>> the host can connect to both port 3300 and 6789 of the other
>> monitors. But the wayward monitor continue to stay out of quorum.
> 
> Just to make sure, have you tried from mon1->mon3, mon2->mon3, mon3->mon1 and
> mon3->mon2?
> 
>> 
>> What is wrong? I see a bunch of “EBUSY” errors in the log, with
>> the message:
>> 
>>  e1 handle_auth_request haven't formed initial quorum, EBUSY
>> 
>> How do I fix this? Any help would be greatly appreciated.
>> 
>> Many thanks,
>> 
>> -kc
>> 
>> 
> 
> -- 
> David Caro

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: monitor sst files continue growing

2020-10-29 Thread Zhenshi Zhou
Hi Alex,

We found that there were a huge number of keys in the "logm" and "osdmap"
tables while using ceph-monstore-tool. I think that could be the root cause.
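
Roughly how we looked at the key distribution (the mon has to be stopped
first; the store path is the usual default, and I may not recall the tool's
exact syntax):

ceph-monstore-tool /var/lib/ceph/mon/ceph-$(hostname -s) dump-keys | \
    awk '{print $1}' | sort | uniq -c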

Well, some pages also say that disabling the 'insight' module can resolve this
issue, but I checked our cluster and we didn't enable this module. Check this
page.

Anyway, our cluster is still unhealthy; it just needs time to keep recovering
data :)

Thanks

Alex Gracie wrote on Thu, Oct 29, 2020, at 10:57 PM:

> We hit this issue over the weekend on our HDD backed EC Nautilus cluster
> while removing a single OSD. We also did not have any luck using
> compaction. The mon-logs filled up our entire root disk on the mon servers
> and we were running on a single monitor for hours while we tried to finish
> recovery and reclaim space. The past couple weeks we also noticed "pg not
> scubbed in time" errors but are unsure if they are related. I'm still the
> exact cause of this(other than the general misplaced/degraded objects) and
> what kind of growth is acceptable for these store.db files.
>
> In order to get our downed mons restarted, we ended up backing up and
> coping the /var/lib/ceph/mon/* contents to a remote host, setting up an
> sshfs mount to that new host with large NVME and SSDs, ensuring the mount
> paths were owned by ceph, then clearing up enough space on the monitor host
> to start the service. This allowed our store.db directory to grow freely
> until the misplaced/degraded objects could recover and monitors all
> rejoined eventually.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Very high read IO during backfilling

2020-10-29 Thread Kamil Szczygieł
Hi,

We're running Octopus and we have 3 control plane nodes (12 cores, 64 GB memory
each) running mon, mds and mgr, and also 4 data nodes (12 cores, 256 GB
memory, 13x10TB HDDs each). We increased the number of PGs in our pool,
which resulted in all OSDs going crazy and reading an average of 900 MB/s
constantly (based on iotop).

This has resulted in slow ops and very low recovery speed. Any tips on how to 
handle this kind of situation? We've osd_recovery_sleep_hdd set to 0.2, 
osd_recovery_max_active set to 5 and osd_max_backfills set to 4. Some OSDs are 
reporting slow ops constantly and iowait on machines is at 70-80% constantly.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-29 Thread Ing . Luis Felipe Domínguez Vega

Thanks for response...

I don't have the old OSDs (and no backups, because this cluster is not so
important; it is the development cluster), so I need to delete the unknown PGs
(how can I do that?). But I don't want to wipe the whole Ceph cluster; if I can
delete the unknown and incomplete PGs, some data will be lost, but not all of
it, I think.


I will do that: reduce the replica copies and stabilize the cluster.

On 2020-10-29 13:11, Frank Schilder wrote:

... i will use now only one site, but need first stabilice the
cluster to remove the EC erasure coding and use replicate ...


If you change to one site only, there is no point in getting rid of
the EC pool. Your main problem will be restoring the lost data. Do you
have backup of everything? Do you still have the old OSDs? You never
answered these questions.

To give you an idea why this is important, with ceph, loosing 1% of
data on an rbd pool does *not* mean you loose 1% of the disks. It
means that, on average, every disk looses 1% of its blocks. In other
words, getting everything up again will be a lot of work either way.

The best path to follow is what Eugen suggested: add mons to have at
least 3 and dig out the old disks to be able to export and import PGs.
Look at Eugen's last 2 e-mails, its a starting point. You might be
able to recover more by reducing temporarily min_size to 1 on the
replicated pools and to 4 on the EC pool. If possible, make sure there
is no client access during that time. The missing rest needs to be
scraped off the OSDs you deleted from the cluster.

If you have backup of everything, starting from scratch and populating
the ceph cluster from backup might be the fastest option.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: 28 October 2020 07:23:09
To: Ing. Luis Felipe Domínguez Vega
Cc: Ceph Users
Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT]

If you have that many spare hosts I would recommend to deploy two more
MONs on them, and probably also additional MGRs so they can failover.

What is the EC profile for the data_storage pool?

Can you also share

ceph pg dump pgs | grep -v "active+clean"

to see which PGs are affected.
The remaining issue with unfound objects and unkown PGs could be
because you removed OSDs. That could mean data loss, but maybe there's
a chance to recover anyway.


Zitat von "Ing. Luis Felipe Domínguez Vega" :


Well recovering not working yet... i was started 6 servers more and
the cluster not yet recovered.
Ceph status not show any recover progress

ceph -s : https://pastebin.ubuntu.com/p/zRQPbvGzbw/
ceph osd tree   : https://pastebin.ubuntu.com/p/sTDs8vd7Sk/
ceph osd df : https://pastebin.ubuntu.com/p/ysbh8r2VVz/
ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
crush rules : (ceph osd crush rule dump)
https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/

El 2020-10-27 09:59, Eugen Block escribió:

Your pool 'data_storage' has a size of 7 (or 7 chunks since it's
erasure-coded) and the rule requires each chunk on a different host
but you currently have only 5 hosts available, that's why the 
recovery

is not progressing. It's waiting for two more hosts. Unfortunately,
you can't change the EC profile or the rule of that pool. I'm not 
sure

if it would work in the current cluster state, but if you can't add
two more hosts (which would be your best option for recovery) it 
might

be possible to create a new replicated pool (you seem to have enough
free space) and copy the contents from that EC pool. But as I said,
I'm not sure if that would work in a degraded state, I've never tried
that.

So your best bet is to get two more hosts somehow.



pool 4 'data_storage' erasure profile desoft size 7 min_size 5
crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32
autoscale_mode off last_change 154384 lfor 0/121016/121014 flags
hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
application rbd



Zitat von "Ing. Luis Felipe Domínguez Vega" 
:



Needed data:

ceph -s : https://pastebin.ubuntu.com/p/S9gKjyZtdK/
ceph osd tree   : https://pastebin.ubuntu.com/p/SCZHkk6Mk4/
ceph osd df : (later, because i'm waiting since 10
minutes and not output yet)
ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
crush rules : (ceph osd crush rule dump)
https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/

El 2020-10-27 07:14, Eugen Block escribió:

I understand, but i delete the OSDs from CRUSH map, so ceph
don't   wait for these OSDs, i'm right?


It depends on your actual crush tree and rules. Can you share 
(maybe

you already did)

ceph osd tree
ceph osd df
ceph osd pool ls detail

and a dump of your crush rules?

As I already said, if you have rules in place that distribute data
across 2 DCs and one of them is down the PGs will never recover 
even

if you delete the OSDs from the failed 

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-29 Thread Ing . Luis Felipe Domínguez Vega

Uff... now two of the OSDs are crashing with:

https://pastebin.ubuntu.com/p/qd6Tc2rpfm/

On 2020-10-29 13:11, Frank Schilder wrote:

... i will use now only one site, but need first stabilice the
cluster to remove the EC erasure coding and use replicate ...


If you change to one site only, there is no point in getting rid of
the EC pool. Your main problem will be restoring the lost data. Do you
have backup of everything? Do you still have the old OSDs? You never
answered these questions.

To give you an idea why this is important, with ceph, loosing 1% of
data on an rbd pool does *not* mean you loose 1% of the disks. It
means that, on average, every disk looses 1% of its blocks. In other
words, getting everything up again will be a lot of work either way.

The best path to follow is what Eugen suggested: add mons to have at
least 3 and dig out the old disks to be able to export and import PGs.
Look at Eugen's last 2 e-mails, its a starting point. You might be
able to recover more by reducing temporarily min_size to 1 on the
replicated pools and to 4 on the EC pool. If possible, make sure there
is no client access during that time. The missing rest needs to be
scraped off the OSDs you deleted from the cluster.

If you have backup of everything, starting from scratch and populating
the ceph cluster from backup might be the fastest option.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: 28 October 2020 07:23:09
To: Ing. Luis Felipe Domínguez Vega
Cc: Ceph Users
Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT]

If you have that many spare hosts I would recommend to deploy two more
MONs on them, and probably also additional MGRs so they can failover.

What is the EC profile for the data_storage pool?

Can you also share

ceph pg dump pgs | grep -v "active+clean"

to see which PGs are affected.
The remaining issue with unfound objects and unkown PGs could be
because you removed OSDs. That could mean data loss, but maybe there's
a chance to recover anyway.


Zitat von "Ing. Luis Felipe Domínguez Vega" :


Well recovering not working yet... i was started 6 servers more and
the cluster not yet recovered.
Ceph status not show any recover progress

ceph -s : https://pastebin.ubuntu.com/p/zRQPbvGzbw/
ceph osd tree   : https://pastebin.ubuntu.com/p/sTDs8vd7Sk/
ceph osd df : https://pastebin.ubuntu.com/p/ysbh8r2VVz/
ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
crush rules : (ceph osd crush rule dump)
https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/

El 2020-10-27 09:59, Eugen Block escribió:

Your pool 'data_storage' has a size of 7 (or 7 chunks since it's
erasure-coded) and the rule requires each chunk on a different host
but you currently have only 5 hosts available, that's why the 
recovery

is not progressing. It's waiting for two more hosts. Unfortunately,
you can't change the EC profile or the rule of that pool. I'm not 
sure

if it would work in the current cluster state, but if you can't add
two more hosts (which would be your best option for recovery) it 
might

be possible to create a new replicated pool (you seem to have enough
free space) and copy the contents from that EC pool. But as I said,
I'm not sure if that would work in a degraded state, I've never tried
that.

So your best bet is to get two more hosts somehow.



pool 4 'data_storage' erasure profile desoft size 7 min_size 5
crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32
autoscale_mode off last_change 154384 lfor 0/121016/121014 flags
hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
application rbd



Zitat von "Ing. Luis Felipe Domínguez Vega" 
:



Needed data:

ceph -s : https://pastebin.ubuntu.com/p/S9gKjyZtdK/
ceph osd tree   : https://pastebin.ubuntu.com/p/SCZHkk6Mk4/
ceph osd df : (later, because i'm waiting since 10
minutes and not output yet)
ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
crush rules : (ceph osd crush rule dump)
https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/

El 2020-10-27 07:14, Eugen Block escribió:

I understand, but i delete the OSDs from CRUSH map, so ceph
don't   wait for these OSDs, i'm right?


It depends on your actual crush tree and rules. Can you share 
(maybe

you already did)

ceph osd tree
ceph osd df
ceph osd pool ls detail

and a dump of your crush rules?

As I already said, if you have rules in place that distribute data
across 2 DCs and one of them is down the PGs will never recover 
even

if you delete the OSDs from the failed DC.



Zitat von "Ing. Luis Felipe Domínguez Vega" 
:



I understand, but i delete the OSDs from CRUSH map, so ceph
don't   wait for these OSDs, i'm right?

El 2020-10-27 04:06, Eugen Block escribió:

Hi,

just to clarify so I don't miss anything: you have two DCs and 
one of

them is down. And two of the MONs were i

[ceph-users] Re: frequent Monitor down

2020-10-29 Thread Tony Liu
Typically, the number of nodes is 2n+1 to cover n failures.
It's OK to have 4 nodes; from a failure-coverage POV, it's the same
as 3 nodes: 4 nodes will cover 1 failure, and if 2 nodes are down, the
cluster is down. It works, it just doesn't make much sense.

Thanks!
Tony
> -Original Message-
> From: Marc Roos 
> Sent: Thursday, October 29, 2020 1:42 AM
> To: andrei ; eblock 
> Cc: ceph-users 
> Subject: [ceph-users] Re: frequent Monitor down
> 
> Really? First time I read this here, afaik you can get a split brain
> like this.
> 
> 
> 
> -Original Message-
> Sent: Thursday, October 29, 2020 12:16 AM
> To: Eugen Block
> Cc: ceph-users
> Subject: [ceph-users] Re: frequent Monitor down
> 
> Eugen, I've got four physical servers and I've installed mon on all of
> them. I've discussed it with Wido and a few other chaps from ceph and
> there is no issue in doing it. The quorum issues would happen if you
> have 2 mons. If you've got more than 2 you should be fine.
> 
> Andrei
> 
> - Original Message -
> > From: "Eugen Block" 
> > To: "Andrei Mikhailovsky" 
> > Cc: "ceph-users" 
> > Sent: Wednesday, 28 October, 2020 20:19:15
> > Subject: Re: [ceph-users] Re: frequent Monitor down
> 
> > Why do you have 4 MONs in the first place? That way a quorum is
> > difficult to achieve, could it be related to that?
> >
> > Zitat von Andrei Mikhailovsky :
> >
> >> Yes, I have, Eugen, I see no obvious reason / error / etc. I see a
> >> lot of entries relating to Compressing as well as monitor going down.
> >>
> >> Andrei
> >>
> >>
> >>
> >> - Original Message -
> >>> From: "Eugen Block" 
> >>> To: "ceph-users" 
> >>> Sent: Wednesday, 28 October, 2020 11:51:20
> >>> Subject: [ceph-users] Re: frequent Monitor down
> >>
> >>> Have you looked into syslog and mon logs?
> >>>
> >>>
> >>> Zitat von Andrei Mikhailovsky :
> >>>
>  Hello everyone,
> 
>  I am having regular messages that the Monitors are going down and
> up:
> 
>  2020-10-27T09:50:49.032431+ mon .arh-ibstorage2-ib ( mon .1)
>  2248 : cluster [WRN] Health check failed: 1/4 mons down, quorum
>  arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
>  2020-10-27T09:50:49.123511+ mon .arh-ibstorage2-ib ( mon .1)
>  2250 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing
>  BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum
>  arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout
>  flag(s) set; 43 pgs not deep-scrubbed in time; 12 pgs not scrubbed
>  in time 2020-10-27T09:50:52.735457+ mon .arh-ibstorage1-ib (
>  mon .0)
>  31287 : cluster [INF] Health check cleared: MON_DOWN (was: 1/4 mons
> 
>  down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib)
>  2020-10-27T12:35:20.556458+ mon .arh-ibstorage2-ib ( mon .1)
>  2260 : cluster [WRN] Health check failed: 1/4 mons down, quorum
>  arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
>  2020-10-27T12:35:20.643282+ mon .arh-ibstorage2-ib ( mon .1)
>  2262 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing
>  BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum
>  arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout
>  flag(s) set; 47 pgs not deep-scrubbed in time; 14 pgs not scrubbed
>  in time
> 
> 
>  This happens on a daily basis several times a day.
> 
>  Could you please let me know how to fix this annoying problem?
> 
>  I am running ceph version 15.2.4
>  (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable) on
>  Ubuntu 18.04 LTS with latest updates.
> 
>  Thanks
> 
>  Andrei
>  ___
>  ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send
>  an email to ceph-users-le...@ceph.io
> >>>
> >>>
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> > >> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to reset Log Levels

2020-10-29 Thread Ml Ml
Hello,
i played around with some log level i can't remember and my logs are
now getting bigger than my DVD-Movie collection.
E.g.: journalctl -b -u
ceph-5436dd5d-83d4-4dc8-a93b-60ab5db145df@mon.ceph03.service >
out.file is 1,1GB big.

I did already try:
ceph tell mon.ceph03 config set debug_mon 0/10
ceph tell mon.ceph03 config set debug_osd 0/10
ceph tell mon.ceph03 config set debug_mgr 0/10
ceph tell mon.ceph03 config set "mon_health_to_clog" false
ceph tell mon.ceph03 config set "mon_health_log_update_period" 30
ceph tell mon.ceph03 config set "debug_mgr" "0/0"

which made it better, but i really can't remember it all and would like
to have the default values.

Is there a way to reset those Log Values?

Cheers,
Michael
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: frequent Monitor down

2020-10-29 Thread Janne Johansson
On Thu, 29 Oct 2020 at 20:16, Tony Liu wrote:

> Typically, the number of nodes is 2n+1 to cover n failures.
> It's OK to have 4 nodes, from failure covering POV, it's the same
> as 3 nodes. 4 nodes will cover 1 failure. If 2 nodes down, the
> cluster is down. It works, just not make much sense.
>
>
Well, you can see it the other way around: with 3 configured mons and only
2 up, you know you have a majority and can go on with writes.
With 4 configured mons and only 2 up, it stops because you get the split-brain
scenario. For a 2-DC setup with 2 mons at each place, a split is still
fatal.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-29 Thread Frank Schilder
> ... i will use now only one site, but need first stabilice the
> cluster to remove the EC erasure coding and use replicate ...

If you change to one site only, there is no point in getting rid of the EC 
pool. Your main problem will be restoring the lost data. Do you have backup of 
everything? Do you still have the old OSDs? You never answered these questions.

To give you an idea why this is important: with ceph, losing 1% of data on an
rbd pool does *not* mean you lose 1% of the disks. It means that, on average,
every disk loses 1% of its blocks. In other words, getting everything up again
will be a lot of work either way.

The best path to follow is what Eugen suggested: add mons to have at least 3
and dig out the old disks to be able to export and import PGs. Look at Eugen's
last 2 e-mails, it's a starting point. You might be able to recover more by
temporarily reducing min_size to 1 on the replicated pools and to 4 on the EC
pool. If possible, make sure there is no client access during that time. The
rest of the missing data needs to be scraped off the OSDs you deleted from the
cluster.
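
A sketch of the min_size part (pool names as in the output you posted earlier;
revert the values as soon as recovery has finished):

ceph osd pool set <replicated-pool> min_size 1
ceph osd pool set data_storage min_size 4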

If you have backup of everything, starting from scratch and populating the ceph 
cluster from backup might be the fastest option.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Eugen Block 
Sent: 28 October 2020 07:23:09
To: Ing. Luis Felipe Domínguez Vega
Cc: Ceph Users
Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT]

If you have that many spare hosts I would recommend to deploy two more
MONs on them, and probably also additional MGRs so they can failover.

What is the EC profile for the data_storage pool?

Can you also share

ceph pg dump pgs | grep -v "active+clean"

to see which PGs are affected.
The remaining issue with unfound objects and unkown PGs could be
because you removed OSDs. That could mean data loss, but maybe there's
a chance to recover anyway.


Zitat von "Ing. Luis Felipe Domínguez Vega" :

> Well recovering not working yet... i was started 6 servers more and
> the cluster not yet recovered.
> Ceph status not show any recover progress
>
> ceph -s : https://pastebin.ubuntu.com/p/zRQPbvGzbw/
> ceph osd tree   : https://pastebin.ubuntu.com/p/sTDs8vd7Sk/
> ceph osd df : https://pastebin.ubuntu.com/p/ysbh8r2VVz/
> ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
> crush rules : (ceph osd crush rule dump)
> https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/
>
> El 2020-10-27 09:59, Eugen Block escribió:
>> Your pool 'data_storage' has a size of 7 (or 7 chunks since it's
>> erasure-coded) and the rule requires each chunk on a different host
>> but you currently have only 5 hosts available, that's why the recovery
>> is not progressing. It's waiting for two more hosts. Unfortunately,
>> you can't change the EC profile or the rule of that pool. I'm not sure
>> if it would work in the current cluster state, but if you can't add
>> two more hosts (which would be your best option for recovery) it might
>> be possible to create a new replicated pool (you seem to have enough
>> free space) and copy the contents from that EC pool. But as I said,
>> I'm not sure if that would work in a degraded state, I've never tried
>> that.
>>
>> So your best bet is to get two more hosts somehow.
>>
>>
>>> pool 4 'data_storage' erasure profile desoft size 7 min_size 5
>>> crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32
>>> autoscale_mode off last_change 154384 lfor 0/121016/121014 flags
>>> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
>>> application rbd
>>
>>
>> Zitat von "Ing. Luis Felipe Domínguez Vega" :
>>
>>> Needed data:
>>>
>>> ceph -s : https://pastebin.ubuntu.com/p/S9gKjyZtdK/
>>> ceph osd tree   : https://pastebin.ubuntu.com/p/SCZHkk6Mk4/
>>> ceph osd df : (later, because i'm waiting since 10
>>> minutes and not output yet)
>>> ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
>>> crush rules : (ceph osd crush rule dump)
>>> https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/
>>>
>>> El 2020-10-27 07:14, Eugen Block escribió:
> I understand, but i delete the OSDs from CRUSH map, so ceph
> don't   wait for these OSDs, i'm right?

 It depends on your actual crush tree and rules. Can you share (maybe
 you already did)

 ceph osd tree
 ceph osd df
 ceph osd pool ls detail

 and a dump of your crush rules?

 As I already said, if you have rules in place that distribute data
 across 2 DCs and one of them is down the PGs will never recover even
 if you delete the OSDs from the failed DC.



 Zitat von "Ing. Luis Felipe Domínguez Vega" :

> I understand, but i delete the OSDs from CRUSH map, so ceph
> don't   wait for these OSDs, i'm right?
>
> El 2020-10-27 04:06, Eugen Block escribió:
>> Hi,
>

[ceph-users] Re: pgs stuck backfill_toofull

2020-10-29 Thread Stefan Kooman
On 2020-10-29 06:55, Mark Johnson wrote:
> I've been struggling with this one for a few days now.  We had an OSD report 
> as near full a few days ago.  Had this happen a couple of times before and a 
> reweight-by-utilization has sorted it out in the past.  Tried the same again 
> but this time we ended up with a couple of pgs in a state of backfill_toofull 
> and a handful of misplaced objects as a result.

Consider upgrading to luminous (and then later nautilus).

Why? There you can use the ceph balancer in upmap mode (at least when your
clients are new enough). No need to do any manual reweighting anymore.

^^ this besides the tips Frank gave you.
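
For reference, turning it on looks roughly like this (once the cluster and all
clients are on Luminous or newer):

ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on
ceph balancer status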

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] bluefs mount failed(crash) after a long time

2020-10-29 Thread Elians Wan
Anyone can help? Bluefs mount failed after a long time
The error message:

2020-10-30 05:33:54.906725 7f1ad73f5e00 1 bluefs add_block_device bdev 1
path /var/lib/ceph/osd/ceph-30/block size 7.28TiB
2020-10-30 05:33:54.906758 7f1ad73f5e00 1 bluefs mount
2020-10-30 06:00:32.881850 7f1ad73f5e00 -1 *** Caught signal (Segmentation
fault) **
 in thread 7f1ad73f5e00 thread_name:ceph-osd ceph version 12.2.12
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)
 1: (()+0xaa2044) [0x5570d12af044]
 2: (()+0x11390) [0x7f1ad56d2390]
 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned
long, unsigned long, ceph::buffer::list*, char*)+0xad4) [0x5570d125ea34]
 4: (BlueFS::_replay(bool)+0x409) [0x5570d1267599]
 5: (BlueFS::mount()+0x209) [0x5570d126b659]
 6: (BlueStore::_open_db(bool)+0x169c) [0x5570d117acdc]
 7: (BlueStore::_mount(bool)+0x3ad) [0x5570d11aeded]
 8: (OSD::init()+0x3e2) [0x5570d0d00f12]
 9: (main()+0x2f0a) [0x5570d0c0a0ca]
 10: (__libc_start_main()+0xf0) [0x7f1ad4658830]
 11: (_start()+0x29) [0x5570d0c97329]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Fix PGs states

2020-10-29 Thread Ing . Luis Felipe Domínguez Vega

Hi:

I have this ceph status:
-
cluster:
id: 039bf268-b5a6-11e9-bbb7-d06726ca4a78
health: HEALTH_WARN
noout flag(s) set
1 osds down
Reduced data availability: 191 pgs inactive, 2 pgs down, 35 
pgs incomplete, 290 pgs stale

5 pgs not deep-scrubbed in time
7 pgs not scrubbed in time
327 slow ops, oldest one blocked for 233398 sec, daemons 
[osd.12,osd.36,osd.5] have slow ops.


  services:
mon: 1 daemons, quorum fond-beagle (age 23h)
mgr: fond-beagle(active, since 7h)
osd: 48 osds: 45 up (since 95s), 46 in (since 8h); 4 remapped pgs
 flags noout

  data:
pools:   7 pools, 2305 pgs
objects: 350.37k objects, 1.5 TiB
usage:   3.0 TiB used, 38 TiB / 41 TiB avail
pgs: 6.681% pgs unknown
 1.605% pgs not active
 1835 active+clean
 279  stale+active+clean
 154  unknown
 22   incomplete
 10   stale+incomplete
 2down
 2remapped+incomplete
 1stale+remapped+incomplete


How can I fix all of the unknown, incomplete, remapped+incomplete, etc. PGs? I
don't care if I need to remove PGs.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Fix PGs states

2020-10-29 Thread 胡 玮文
Hi,

I have not tried it, but maybe this will help with the unknown PGs, if you
don't care about data loss.


ceph osd force-create-pg 
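
If the data really is expendable, something along these lines could be used to
recreate every PG currently reported as unknown (destructive; recent releases
also ask for the extra confirmation flag):

for pg in $(ceph pg dump pgs_brief 2>/dev/null | awk '$2 == "unknown" {print $1}'); do
    ceph osd force-create-pg $pg --yes-i-really-mean-it
done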


On Oct 30, 2020, at 10:46, Ing. Luis Felipe Domínguez Vega wrote:

Hi:

I have this ceph status:
-
cluster:
   id: 039bf268-b5a6-11e9-bbb7-d06726ca4a78
   health: HEALTH_WARN
   noout flag(s) set
   1 osds down
   Reduced data availability: 191 pgs inactive, 2 pgs down, 35 pgs 
incomplete, 290 pgs stale
   5 pgs not deep-scrubbed in time
   7 pgs not scrubbed in time
   327 slow ops, oldest one blocked for 233398 sec, daemons 
[osd.12,osd.36,osd.5] have slow ops.

 services:
   mon: 1 daemons, quorum fond-beagle (age 23h)
   mgr: fond-beagle(active, since 7h)
   osd: 48 osds: 45 up (since 95s), 46 in (since 8h); 4 remapped pgs
flags noout

 data:
   pools:   7 pools, 2305 pgs
   objects: 350.37k objects, 1.5 TiB
   usage:   3.0 TiB used, 38 TiB / 41 TiB avail
   pgs: 6.681% pgs unknown
1.605% pgs not active
1835 active+clean
279  stale+active+clean
154  unknown
22   incomplete
10   stale+incomplete
2down
2remapped+incomplete
1stale+remapped+incomplete


How can i fix all of unknown, incomplete, remmaped+incomplete, etc... i dont 
care if i need remove PGs
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Fix PGs states

2020-10-29 Thread Ing . Luis Felipe Domínguez Vega
Great, and thanks. I fixed all the unknown PGs with the command; now the
incomplete, down, etc. ones remain.


On 2020-10-29 23:57, 胡 玮文 wrote:

Hi,

I have not tried, but maybe this will help with the unknown PGs, if
you don’t care any data loss.


ceph osd force-create-pg 


On Oct 30, 2020, at 10:46, Ing. Luis Felipe Domínguez Vega wrote:

Hi:

I have this ceph status:
-
cluster:
   id: 039bf268-b5a6-11e9-bbb7-d06726ca4a78
   health: HEALTH_WARN
   noout flag(s) set
   1 osds down
   Reduced data availability: 191 pgs inactive, 2 pgs down, 35
pgs incomplete, 290 pgs stale
   5 pgs not deep-scrubbed in time
   7 pgs not scrubbed in time
   327 slow ops, oldest one blocked for 233398 sec, daemons
[osd.12,osd.36,osd.5] have slow ops.

 services:
   mon: 1 daemons, quorum fond-beagle (age 23h)
   mgr: fond-beagle(active, since 7h)
   osd: 48 osds: 45 up (since 95s), 46 in (since 8h); 4 remapped pgs
flags noout

 data:
   pools:   7 pools, 2305 pgs
   objects: 350.37k objects, 1.5 TiB
   usage:   3.0 TiB used, 38 TiB / 41 TiB avail
   pgs: 6.681% pgs unknown
1.605% pgs not active
1835 active+clean
279  stale+active+clean
154  unknown
22   incomplete
10   stale+incomplete
2down
2remapped+incomplete
1stale+remapped+incomplete


How can i fix all of unknown, incomplete, remmaped+incomplete, etc...
i dont care if i need remove PGs
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MDS restarts after enabling msgr2

2020-10-29 Thread Stefan Kooman
Hi List,

After a successful upgrade from Mimic 13.2.8 to Nautilus 14.2.12 we
enabled msgr2. Soon after that both of the MDS servers (active /
active-standby) restarted.
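
To be precise, by "enabled msgr2" I mean the usual post-upgrade step, roughly:

ceph mon enable-msgr2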

We did not hit any ASSERTS this time, so that's good :>.

However, I have not seen this happening on four different test clusters
(while running a slightly older Nautilus release), so I certainly did
not expect that.

Most of the connections switched over to 3300 (apart from the cephfs
kernel clients) and that all kept on working.

Anybody else has seen this behavior before?

Gr. Stefan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Corrupted RBD image

2020-10-29 Thread Ing . Luis Felipe Domínguez Vega

Hi:

I tried get info from a RBD image but:

-
root@fond-beagle:/# rbd list --pool cinder-ceph | grep 
volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda

volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda


root@fond-beagle:/# rbd info --pool cinder-ceph 
volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda
rbd: error opening image volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda: 
(2) No such file or directory

--

Does this mean that the metadata still shows the image but its contents were removed?
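
One check I can think of, assuming the usual RBD format 2 object naming (the
pool and image names are taken from the listing above):

rados -p cinder-ceph stat rbd_id.volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda
rados -p cinder-ceph getomapval rbd_directory \
    name_volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda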
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io