[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

From: Eugen Block
Sent: 28 October 2020 07:23:09
To: Ing. Luis Felipe Domínguez Vega
Cc: Ceph Users
Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT]

If you have that many spare hosts I would recommend deploying two more
MONs on them, and probably also additional MGRs so they can fail over.
What is the EC profile for the data_storage pool?
Can you also share
ceph pg dump pgs | grep -v "active+clean"
to see which PGs are affected.
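(For context, a minimal sketch of how this advice could be acted on; the ceph orch commands apply only to cephadm-managed clusters, and the host names host4/host5 and <profile-name> are placeholders, not taken from this thread.)

  # Inspect the EC profile behind the data_storage pool
  ceph osd pool get data_storage erasure_code_profile
  ceph osd erasure-code-profile get <profile-name>   # shows k, m and crush-failure-domain

  # List only the PGs that are not active+clean
  ceph pg dump pgs | grep -v "active+clean"

  # On a cephadm-managed cluster, extra MONs/MGRs can be placed on spare hosts
  ceph orch apply mon --placement="host1,host2,host3,host4,host5"
  ceph orch apply mgr --placement="host1,host4"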
EC profile: https://pastebin.ubuntu.com/p/kjbdQXbs85/
ceph pg dump pgs | grep -v "active+clean":
https://pastebin.ubuntu.com/p/g6TdZXNXBR/
On 2020-10-28 02:23, Eugen Block wrote:
If you have that many spare hosts I would recommend deploying two more
MONs on them, and probably also additional MGRs so they can fail over. [...]
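(As an aside, a couple of standard alternatives for pulling out only the problem PGs; these are generic Ceph commands, not taken from the pastebins above.)

  ceph pg dump_stuck inactive      # PGs that are not active
  ceph pg dump_stuck undersized    # PGs running with fewer chunks/copies than requested
  ceph pg ls incomplete            # list PGs in a given state
  ceph health detail               # summary of degraded/undersized/inactive PG counts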
From: Ing. Luis Felipe Domínguez Vega
Sent: 28 October 2020 05:14:27
To: Eugen Block
Cc: Ceph Users
Subject: [ceph-users] Re: Huge HDD ceph monitor usage [EXT]
Well, recovery is not working yet... I started 6 more servers and the
cluster has still not recovered.
Ceph status does not show any recovery progress.
ceph -s : https://pastebin.ubuntu.com/p/zRQPbvGzbw/
ceph osd tree : https://pastebin.ubuntu.com/p/sTDs8vd7Sk/
ceph osd df
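(A short sketch of generic commands for watching recovery from the CLI; nothing here is specific to this cluster.)

  ceph -s           # overall status, including the recovery/backfill summary
  ceph -w           # follow cluster events live
  ceph pg stat      # one-line PG state summary
  ceph osd df tree  # per-OSD utilisation laid out along the CRUSH tree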
Well, 7 hosts are up, and recovery started and then stopped after roughly
3 hours; now the cluster is not recovering any more... could it be that it
needs more hosts?
On 2020-10-27 13:58, Eugen Block wrote:
Hm, it would be new to me that the mgr service is required for
recovery, but maybe I missed something. [...]
Your pool 'data_storage' has a size of 7 (or 7 chunks, since it's
erasure-coded) and the rule requires each chunk to be on a different host,
but you currently have only 5 hosts available; that's why the recovery
is not progressing. It's waiting for two more hosts. Unfortunately,
you can't change the EC profile of an existing pool.
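(A hedged sketch of how to confirm the chunk count and failure domain for that pool; <profile-name> and <rule-name> come from the previous command's output, not from this thread.)

  ceph osd pool ls detail | grep data_storage      # size, min_size and crush_rule of the pool
  ceph osd erasure-code-profile get <profile-name> # k + m must fit into the available hosts
  ceph osd crush rule dump <rule-name>             # the chooseleaf step shows the failure domain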
Needed data:
ceph -s : https://pastebin.ubuntu.com/p/S9gKjyZtdK/
ceph osd tree : https://pastebin.ubuntu.com/p/SCZHkk6Mk4/
ceph osd df : (later, because I have been waiting for 10 minutes
and there is no output yet)
ceph osd pool ls detail : https://pastebin.ubuntu.com/p
I understand, but I deleted the OSDs from the CRUSH map, so Ceph won't
wait for those OSDs, am I right?
It depends on your actual crush tree and rules. Can you share (maybe
you already did)
ceph osd tree
ceph osd df
ceph osd pool ls detail
and a dump of your crush rules?
As I already said, [...]
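(For completeness, the standard way to dump the CRUSH rules and the full CRUSH map; no cluster-specific names are assumed.)

  ceph osd crush rule ls                      # rule names
  ceph osd crush rule dump                    # all rules as JSON
  ceph osd getcrushmap -o crushmap.bin        # export the compiled CRUSH map
  crushtool -d crushmap.bin -o crushmap.txt   # decompile it into readable text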
On 2020-10-27 04:06, Eugen Block wrote:
Hi,
just to clarify so I don't miss anything: you have two DCs and one of
them is down. And two of the MONs were in that failed DC? Now you
removed all OSDs and two MONs from the failed DC hoping that your
cluster will recover? If you have reasonable crush rules in place
(e.g. to recover from the loss of one DC) [...]
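(Since the thread is about OSDs that were removed from the failed DC, here is a hedged sketch of the usual removal commands; <osd-id> and the host bucket name dead-host are placeholders, not taken from this cluster.)

  ceph osd tree                                     # check which OSDs/host buckets are still referenced
  ceph osd purge <osd-id> --yes-i-really-mean-it    # drop the OSD from the CRUSH map, OSD map and auth
  ceph osd crush remove dead-host                   # remove the now-empty host bucket from CRUSH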
The ceph mon logs... many of these, nonstop, in my log:
--
2020-10-26T15:40:28.875729-0400 osd.23 [WRN] slow request
osd_op(client.86168166.0:9023356 5.56 5.1cd5a6d6 (undecoded)
ondisk+retry+write+known_if_redirected e159644) initiated
[...]
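(A small sketch of how slow requests like the one above are usually inspected further; osd.23 is taken from the log line, and the ceph daemon commands have to be run on the host that carries that OSD.)

  ceph health detail                     # which OSDs are currently reporting slow ops
  ceph daemon osd.23 dump_ops_in_flight  # operations currently stuck in the OSD
  ceph daemon osd.23 dump_historic_ops   # recently completed (slow) operations with timings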
I had 3 mons, but I have 2 physical datacenters and one of them broke
with no short-term fix, so I removed all OSDs and ceph mons (2 of them),
and now I only have the OSDs of 1 datacenter with the remaining monitor.
I had stopped the ceph manager, but I saw that when I restart a ceph
manager, then ceph -s [...]
The recovery process (ceph -s) is independent of the MGR service and
only depends on the MON service. It seems you only have the one MON;
if the MGR is overloading it (not clear why), it could help to leave the
MGR off and see if the MON service then has enough RAM to proceed with
the recovery.
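(A hedged sketch of how that could be tried; the systemd unit name assumes a package-based deployment, and fond-beagle is the MGR host mentioned later in the thread.)

  systemctl stop ceph-mgr.target   # on the MGR host: stop all local MGR instances
  ceph -s                          # cluster status; the mgr will be reported as unavailable
  free -h                          # keep an eye on RAM on the MON host afterwards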
On 2020-10-26 15:16, Eugen Block wrote:
You could stop the MGRs and wait for the recovery to finish; MGRs are
not a critical component. You won't have a dashboard or metrics
during that time, but it would prevent the high RAM usage.
Quoting "Ing. Luis Felipe Domínguez Vega":
On 2020-10-26 12:23, 胡 玮文 wrote:
> On 26 October 2020, at 23:29, Ing. Luis Felipe Domínguez Vega
> wrote:
>
> mgr: fond-beagle(active, since 39s)
Your manager seems to be crash looping; it only started 39 seconds ago.
Looking at the mgr logs may help you identify why your cluster is not
recovering. You may be hitting a bug in the mgr.
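(A minimal sketch of how to follow that advice, assuming a systemd-based deployment; fond-beagle is the MGR name from the status output, and the log path is the package default.)

  journalctl -u ceph-mgr@fond-beagle -n 200        # last log lines of the MGR unit
  tail -f /var/log/ceph/ceph-mgr.fond-beagle.log   # default MGR log file location
  ceph crash ls                                    # crashes recorded by the crash module, if enabled
  ceph crash info <crash-id>                       # backtrace of one recorded crash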
Exactly, the cluster is recovering from a huge break, but I don't see
any progress on "recovering"; it does not show the recovery progress.
--
cluster:
    id:     039bf268-b5a6-11e9-bbb7-d06726ca4a78
    health: [...]
On 26/10/2020 14:13, Ing. Luis Felipe Domínguez Vega wrote:
How can I free the store of the ceph monitor?:
root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle# du -h -d1
542G ./store.db
542G .
Only 9M of that is the .log file:
--
root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle/store.db# ls -lh
*.log
-rw--- 1 ceph ceph 9.9M Oct 26 10:57 1443554.log
---
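(For what it's worth, a hedged sketch of the usual ways to shrink a MON store; the mon id fond-beagle is assumed to match the hostname. Note that the MONs normally cannot trim old maps until the PGs are clean again, so compaction may win back little space while recovery is stuck.)

  ceph tell mon.fond-beagle compact                    # trigger an online RocksDB compaction
  ceph config set mon mon_compact_on_start true        # compact automatically on the next mon restart
  du -sh /var/lib/ceph/mon/ceph-fond-beagle/store.db   # re-check the store size afterwards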