aller pools in a RGW (like log and meta).
How long ago did you recreate the earliest OSD?
Cheers
Boris
On Tue., 8 March 2022 at 10:03, Francois Legrand wrote:
Hi,
We also had this kind of problem after upgrading to octopus.
Maybe you
can play with the heartbeat grace
Hi,
We also had this kind of problem after upgrading to octopus. Maybe you
can play with the heartbeat grace time (
https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/
) to tell osds to wait a little more before declaring another osd down !
We also try to fix the problem
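For illustration, one way to raise that grace period cluster-wide (a minimal sketch; the value 30 is just an example, the default being 20 seconds):
# give peers more time before reporting an osd as failed (example value)
ceph config set global osd_heartbeat_grace 30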
No, our osds are hdd (no ssd) and we have everything (data and metadata)
on them (no nvme).
On 17/11/2021 at 16:49, Arthur Outhenin-Chalandre wrote:
Hi,
On 11/17/21 16:09, Francois Legrand wrote:
Now we are investigating this snapshot issue and I noticed that as
long as we remove one
Hello,
We recently upgraded our ceph+cephfs cluster from nautilus to octopus.
After the upgrade, we noticed that removal of snapshots was causing a
lot of problems (lots of slow ops, osds marked down, crashes etc...) so we
suspended the snapshots for a while so the cluster could get stable again for
m
Hi Franck,
I totally agree with your point 3 (also with 1 and 2 indeed). Generally
speaking, the release cycle of many software projects tends to become faster and
faster (not only for ceph, but also openstack etc...) and it's really
hard and tricky to keep an infrastructure up to date in such
co
On 06/11/2021 at 16:57, Francois Legrand wrote:
Hi,
Can you confirm that changing bluefs_buffered_io to true solved
your problem?
Because I have a rather similar problem. My Nautilus cluster was with
bluefs_buffered_io = false. It was working (even with snaptrim taking
a long time, i.e
Hello,
Can you confirm that the bug only affects pacific and not octopus ?
Thanks.
F.
On 29/10/2021 at 16:39, Neha Ojha wrote:
On Thu, Oct 28, 2021 at 8:11 AM Igor Fedotov wrote:
On 10/28/2021 12:36 AM, mgrzybowski wrote:
Hi Igor
I'm very happy that you were able to reproduce and find t
tu",
"os_name": "Ubuntu",
"os_version": "20.04.3 LTS (Focal Fossa)",
"os_version_id": "20.04",
"process_name": "ceph-mgr",
"stack_sig":
"9a65d0019b8102fdaee8fd29c30e3aef3b86660d33fc6cd9
Hi,
I am testing an upgrade (from 14.2.16 to 16.2.5) on my ceph test
cluster (bare metal).
I noticed (when reaching the mds upgrade) that after I stopped all the
mds, opening the "file system" page on the dashboard results in a crash
of the dashboard (and also of the mgr). Has someone had th
Hello everybody,
I have a "stupid" question. Why is it recommended in the docs to set the
osd flag to noout during an upgrade/maintainance (and especially during
an osd upgrade/maintainance) ?
In my understanding, if an osd goes down, after a while (600s by
default) it's marked out and the c
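For reference, the flag in question is set and cleared roughly like this (a minimal sketch):
# prevent down osds from being marked out during the maintenance window
ceph osd set noout
# ... do the upgrade/maintenance ...
ceph osd unset noout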
m": since there are just 2 rooms vs. 3 replicas, it
doesn't allow you to create a pool with a rule that might not
optimally work (keep in mind that Dashboard tries to perform some
extra validations compared to the Ceph CLI).
Kind Regards,
Ernesto
On Thu, Sep 9, 2021 at 12:29 PM Fran
Hi all,
I have a test ceph cluster with 4 osd servers containing each 3 osds.
The crushmap uses 2 rooms with 2 servers in each room.
We use replica 3 for pools.
I have the following custom crush rule to ensure that I have at least one
copy of the data in each room.
rule replicated3over2room
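A rule of that shape typically looks like the sketch below (not necessarily the poster's exact rule; the id and size bounds are placeholders). It picks both rooms, then two hosts per room, so a size-3 pool always ends up with at least one copy in each room:
rule replicated3over2room {
    id 1
    type replicated
    min_size 2
    max_size 4
    step take default
    # choose both rooms, then two hosts in each; with size 3 the first
    # three of the four candidate osds are used
    step choose firstn 2 type room
    step chooseleaf firstn 2 type host
    step emit
}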
[ceph-users] Re: Howto upgrade AND change distro
Hi,
On 27/08/2021 16:16, Francois Legrand wrote:
We are running a ceph nautilus cluster under centos
Hello,
We are running a ceph nautilus cluster under centos 7. To upgrade to
pacific we need to change to a more recent distro (probably debian or
ubuntu because of the recent announcement about centos 8, but the distro
doesn't matter very much).
However, I couldn't find a clear procedure to
3072 96 96 96 192 96 192 |
F.
On 01/02/2021 at 10:26, Dan van der Ster wrote:
On Mon, Feb 1, 2021 at 10:03 AM Francois Legrand wrote:
Hi,
Actually we have no EC pools... all are replica 3. And we have only 9 pools.
The average number of pgs/osd is not very high (40.6).
Here is the
2 on most.
(this is a cluster with the balancer disabled).
The other explanation I can think of is that you have relatively wide
EC pools and few hosts. In that case there would be very little that
the balancer could do to flatten the distribution.
If in doubt, please share your pool details and cr
the upmap_max_deviation from the
default of 5. On our clusters we do:
ceph config set mgr mgr/balancer/upmap_max_deviation 1
Cheers, Dan
On Fri, Jan 29, 2021 at 11:25 PM Francois Legrand wrote:
Hi Dan,
Here is the output of ceph balancer status :
ceph balancer status
You might need simply to decrease the upmap_max_deviation from the
default of 5. On our clusters we do:
ceph config set mgr mgr/balancer/upmap_max_deviation 1
Cheers, Dan
On Fri, Jan 29, 2021 at 11:25 PM Francois Legrand wrote:
Hi Dan,
Here is the output of ceph balancer statu
he output of `ceph balancer status` ?
Also, can you increase the debug_mgr to 4/5 then share the log file of
the active mgr?
Best,
Dan
On Fri, Jan 29, 2021 at 10:54 AM Francois Legrand wrote:
Thanks for your suggestion. I will have a look !
But I am a bit surprised that the "official" b
lts:
https://github.com/TheJJ/ceph-balancer
After you run it, it echoes the PG movements it suggests. You can then just run
those commands and the cluster will balance more.
It's kinda work in progress, so I'm glad about your feedback.
Maybe it helps you :)
-- Jonas
On 27/01/2021 17.15, Francois Le
Nope !
On 27/01/2021 at 17:40, Anthony D'Atri wrote:
Do you have any override reweights set to values less than 1.0?
Check the REWEIGHT column when you run `ceph osd df`.
On Jan 27, 2021, at 8:15 AM, Francois Legrand wrote:
Hi all,
I have a cluster with 116 disks (24 new disks of 16TB add
Hi all,
I have a cluster with 116 disks (24 new 16TB disks added in December
and the rest 8TB) running nautilus 14.2.16.
I moved (8 months ago) from crush_compat to upmap balancing.
But the cluster does not seem well balanced, with the number of pgs on the 8TB
disks varying from 26 to 52 ! And a
13-S-0034 rack=SJ04 host=cephdata20b-b7e4a773b6
Does that help?
Cheers, Dan
On Wed, Dec 2, 2020 at 11:29 PM Francois Legrand wrote:
Hello,
I have a ceph nautilus cluster. The crushmap is organized with 2 rooms,
servers in these rooms and osd in these servers,
Hello,
I have a ceph nautilus cluster. The crushmap is organized with 2 rooms,
servers in these rooms and osds in these servers. I have a crush rule to
replicate data over the servers in different rooms.
Now, I want to add a new server to one of the rooms. My point is that I
would like to spe
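For what it's worth, placing a new host under a given room in the crush tree usually looks like this (the host and room names are made up for the example):
# create the host bucket and attach it under the existing room
ceph osd crush add-bucket cephserver5 host
ceph osd crush move cephserver5 room=room1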
:59, Wido den Hollander wrote:
On 31/08/2020 15:44, Francois Legrand wrote:
Thanks Igor for your answer,
We could try to do a compaction of RocksDB manually, but it's not clear
to me if we have to compact on the mon with something like
ceph-kvstore-tool rocksdb /var/lib/ceph/mon/mon01/sto
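For the record, manual compaction can be done roughly like this (the paths are examples, and the daemon must be stopped for the offline variant):
# offline compaction of a mon store (with the mon stopped)
ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-mon01/store.db compact
# online compaction of an osd's RocksDB
ceph tell osd.0 compact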
you have standalone fast(SSD/NVMe) drive for
DB/WAL? Aren't there any BlueFS spillovers which might be relevant?
Thanks,
Igor
On 8/28/2020 11:33 AM, Francois Legrand wrote:
Hi all,
We have a ceph cluster in production with 6 osd servers (with 16x8TB
disks), 3 mons/mgrs and 3 mdss
We tried to raise the osd_memory_target from 4 to 8G but the problem
still occurs (osd wrongly marked down few times a day).
Does somebody have any clue ?
F.
On Fri, Aug 28, 2020 at 10:34 AM Francois Legrand wrote:
Hi all,
We have
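As a point of reference, the memory target bump mentioned above can be applied centrally with something like this (value in bytes, 8 GiB here):
# raise the per-osd memory target from the 4 GiB default to 8 GiB
ceph config set osd osd_memory_target 8589934592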
Hi all,
We have a ceph cluster in production with 6 osd servers (with 16x8TB
disks), 3 mons/mgrs and 3 mdss. Both public and cluster networks are
10Gb and work well.
After a major crash in April, we turned the option bluefs_buffered_io to
false to work around the large write bug when bl
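For completeness, switching that option back later is a one-liner (whether it takes effect without an osd restart depends on the ceph version, so a rolling restart may be needed):
# re-enable buffered io for bluefs
ceph config set osd bluefs_buffered_io true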
Hi all,
*** Short version ***
Is there a way to repair a rocksdb from errors "Encountered error while
reading data from compression dictionary block Corruption: block
checksum mismatch" and "_open_db erroring opening db" ?
*** Long version ***
We operate a nautilus ceph cluster (with 100 dis
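As background, the usual first diagnostics on such an osd would be something along these lines (the path is an example, and the osd must be stopped):
# check the bluestore/rocksdb consistency of a stopped osd
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0
# attempt an automatic repair if fsck reports errors
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0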
Hello,
It seems that the index of
https://download.ceph.com/rpm-nautilus/el7/x86_64/ repository is wrong.
Only the 14.2.10-0.el7 version is available (all previous versions are
missing despite the fact that the rpms are present in the repository).
It thus seems that the index needs to be corre
st notably, performance of the MDS was improved a lot.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
____
From: Francois Legrand
Sent: 26 June 2020 15:03:23
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Rem
Does somebody use mclock in a production cluster?
/2020 at 09:46, Frank Schilder wrote:
I'm using
osd_op_queue = wpq
osd_op_queue_cut_off = high
and these settings are recommended.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Francois Legrand
Sent: 26
gards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Francois Legrand
Sent: 26 June 2020 09:44:00
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Removing pool in nautilus is incredibly slow
We are now
Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Francois Legrand
Sent: 25 June 2020 19:25:14
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Removing pool in nautilus is incredibly slow
I also had this kind of symptoms with nautilus.
Replacing a
I think he means that after disk failure he waits for the cluster to get back
to ok (so all data on the lost disk has been reconstructed elsewhere) and then
the disk is changed. In that case it's normal to have misplaced objects
(because with the new disk some pgs need to be migrated to popula
Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Francois Legrand
Sent: 25 June 2020 19:25:14
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Removing pool in nautilus is incredibly slow
I also had this kind of symptoms with nautilus.
Replacing a failed
I also had this kind of symptom with nautilus.
Replacing a failed disk (starting from a healthy cluster) generates degraded objects.
Also, we have a proxmox cluster accessing vm images stored in our ceph storage
with rbd.
Each time I had some operation on the ceph cluster like adding or removing a
pool, most
_sleep"? It will not
speed up the process but I will give you some control over your
cluster performance.
Something like:
ceph tell osd.\* injectargs '--osd_delete_sleep1'
kind regards,
Wout
42on
On 25-06-2020 09:57, Francois Legrand wr
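The same throttle mentioned above can also be set persistently through the config database (the value of 1 second is just an example):
# slow down pg/object deletion work on all osds
ceph config set osd osd_delete_sleep 1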
Does someone have an idea ?
F.
Hello,
I am running ceph nautilus 14.2.8
I had to remove 2 pools (old cephfs data and metadata pool with 1024 pgs).
The removal of the pools seems to take an incredible time to free the
space (the data pool I deleted was more than 100 TB and in 36h I got
back only 10TB). In the meantime, the clus
Thanks a lot. It works.
I could delete the filesystem and remove the pools (data and metadata).
But now I am facing another problem which is that the removal of the
pools seems to take an incredible time to free the space (the pool I
deleted was about 100TB and in 36h I got back only 10TB). In th
Hello,
I have a ceph cluster (nautilus 14.2.8) with 2 filesystems and 3 mds.
mds1 is managing fs1
mds2 manages fs2
mds3 is standby
I want to completely remove fs1.
It seems that the command to use is ceph fs rm fs1 --yes-i-really-mean-it
and then delete the data and metadata pools with ceph osd p
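For reference, the full sequence is roughly the following (the pool names are placeholders, and mon_allow_pool_delete must be enabled for the pool deletions):
# stop the filesystem, then remove it
ceph fs fail fs1
ceph fs rm fs1 --yes-i-really-mean-it
# delete its data and metadata pools
ceph osd pool delete fs1_data fs1_data --yes-i-really-really-mean-it
ceph osd pool delete fs1_metadata fs1_metadata --yes-i-really-really-mean-it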
ons?
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
____
From: Francois Legrand
Sent: 08 June 2020 16:38:18
To: Frank Schilder; ceph-users
Subject: Re: [ceph-users] Re: mds behind on trimming - replay until memory
exhausted
I a
Hi all,
We have a cephfs with its data_pool in erasure coding (3+2) and 1024 pgs
(nautilus 14.2.8).
One of the pgs is partially destroyed (we lost 3 osds, thus 3 shards), it
has 143 objects unfound and is stuck in state
"active+recovery_unfound+undersized+degraded+remapped".
We then lost some data (
amage.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________
From: Francois Legrand
Sent: 08 June 2020 16:00:28
To: Frank Schilder; ceph-users
Subject: Re: [ceph-users] Re: mds behind on trimming - replay until memory
exhausted
Ther
mpus
Bygning 109, rum S14
________
From: Francois Legrand
Sent: 08 June 2020 15:27:59
To: Frank Schilder; ceph-users
Subject: Re: [ceph-users] Re: mds behind on trimming - replay until memory
exhausted
Thanks again for the hint !
Indeed, I did a
ceph daemon mds.lpnceph-mds02.in2p3.fr objecter_requests
an
regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
____
From: Francois Legrand
Sent: 08 June 2020 14:45:13
To: Frank Schilder; ceph-users
Subject: Re: [ceph-users] Re: mds behind on trimming - replay until memory
exhausted
Hi Franck,
Final
109, rum S14
____
From: Francois Legrand
Sent: 06 June 2020 11:11
To: Frank Schilder; ceph-users
Subject: Re: [ceph-users] Re: mds behind on trimming - replay until memory
exhausted
Thanks for the tip,
I will try that. For now vm.min_free_kbytes = 90112
Indeed, yesterday after your la
nd watch the cluster health closely.
Good luck and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
____
From: Francois Legrand
Sent: 05 June 2020 23:51:04
To: Frank Schilder; ceph-users
Subject: Re: [ceph-users] mds behind on
grade.
My best bet right now is to try to add swap. Maybe someone else reading this
has a better idea or you find a hint in one of the other threads.
Good luck!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Francois Legrand
trim
the logs. Will take a while, but it will do eventually.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
____
From: Francois Legrand
Sent: 05 June 2020 13:46:03
To: Frank Schilder; ceph-users
Subject: Re: [ceph-users] md
. Since I
reduced mds_cache_memory_limit to not more than half of what is physically
available, I have not had any problems again.
Hope that helps.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
____
From: Francois Legrand
again.
Hope that helps.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
____
From: Francois Legrand
Sent: 05 June 2020 11:42:42
To: ceph-users
Subject: [ceph-users] mds behind on trimming - replay until memory exhausted
Hi a
Hi all,
We have a ceph nautilus cluster (14.2.8) with two cephfs filesystem and
3 mds (1 active for each fs + one failover).
We are transferring all the data (~600M files) from one FS (which was in
EC 3+2) to the other FS (in R3).
On the old FS we first removed the snapshots (to avoid strays pro
Hello,
We run nautilus 14.2.8 ceph cluster.
After a big crash in which we lost some disks, we had a PG down (erasure
coding 3+2 pool) and, trying to fix it, we followed this
https://medium.com/opsops/recovering-ceph-from-reduced-data-availability-3-pgs-inactive-3-pgs-incomplete-b97cbcb4b5a1
As the
Hi,
After a major crash in which we lost a few osds, we are stuck with
incomplete pgs.
At first, peering was blocked with peering_blocked_by_history_les_bound.
Thus we set osd_find_best_info_ignore_history_les true for all osds
involved in the pg and set the primary osd down to force repeering.
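For context, that workaround is typically applied along these lines (the osd id 12 is a placeholder for the pg's primary):
# allow peering to ignore the last_epoch_started history check
ceph config set osd osd_find_best_info_ignore_history_les true
# mark the primary down so the pg re-peers with the new setting
ceph osd down 12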
Hi all,
During a crash disaster we destroyed a few osds and reinstalled them
with different ids.
As an example, osd 3 was destroyed and recreated with id 101 with the commands
ceph osd purge 3 --yes-i-really-mean-it + ceph osd create (to block id
3) + ceph-deploy osd create --data /dev/sdxx and fi
Hi,
We had a major crash which ended with ~1/3 of our osds down.
Trying to fix it we reinstalled a few of the down osds (that was a mistake, I
agree) and destroyed the data on them.
Finally, we could fix the problem (thanks to Igor Fedotov) and restart
almost all of our osds except one for which the rocksd
Is there a way to purge the crashes?
For example is it safe and sufficient to delete everything in
/var/lib/ceph/crash on the nodes ?
F.
On 30/04/2020 at 17:14, Paul Emmerich wrote:
Best guess: the recovery process doesn't really stop, but it's just
that the mgr is dead and it no longer repo
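For reference, the supported way to clear the stored crash reports, once the mgr crash module responds again, is via the crash commands rather than deleting files by hand (a sketch):
# list, archive, then prune stored crash reports
ceph crash ls
ceph crash archive-all
ceph crash prune 0    # remove crash info older than 0 days, i.e. everything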
Hi everybody (again),
We recently had a lot of osd crashes (more than 30 osds crashed). This is
now fixed, but it triggered a huge rebalancing+recovery.
More or less at the same time, we noticed that ceph crash ls (or
any other ceph crash command) hangs forever and never returns.
And fina
slowdown,
high memory utilization and finally huge reads and/or writes from
RocksDB.
Don't know how to deal with this at the moment...
Thanks,
Igor
On 4/29/2020 5:33 PM, Francois Legrand wrote:
Here are the logs of the newly crashed osd.
F.
On 29/04/2020 at 16:21, Igor Fedotov wrote:
on and finally huge reads and/or writes from RocksDB.
Don't know how to deal with this at the moment...
Thanks,
Igor
On 4/29/2020 5:33 PM, Francois Legrand wrote:
Here are the logs of the newly crashed osd.
F.
On 29/04/2020 at 16:21, Igor Fedotov wrote:
Sounds interesting - could
ed io" can be injected on the fly but I expect it to
help when OSD isn't starting up only.
On 4/29/2020 5:17 PM, Francois Legrand wrote:
Ok we will try that.
Indeed, restarting osd.5 caused two other osds
in the cluster to go down.
Thus we will set bluefs buffered io = fal
t one failing
OSD (i.e. do not start it with the disabled buffered io) for future
experiments/troubleshooting for a while if possible.
Thanks,
Igor
On 4/29/2020 4:50 PM, Francois Legrand wrote:
Hi,
It seems much better with these options. The osd has now been up for
10 minutes without crashing (befo
with the following settings
now:
debug-bluefs and debug-bdev = 20
bluefs sync write = false
bluefs buffered io = false
Thanks,
Igor
On 4/29/2020 3:35 PM, Francois Legrand wrote:
Hi Igor,
Here is what we did:
First, as other osds were falling down, we stopped all operations with
ceph os
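As an aside, the debug and bluefs settings discussed above can also be pinned for a single osd in ceph.conf before starting it (a sketch, assuming osd.5 is the failing one):
[osd.5]
# verbose logging for the bluefs/bdev layers
debug_bluefs = 20
debug_bdev = 20
# settings suggested for the troubleshooting session
bluefs_sync_write = false
bluefs_buffered_io = false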
lect debug logs for OSD startup (with both current and
updated bdev-aio parameter) and --debug-bdev/debug-bluefs set to 20.
You can omit --debug-bluestore increase for now to reduce log size.
Thanks,
Igor
On 4/28/2020 5:16 PM, Francois Legrand wrote:
Here is the output of ceph-bluestore
nd
share the output.
Thanks,
Igor
On 4/28/2020 12:55 AM, Francois Legrand wrote:
Hi all,
*** Short version ***
Is there a way to repair a rocksdb from errors "Encountered error
while reading data from compression dictionary block Corruption:
block checksum mismatch" and "_open_db
Hi all,
*** Short version ***
Is there a way to repair a rocksdb from errors "Encountered error while
reading data from compression dictionary block Corruption: block
checksum mismatch" and "_open_db erroring opening db" ?
*** Long version ***
We operate a nautilus ceph cluster (with 100 dis
moving objects and prevent its deletion.
F.
On 14/01/2020 at 07:54, Konstantin Shalygin wrote:
On 1/6/20 5:50 PM, Francois Legrand wrote:
I still have a few questions before going on.
It seems that some metadata remains on the original data pool,
preventing its deletion
(http://cep
e new pool) ?
How are snapshots affected (do I have to remove all of them before the
operation)?
Happy new year.
F.
On 24/12/2019 at 03:53, Konstantin Shalygin wrote:
On 12/19/19 10:22 PM, Francois Legrand wrote:
Thus my question is *how can I migrate a data pool in EC of a cephfs
to anoth
EC pool. EC pools can store data but not metadata. The header
object of the RBD will fail to be flushed. The same applies for CephFS.
Thus my question is: *how can I migrate the EC data pool of a cephfs to
another EC pool?*
Thanks for your advice.
F.
On 03/12/2019 at 04:07, Konstantin Shaly
Thanks.
For replicated pools, what is the best way to change the crush rule? Is it to
create a new replicated rule and set it as the crush ruleset for
the pool (something like ceph osd pool set {pool-name} crush_ruleset
my_new_rule)?
For erasure coding, I would thus have to change the profile a
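Regarding the replicated case above: on recent releases the pool property is called crush_rule rather than crush_ruleset, so the change would look roughly like this (the rule and pool names are placeholders):
# create a new replicated rule with host as the failure domain
ceph osd crush rule create-replicated my_new_rule default host
# point the pool at the new rule; data will rebalance accordingly
ceph osd pool set mypool crush_rule my_new_rule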
Hi,
I have a cephfs in production based on 2 pools (data+metadata).
Data is in erasure coding with the profile :
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=3
m=2
plugin=jerasure
technique=reed_sol_van
w=8
Metadata is in replicated mode with k=3
The crush
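To make the setup above concrete, a profile and data pool of that kind would be created roughly as follows (the profile, pool and fs names are illustrative):
# erasure-code profile matching the one quoted above (k=3, m=2)
ceph osd erasure-code-profile set ec32 k=3 m=2 crush-failure-domain=host plugin=jerasure technique=reed_sol_van
# create an EC data pool with that profile and allow cephfs to use it
ceph osd pool create cephfs_data_ec 1024 1024 erasure ec32
ceph osd pool set cephfs_data_ec allow_ec_overwrites true
ceph fs add_data_pool myfs cephfs_data_ec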