Hi,
if I'm not mistaken, setting a cert/key combination with
ceph dashboard set-ssl-certificate[-key] -i cert[key]
only populates these config-keys:
mgr/dashboard/crt
mgr/dashboard/key
This cert/key pair should then contain either a wildcard to be
applicable to all mgr daemons. If you need
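As a sketch (certificate/key file names and the mgr name are placeholders, not taken from this thread), setting the pair and checking the resulting config-keys could look roughly like this:
# set the cluster-wide dashboard certificate and key
ceph dashboard set-ssl-certificate -i dashboard.crt
ceph dashboard set-ssl-certificate-key -i dashboard.key
# or set a dedicated pair for a single mgr instance
ceph dashboard set-ssl-certificate mgr-host1 -i mgr-host1.crt
ceph dashboard set-ssl-certificate-key mgr-host1 -i mgr-host1.key
# verify which config-keys were populated
ceph config-key get mgr/dashboard/crt
ceph config-key get mgr/dashboard/key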
Hi,
this is well known, years ago this was discussed on this list as well.
One could argue that since it's not supported to change the EC
parameters of a pool, you shouldn't change the profile. But the EC
profile is only referenced during pool creation, so you can edit the
profile and cre
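A rough sketch of that (profile and pool names are placeholders): the profile is only read when a pool is created, so an edited profile only affects pools created afterwards.
# show the current parameters of the profile
ceph osd erasure-code-profile get myprofile
# a newly created pool picks up whatever the profile contains right now
ceph osd pool create ecpool_new 32 32 erasure myprofile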
I'm very grateful
Vivien
________
From: Eugen Block
Sent: Friday, August 1, 2025 15:27:56
To: GLE, Vivien
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Pgs troubleshooting
That’s why I mentioned this two days ago:
cephadm shell -- ceph-objectstore-tool --op li
quite as well.
Quoting "GLE, Vivien":
I was using ceph-objectstore-tool the wrong way by doing it on host
instead of inside container via cephadm shell --name osd.x
From: GLE, Vivien
Sent: Friday, August 1, 2025 09:02:59
To: Eugen Block
Cc:
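For illustration, with osd.2 and PG 2.1 as placeholders, the container-based invocation would look roughly like this (the OSD has to be stopped first so the objectstore is not in use):
# stop the OSD daemon
ceph orch daemon stop osd.2
# enter that OSD's container
cephadm shell --name osd.2
# inside the container the data path is the local OSD directory
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --op list --pgid 2.1 --no-mon-config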
Can you clarify a bit more? Are you surprised that there are already
OSDs deployed although you just added the new (blank) disks? In that
case you might already have an OSD service in place which
automatically deploys OSDs as soon as available devices are added. To
confirm that, please add
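For example (a sketch, the service names will differ), the applied OSD specs can be inspected like this:
# list the OSD services the orchestrator manages
ceph orch ls --service-type osd
# dump the full specs to see which devices they match and whether they are unmanaged
ceph orch ls osd --export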
Hi *,
I have a VM which I use frequently to test cephadm bootstrap
operations as well as upgrades, it's a single node with a few devices
attached. After successfully testing the upgrade to 19.2.3, I wanted
to test the bootstrap again, but removing the cluster with the
--zap-osds flag does
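For reference, the teardown in question looks roughly like this (the FSID is a placeholder):
# remove the cluster from this host and zap its OSD devices
cephadm rm-cluster --fsid <FSID> --force --zap-osds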
error occurred
________
From: Eugen Block
Sent: Thursday, July 31, 2025 13:27:51
To: GLE, Vivien
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Pgs troubleshooting
Why did you look at OSD.2? According to the query output you provided
I would have looked at OSD.1 (acting set). And you pa
th because there is nothing
and this is the command I used to check bluestore
ceph-objectstore-tool --data-path /var/lib/ceph/"ID"/osd.2 --op list
--pgid 2.1 --no-mon-config
From: GLE, Vivien
Sent: Thursday, July 31, 2025 09:38:25
To: Eugen Block
Cc:
"shards": "3,4,5",
"objects": 2
}
],
"blocked_by": [],
"up_primary": 1,
"acting_primary": 1,
"purged_snaps": []
},
Thanks
Vivien
_
pool
via rados put ?
________
From: Eugen Block
Sent: Wednesday, July 30, 2025 13:01:14
To: GLE, Vivien
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Pgs troubleshooting
Not move but import as a second and third replica.
Quoting "GLE, Vivien":
Hi,
did
"up_primary": 1,
"acting_primary": 1,
"purged_snaps": []
},
Thanks
Vivien
From: Eugen Block
Sent: Tuesday, July 29, 2025 16:48:41
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Pgs troubleshooting
Hi,
I assume you see the duplicate OSD in 'ceph orch ps | grep -w osd.1'
as well? Are they both supposed to run on the same host?
You might have an orphaned daemon there, check 'cephadm ls
--no-detail' on the host (probably noc3), maybe there's one "legacy"
osd.1? If that is the case, remov
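A sketch of that check and cleanup (FSID and daemon name are placeholders):
# list all daemons cephadm knows about on that host, including legacy ones
cephadm ls --no-detail
# if an orphaned/legacy osd.1 shows up, remove that daemon explicitly
cephadm rm-daemon --name osd.1 --fsid <FSID> --force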
Hi,
did the two replaced OSDs fail at the same time (before they were
completely drained)? This would most likely mean that both those
failed OSDs contained the other two replicas of this PG. A pg query
should show which OSDs are missing.
You could try with objectstore-tool to export the PG
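A rough sketch of such an export/import (OSD ids, PG id and file path are placeholders; the involved OSDs must be stopped while the tool runs):
# on the OSD that still holds the PG, export it to a file
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --pgid 2.1 --op export --file /tmp/pg2.1.export
# on the target OSD, import it again
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --pgid 2.1 --op import --file /tmp/pg2.1.export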
at was a little unexpected but I'll leave it alone.
I think we can consider this thread closed as "invalid" (for now).
But thanks again for your response, Adam!
Quoting Eugen Block:
Thanks, Adam.
Before I purged the nodes again, I looked at the current output of
'ceph or
ost for osd.6
or you have a consistent way to reproduce the failed removal, I can take a
look.
On Fri, Jul 25, 2025 at 8:01 AM Eugen Block wrote:
Hi *,
an unexpected issue occurred today, at least twice, so it seems kind
of reproducible. I've been preparing a demo in a (virtual) lab cluster
Hi *,
an unexpected issue occurred today, at least twice, so it seems kind
of reproducible. I've been preparing a demo in a (virtual) lab cluster
(19.2.2) and wanted to drain multiple hosts. The first time I didn't
pay much attention, but the draining seemed stuck (kind of a common
issue
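For context, a drain and the commands to watch it look roughly like this (hostname as placeholder):
# drain all daemons/OSDs from a host
ceph orch host drain host1
# watch the removal queue
ceph orch osd rm status
# a mgr failover sometimes gets a stuck queue moving again
ceph mgr fail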
Hi,
I don't use ansible, but I just redeployed a single-node Pacific
cluster with cephadm, without dashboard. Then I followed the docs you
referred to until
https://docs.ceph.com/en/pacific/mgr/dashboard/#enabling-the-object-gateway-management-frontend, where it
says:
When RGW is deploy
Hi,
Quoting Stéphane Barthes:
Hello,
Thank you very much to every one for helping and giving advice, my
cluster is back online with HEALTH_OK, and it looks like no data
was lost.
I have not been able to convince the cluster to run on 1 mon, as all
ceph/cephadm comman
Hi,
that hasn't been an issue for me yet. Which prometheus version has
been deployed? Do you see any errors in the prometheus and/or ceph-mgr
log? I'd ignore grafana for now since it only displays what prometheus
is supposed to collect. To get fresh logs, I would fail the mgr and
probably
Bangalore, India
On Wed, Jul 16, 2025 at 3:31 PM Eugen Block wrote:
No, it's definitely not safe. If you remove the overlay without
flushing the dirty objects, you will face data loss. Unfortunately,
the cache tier hasn't been supported for a while and even when it was,
it was discou
Hi,
I agree, trying to fix a broken test cluster is absolutely helpful. I
recommend reading the docs [0], especially [1] and [2]. For [2] you'll
have to adapt the commands to cephadm shell since it's still written
for non-cephadm clusters. But there are threads on this list that
cover tho
Good morning,
what Ceph version is this? Apparently it's not cephadm managed? If it
is, there's no need to fiddle with ceph-volume yourself, the
orchestrator can handle that for you, either by using a suitable spec
file or via command line. Every now and then users on this list
discuss ab
For now I set the service to "unmanaged" to prevent further log
flooding. But I would still like to know why the cache is not updated
properly.
Quoting Eugen Block:
Good morning,
I noticed something strange on a 18.2.7 cluster, running on Ubuntu
22.04, deployed by cephadm.
Good morning,
I noticed something strange on a 18.2.7 cluster, running on Ubuntu
22.04, deployed by cephadm. There are 10 hosts in total, 5 of them are
all-flash and those aren't affected. The other 5 hosts are hdd-only,
and only 4 of those are affected:
The /var/log/ceph/{FSID}/ceph-volu
Ltd
Bangalore, India
On Wed, Jul 16, 2025 at 2:23 PM Eugen Block wrote:
Hi (got a bounce, resending),
Quoting Vishnu Bhaskar:
> Hi Eugen,
>
> I wanted to provide an update regarding the volumes. I've confirmed that
> none of my volumes are mapped to multiple machine
askar
Acceleron Labs Pvt Ltd
Bangalore, India
On Wed, Jul 16, 2025 at 1:00 PM Eugen Block wrote:
Just because you seem to be able to write from a client perspective
doesn't mean that the data is actually written onto the OSD. For
example, if you attach an RBD image to two VMs simultaneousl
Pvt Ltd
Bangalore, India
On Tue, Jul 15, 2025 at 3:34 PM Eugen Block wrote:
Hi,
we went through the same thing a couple of months ago.
Since my attempts of flushing the cache objects didn't complete, I
experimented a bit. I gradually decreased the target bytes of the
cache which then caused aut
Hi,
we went through the same thing a couple of months ago.
Since my attempts of flushing the cache objects didn't complete, I
experimented a bit. I gradually decreased the target bytes of the
cache which then caused automatic flushing of most of the remaining
data objects. But all the header objects wer
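A sketch of such a gradual decrease (pool name and sizes are placeholders; cache tiering is deprecated, so don't treat this as an authoritative procedure):
# step the cache pool's target size down to trigger automatic flushing
ceph osd pool set cache-pool target_max_bytes 500000000000
ceph osd pool set cache-pool target_max_bytes 250000000000
# then try to flush and evict whatever remains
rados -p cache-pool cache-flush-evict-all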
ady exists, skipping create. Use --force-init to
overwrite the existing object.
Quoting Robert Sander:
Hi,
On 7/14/25 at 15:06, Eugen Block wrote:
# cephfs-table-tool secondfs:0 show session
That works. But what about cephfs-data-scan?
# cephfs-data-scan init
Specify a filesystem wit
Or even without --rank:
# cephfs-table-tool secondfs:0 show session
{
  "0": {
    "data": {
      "sessions": []
    },
    "result": 0
  }
}
Quoting Uwe Richter:
Hi,
https://docs.ceph.com/en/reef/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-obje
Hi,
changing the scheduler requires an OSD restart, and it is done in a
staggered manner by default. So the command you mentioned will do that
for you.
https://docs.clyso.com/blog/2023/03/22/ceph-how-do-disable-mclock-scheduler/
Quoting Anthony D'Atri:
I don’t *think* OSD restarts are
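A sketch of such a change with a manual, staggered restart (assuming the orchestrator is in use; OSD ids are placeholders):
# switch the op queue scheduler
ceph config set osd osd_op_queue wpq
# restart the OSDs one at a time so the change takes effect gradually
ceph orch daemon restart osd.0
ceph orch daemon restart osd.1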
Hi Kasper,
that's exactly what we usually do if we have identified some
misbehavior, trying to find the right setting to mitigate the issue.
If you see cache pressure messages, it might be more helpful to
decrease mds_recall_max_caps (default: 3) rather than to increase it
(your setting
Hi,
I asked this question [0] five years ago; I haven't noticed anything
wrt rbd quotas in recent releases. Does anyone have an update on this
topic? I'd appreciate it!
Thanks,
Eugen
[0]
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ZCZ6MTFS645EQ73RZDZ7AFXJEFSA3OB3/#ZCZ6MTF
".
So maybe this is somehow related.
Thanks
Dietmar
On 7/10/25 09:12, Eugen Block wrote:
Hi,
every thread I found so far mentioned that this resolved itself
after some time. Maybe you can confirm?
Quoting Dietmar Rieder:
Hi,
our ceph cluster reported an inconsistent pg, so we
Hi Robert,
were you able to resolve this issue? I haven't faced that error myself
yet, so I can't really comment. But it would be interesting to know if
and how you got out of it.
Thanks,
Eugen
Quoting Robert Sander:
Hi,
On 6/30/25 at 16:50, Robert Sander wrote:
With marking the MD
Hi,
every thread I found so far mentioned that this resolved itself after
some time. Maybe you can confirm?
Quoting Dietmar Rieder:
Hi,
our ceph cluster reported an inconsistent pg, so we set it to repair:
# ceph pg repair 4.b10
# ceph health detail
HEALTH_ERR 1 scrub errors; Possible
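For reference, inspecting and repairing such a PG looks roughly like this (4.b10 taken from the quoted output):
# list the objects the deep-scrub flagged as inconsistent
rados list-inconsistent-obj 4.b10 --format=json-pretty
# then trigger the repair
ceph pg repair 4.b10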
Hi,
personally, I like to have the daemon logs as files in
/var/log/ceph/{FSID}/ and propose that to every customer as well. The
docs [0] have some guidance on how to do that.
Regards,
Eugen
[0] https://docs.ceph.com/en/latest/cephadm/operations/#logging-to-files
Quoting Sinan Polat:
Hi
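The settings behind [0] boil down to roughly this (a sketch, see the docs for details):
# have the daemons write their logs under /var/log/ceph/{FSID}/
ceph config set global log_to_file true
ceph config set global mon_cluster_log_to_file true
# optionally stop logging to stderr to avoid duplicate logging
ceph config set global log_to_stderr false
ceph config set global mon_cluster_log_to_stderr false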
Hi,
that is correct, no need to specify wal, they will be automatically
colocated on the db devices.
Quoting Steven Vacaroaia:
Hello
I have redeployed the cluster
I am planning to use the below spec file
--dry-run shows that DB partitions will be created BUT not WAL ones
My understa
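A minimal sketch of such a spec (the device filters are placeholders): DB devices are listed, WAL devices are not, because the WAL is colocated on the DB device automatically.
service_type: osd
service_id: osd_spec_hdd
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
# preview what the orchestrator would create:
# ceph orch apply -i osd-spec.yaml --dry-run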
Hi,
one of our use cases for CephFS is home directories for our LDAP
users. The user's VMs use a kernel mount with an autofs user which has
the CephFS auth caps. So we don't have each user as a client but one
main CephFS client. Maybe that helps as a workaround?
Regards,
Eugen
Quoting Bur
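A sketch of that setup (client name, monitor address and paths are placeholders):
# one shared client with caps on the home directory tree
ceph fs authorize cephfs client.autofs /home rw
# the users' VMs then kernel-mount with that single identity
mount -t ceph 192.168.0.1:6789:/home /home -o name=autofs,secretfile=/etc/ceph/autofs.secret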
Can you show the overall cluster status (ceph -s)? If there's
something else going on, it might block (some?) operations. And I'd
scan the mgr logs, maybe in debug mode to see why it fails to operate
properly.
Quoting Holger Naundorf:
On 27.06.25 14:16, Eugen Block wrote:
effect now as well - or
should I reissue the OSD rm command as well?
Is there something in the queue (ceph orch osd rm status)? Sometimes
the queue clears after a mgr restart, so it might be necessary to
restart the rm command as well.
Regards,
Holger
On 27.06.25 12:26, Eugen Block
Hi,
have you retried it after restarting/failing the mgr?
ceph mgr fail
Quite often this (still) helps.
Quoting Holger Naundorf:
Hello,
we are running a ceph cluster at version:
ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable)
and for a few weeks now the orche
aid 19.2.3 but meant the next reef
unfortunately, a mistake was made in backporting some changes related
to thread names and the radosgw process gets renamed to
"notif-worker0" as a result. so commands like pkill expect that string
instead of radosgw
On Wed, Jun 25, 2025
Hi,
we work with Openstack and Ceph as well, and we also support customers
with such deployments, but in 10 years I haven't had to rebuild any
object maps yet, ever. So I'm wondering what exactly you're seeing
when you do have to rebuild them.
One of our customers has a mid-sized cloud (
Hi,
in a previous thread you wrote that you had multiple simultaneous disk
failures, and you replaced all of the drives. I assume that the
failures happened across different hosts? And the remaining hosts and
OSDs were not able to recover? I'm just trying to get a better idea of
what exac
Yes, that worked as expected. I can't see any negative impact yet.
Quoting Eugen Block:
Thanks a lot, Casey. I'm still not sure why I couldn't find that
myself, but thanks anyway. I have added notif-worker0 to the
logrotate file in both a test cluster and one production c
d, 25 Jun 2025 at 11:58, Eugen Block wrote:
Thanks Frédéric.
The customer found the sticky flag, too. I must admit, I haven't used
the mute command too often yet, usually I try to get to the bottom of
a warning and rather fix the underlying issue. :-D
So the mute clears if the number
ing some changes related
> to thread names and the radosgw process gets renamed to
> "notif-worker0" as a result. so commands like pkill expect that string
> instead of radosgw
>
> On Wed, Jun 25, 2025 at 7:00 AM Eugen Block wrote:
> >
> > Interesting, it seems like
Hi,
after upgrading multiple clusters from 18.2.4 some weeks ago, I
noticed that the RGWs stop logging to file after the nightly
logrotate. Other daemons don't seem to be affected, they continue
logging to file. Restarting an RGW daemon helps until the next
logrotate.
I could reproduce
ntil the process is restarted. Is there some workaround
possible until we upgrade?
Quoting Eugen Block:
Hi,
after upgrading multiple clusters from 18.2.4 some weeks ago, I
noticed that the RGWs stop logging to file after the nightly
logrotate. Other daemons don't seem to be a
er of affected PGs increased (which was
decided to be a good reason to alert the admin).
Have you tried to use the --sticky argument on the 'ceph health
mute' command?
Cheers,
Frédéric.
- On June 25, 2025, at 9:21, Eugen Block ebl...@nde.ag wrote:
Hi,
I'm trying to und
Hohenzollernstr. 27, 80801 Munich
Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306
Eugen Block wrote on Wed., June 25, 2025, 10:05:
Hi,
after upgrading multiple clusters from 18.2.4 some weeks ago, I
noticed that the RGWs stop logging to file after the nightly
logrotate. Other
Hi,
I'm trying to understand the "ceph health mute" behavior. In this
case, I'm referring to the warning PG_NOT_DEEP_SCRUBBED. If you mute
it for a week and the cluster continues deep-scrubbing, the "mute"
will clear at some point although there are still PGs not
deep-scrubbed in time war
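For reference, the mute in question looks roughly like this; --sticky keeps it from clearing while the check's details change:
# mute the warning for two weeks, surviving intermediate changes
ceph health mute PG_NOT_DEEP_SCRUBBED 2w --sticky
# clear it again manually
ceph health unmute PG_NOT_DEEP_SCRUBBED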
The default OSD memory cache size is 4 GB; it's not recommended to
reduce it to such low values, especially if there’s real load on the
cluster. I am not a developer, so I can’t really comment on the code.
Quoting Hector Martin:
Hi all,
I have a small 3-node cluster (4 HDD + 1 SSD OSD p
Maybe you should ask this additionally on the devs mailing list.
Quoting Hector Martin:
On 2025/06/23 0:21, Anthony D'Atri wrote:
DIMMs are cheap.
No DIMMs on Apple Macs.
You’re running virtualized in VMs or containers, with OSDs, mons,
mgr, and the constellation of other daemons
un 22, 2025, at 9:22 AM, Eugen Block wrote:
The command 'ceph osd find ' is not the right one to query an
OSD for the cluster network, it just shows the public address of an
OSD (like a client would need to). Just use 'ceph osd dump' and
look at the OSD output.
Zi
The command 'ceph osd find ' is not the right one to query an OSD
for the cluster network, it just shows the public address of an OSD
(like a client would need to). Just use 'ceph osd dump' and look at
the OSD output.
Quoting Devender Singh:
Hello
I checked on all my clusters everywh
What's the output of
ceph config dump | grep cluster_network
and
ceph config get osd cluster_network
Is it only some OSDs or all not using cluster_network? It's not
entirely clear from your question. OSDs automatically use the
public_network as a fallback, so if all of them use the
publi
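A sketch of checking the addresses and setting the option (the subnet is a placeholder):
# the osd dump shows both public and cluster addresses per OSD
ceph osd dump | grep "^osd"
# if the cluster network should be used, set it and restart the OSDs
ceph config set global cluster_network 192.168.100.0/24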
Cool, that's fantastic news! And a great analysis, too! I'm glad you
got it back up and client operations could resume. Happy to help!
Quoting Miles Goodhew:
On Thu, 19 Jun 2025, at 18:39, Eugen Block wrote:
Quoting Miles Goodhew:
> On Thu, 19 Jun 2025, at 17:48, Euge
that pool. Now I have 167 omap objects that are not
quite as big, but still too large.
Sincerely
Niklaus Hofer
On 19/06/2025 14.48, Eugen Block wrote:
Hi,
the warnings about large omap objects are reported when
deep-scrubs happen. So if you resharded the bucket (or Ceph did
that for you), you'll
us Hofer
On 19/06/2025 14.48, Eugen Block wrote:
Hi,
the warnings about large omap objects are reported when deep-scrubs
happen. So if you resharded the bucket (or Ceph did that for you),
you'll either have to wait for the deep-scrub schedule to scrub the
affected PGs, or you issue a
Default question: have you tried to fail the mgr? ;-)
ceph mgr fail
Quoting Niklaus Hofer:
Dear all
After upgrading to Pacific, we are now getting health warnings from
the auto scaler:
10 pools have too few placement groups
8 pools have too many placement groups
Hi,
the warnings about large omap objects are reported when deep-scrubs
happen. So if you resharded the bucket (or Ceph did that for you),
you'll either have to wait for the deep-scrub schedule to scrub the
affected PGs, or you issue a manual deep-scrub on that PG or the
entire pool.
Re
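A manual deep-scrub looks roughly like this (PG id and pool name are placeholders):
# deep-scrub a single PG
ceph pg deep-scrub 2.1f
# or the whole pool
ceph osd pool deep-scrub mypool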
Quoting Miles Goodhew:
On Thu, 19 Jun 2025, at 17:48, Eugen Block wrote:
Too bad. :-/ Could you increase the debug log level to 20? Maybe it
gets a bit clearer where exactly it fails.
I guess that's in `ceph.conf` with:
[mon]
debug_mon = 20
?
Correct.
Good thinking: I
Too bad. :-/ Could you increase the debug log level to 20? Maybe it
gets a bit clearer where exactly it fails.
Just to understand the current situation, you did reduce the monmap to
1 (mon3), then you tried the same with mon2. Because when you write:
I'm guessing that mon2 is only "running" b
mon store. But let's
see how far you get before exploring this option.
[1]
https://heiterbiswolkig.blogs.nde.ag/2023/08/14/how-to-migrate-from-suse-enterprise-storage-to-upstream-ceph/
Quoting Miles Goodhew:
On Wed, 18 Jun 2025, at 18:09, Eugen Block wrote:
That does look strange
https://github.com/ceph/ceph/blob/v14.2.22/src/mon/MDSMonitor.cc#L1801
Quoting Eugen Block:
That does look strange indeed, either an upgrade went wrong or
someone already fiddled with the monmap, I'd say. But anyway, I
wouldn't try to deploy a 4th mon since it would want to sync the
I'm just in a bit of decision paralysis about which mon to take as
the survivor. All can run _individually_, but only mon2 will
survive a group start. mon3 was the last one working, but it has
the mysterious "failed to assign global ID" errors. I'm leaning
toward using mon
Hi,
correct, SUSE's Ceph product was Salt-based, in this case 14.2.22 was
shipped with SES 6. ;-)
Do you also have some mon logs from right before the crash, maybe with
a higher debug level? It could make sense to stop client traffic and
OSDs as well to be able to recover. But unfortunate
Besides Michel's response regarding the default of 24 hours after which
the warning usually would disappear, I wanted to mention that we also
saw this warning during some network issues we had. So if the disks
seem okay, I'd recommend checking the network components.
Quoting Jan Kasprzak:
80801 Munich
Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306
Eugen Block wrote on Mon., June 16, 2025, 16:09:
I just noticed that the options crush_location and read_from_replica
from the rbd man page apparently only apply to rbd mapping options.
That doesn't really he
duction of rados_replica_read_policy will make those localized
reads available in general.
Quoting Eugen Block:
Hi Frédéric,
thanks a lot for looking into that, I appreciate it. Until a year
ago or so we used custom location hooks for a few OSDs, but not for
clients (yet).
I hav
ack1|rack:myrack2|datacenter:mydc
If you happen to test rados_replica_read_policy = localize, let us
know how it works. ;-)
Cheers,
Frédéric.
[1] https://github.com/ceph/ceph/blob/main/doc/man/8/rbd.rst
- On June 13, 2025, at 10:56, Eugen Block ebl...@nde.ag wrote:
And a follow-up quest
? I'd
appreciate any insights.
Quoting Eugen Block:
Hi *,
I have a question regarding the upcoming feature to optimize read
performance [0] by reading from the nearest OSD, especially in a
stretch cluster across two sites (or more). Anthony pointed me to
[1], looks like a new c
Hi *,
I have a question regarding the upcoming feature to optimize read
performance [0] by reading from the nearest OSD, especially in a
stretch cluster across two sites (or more). Anthony pointed me to [1],
looks like a new config option will be introduced in Tentacle:
rados_replica_read
I created:
https://tracker.ceph.com/issues/71635
Quoting Eugen Block:
I think this is a bug. Looking at the mon log when creating such a
pool, it appears that it's parsing the crush_rule as an erasure-code
profile and then selects the default rule 0 (default
replicated
ze": 4}]': finished
If I use "default" for the ec profile, it (incorrectly) assumes it's a
rule name:
soc9-ceph:~ # ceph osd pool create temp8 4 replicated default 1 0 4
Error ENOENT: specified rule default doesn't exist
Although the mon command is parsed as
Hi,
I didn't read the entire thread in detail, but to get some file mapped
into the containers you can utilize extra-entrypoint-args [0].
[0] https://docs.ceph.com/en/reef/cephadm/services/#extra-entrypoint-arguments
Quoting Albert Shih:
On 10/06/2025 at 16:46:28+0200, Albert Shih wrote
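A sketch of what [0] describes, using the node-exporter example from the docs (not specific to this thread):
service_type: node-exporter
service_name: node-exporter
placement:
  host_pattern: '*'
extra_entrypoint_args:
  - "--collector.textfile.directory=/var/lib/node_exporter/textfile_collector2"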
the source code.
I assume that the "workaround" for Squid is to deploy the
certificates manually, right?
Cheers
Iztok
On 10/06/25 12:31, Eugen Block wrote:
I assume it's a mistake in the docs. Comparing the branches for
20.0.0 [0] and 19.2.2 [1] reveals that the generate_cert
Hi,
did you only run the recover_dentries command or did you follow the
entire procedure from your first message?
If the cluster reports a healthy status, I assume that all is good.
Quoting b...@nocloud.ch:
I think I was lucky...
```sh
[root@ceph1 ~]# cephfs-journal-tool --rank=cephfs:0
Setting the config-key manually is in addition to using
rgw_frontend_ssl_certificate, it's not either or. But good that it
works for you that way as well.
Quoting Albert Shih:
On 06/06/2025 at 18:14:52+0000, Eugen Block wrote
Hi,
I don't have a good explanation for y
I assume it's a mistake in the docs. Comparing the branches for 20.0.0
[0] and 19.2.2 [1] reveals that the generate_cert parameter is not
present in Squid but will be in Tentacle.
[0]
https://github.com/ceph/ceph/blob/v20.0.0/src/python-common/ceph/deployment/service_spec.py#L1235
[1]
htt
Hi,
I don't have a good explanation for you, but it should be a
workaround. I've been looking into all kinds of variations with
concatenated certs etc., but what works for me is to set the mentioned
config-key. You can find an example in the (old-ish) SUSE docs [0].
ceph config-key set rg
Forgot to add that it's version 19.2.2 (also tried it on 19.2.0).
Quoting Eugen Block:
Hi,
without having checked the tracker, does anyone have an explanation
why the size parameter is not applied when creating a pool via CLI?
According to the help output for 'ceph osd pool
Hi,
without having checked the tracker, does anyone have an explanation
why the size parameter is not applied when creating a pool via CLI?
According to the help output for 'ceph osd pool create -h' you can
specify expected_num_objects (btw. I don't understand what impact that
has, all I
cephfs-table-tool --cluster --rank=:all reset session
And then finally bring the FS back up.
And lastly, a conclusion regarding my understanding of the WA on
61009 is important in order to avoid this issue in the future.
From: Eugen Block
Sent:
Is that image in the trash?
`rbd -p pool trash ls`
Quoting Gaël THEROND:
Hi folks,
I've a quick question. On one of our pools we found an image that
doesn't exist physically anymore (this image doesn't exist, has no snap
attached, is not parent of another image) but is still listed whe
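For illustration (pool name and image id are placeholders):
# list images sitting in the pool's trash
rbd -p mypool trash ls
# either restore an image by its id ...
rbd trash restore mypool/<image-id>
# ... or remove it for good
rbd trash rm mypool/<image-id>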
/issues/71501#note-4
Respectfully,
*Wes Dillingham*
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
w...@wesdillingham.com
On Fri, May 30, 2025 at 12:34 PM Eugen Block wrote:
Okay, and a hardware issue can be ruled out, I assume?
To get the cluster up again I would also consider starting on
I'm not sure how to do that right now.
Presumably those syncing tunables you tweaked only come into play if/when a
mon reaches synchronizing?
Respectfully,
*Wes Dillingham*
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
w...@wesdillingham.com
On Fri, May 30, 2025 at 11:15
Hi Wes,
although I haven't seen this exact issue, we did investigate a mon
sync issue two years ago. The customer also has 5 MONs and two of them
get out of quorum regularly in addition to the long sync times. For
the syncing issue we found some workarounds (paxos settings), but we
nev
Just a note on db_slots, I don’t think it has ever worked properly,
and last time I checked it still wasn’t implemented
(https://www.spinics.net/lists/ceph-users/msg83189.html).
This option should probably be entirely removed from the docs, unless
it’s coming soon.
Quoting Anthony D'Atri
It’s reported by the mgr, so you’ll either have to pass global or mgr
and osd to the configuration change. You can also check ‚ceph config
help {CONFIG}‘ to check which services are related to that
configuration value.
Quoting Michel Jouvin:
The page I checked,
https://docs.ceph.com/e
e with an incredibly high number of
late deep scrubs that can be worrying...
Michel
On 26/05/2025 at 09:56, Eugen Block wrote:
It’s reported by the mgr, so you’ll either have to pass global or
mgr and osd to the configuration change. You can also check 'ceph
config help {CONFIG}' to check whic
ntion of the devs?
BR. Kasper
____
From: Eugen Block
Sent: Tuesday, May 20, 2025 15:51
To: Kasper Rasmussen
Cc: Alexander Patrakov ; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: MDS Repeatedly Crashing/Restarting -
Unable to get CephFS Active
In that case I would back up both journals, ju
rank=:all --journal=mdlog journal inspect
cephfs-journal-tool --rank=:all --journal=purge_queue journal inspect
return:
Overall journal integrity: OK
From: Kasper Rasmussen
Sent: Tuesday, May 20, 2025 09:48
To: Eugen Block ; Alexander Patrakov
Cc: ceph-users
w to use such a backup if
disaster recovery fails. Do you know the procedure?
On Tue, May 20, 2025 at 1:23 AM Eugen Block wrote:
Hi,
not sure if it was related to journal replay, but have you checked for
memory issues? What's the mds memory target? Any traces of an oom
killer?
Next I wou
Just a quick update: I set auth_allow_insecure_global_id_reclaim to
false because all the client sessions we had showed either new_ok or
reclaim_ok in global_id_status. No complaints so far. :-)
Quoting Eugen Block:
The mon sessions dump also shows the global_id_status, this could
help
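A sketch of that check and change (the mon name is a placeholder):
# check the global_id_status of the connected sessions (run on the mon host)
ceph daemon mon.<name> sessions
# once everything shows new_ok / reclaim_ok, disallow the insecure reclaim
ceph config set mon auth_allow_insecure_global_id_reclaim false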
Hi,
not sure if it was related to journal replay, but have you checked for
memory issues? What's the mds memory target? Any traces of an oom
killer?
Next, I would inspect the journals for both purge_queue and mdlog:
cephfs-journal-tool journal inspect --rank= --journal=mdlog
cephfs-
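As a sketch (filesystem name and rank are placeholders), including a backup before touching anything:
# inspect both journals
cephfs-journal-tool --rank=myfs:0 --journal=mdlog journal inspect
cephfs-journal-tool --rank=myfs:0 --journal=purge_queue journal inspect
# export a backup of the metadata journal before any recovery steps
cephfs-journal-tool --rank=myfs:0 --journal=mdlog journal export /root/mdlog-backup.bin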
ts for session properties without ellipsing.
For this purpose, in my search I found the command "ceph daemon
mon-name sessions" where I saw the "luminous" word that in my mind
was "wrong" from my top post of this thread.
On 16/05/2025 14:56, Eugen Block wrote
Hi,
which Ceph version is this? It's apparently not managed by cephadm.
Quoting "Konold, Martin":
Hi,
I am working on a small 3-node Ceph cluster which used to work as expected.
When creating a new Ceph OSD, the ceph-volume command throws some
errors and filestore instead of bluestore is