[ceph-users] Re: tls certs per manager - does it work?

2025-08-03 Thread Eugen Block
Hi, if I'm not mistaken, setting a cert/key combination with ceph dashboard set-ssl-certificate[-key] -i cert[key] only populates these config-keys: mgr/dashboard/crt mgr/dashboard/key This cert/key pair should then contain a wildcard to be applicable to all mgr daemons. If you need
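A rough sketch of the commands and keys in question (file names are placeholders); the config-keys can be inspected afterwards to verify what was actually stored:
  ceph dashboard set-ssl-certificate -i dashboard.crt
  ceph dashboard set-ssl-certificate-key -i dashboard.key
  ceph config-key list | grep mgr/dashboard
  ceph config-key get mgr/dashboard/crt
  ceph mgr fail   # fail over the active mgr so the new cert is picked up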

[ceph-users] Re: Changing the failure domain of an EC cluster still shows old profile

2025-08-03 Thread Eugen Block
Hi, this is well known, years ago this was discussed on this list as well. One could argue that since it's not supported to change the EC parameters of a pool, you shouldn't change the profile. But the EC profile is only referenced during pool creation, so you can edit the profile and cre
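A hedged sketch of that workflow with placeholder names; the existing pool keeps its parameters, only pools created afterwards pick up the edited profile:
  ceph osd pool get mypool erasure_code_profile
  ceph osd erasure-code-profile get myprofile
  ceph osd erasure-code-profile set myprofile k=4 m=2 crush-failure-domain=host --force
  ceph osd pool create mypool-new 64 64 erasure myprofile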

[ceph-users] Re: Pgs troubleshooting

2025-08-01 Thread Eugen Block
'm very grateful Vivien ________ De : Eugen Block Envoyé : vendredi 1 août 2025 15:27:56 À : GLE, Vivien Cc : ceph-users@ceph.io Objet : Re: [ceph-users] Re: Pgs troubleshooting That’s why I mentioned this two days ago: cephadm shell -- ceph-objectstore-tool --op li

[ceph-users] Re: Pgs troubleshooting

2025-08-01 Thread Eugen Block
quite as well. Zitat von "GLE, Vivien" : I was using ceph-objectstore-tool the wrong way by doing it on host instead of inside container via cephadm shell --name osd.x De : GLE, Vivien Envoyé : vendredi 1 août 2025 09:02:59 À : Eugen Block Cc :
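For reference, a minimal sketch of running the tool inside the OSD's container (OSD id and PG id are illustrative; the OSD has to be stopped first):
  ceph orch daemon stop osd.2
  cephadm shell --name osd.2 -- ceph-objectstore-tool \
    --data-path /var/lib/ceph/osd/ceph-2 --op list --pgid 2.1 --no-mon-config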

[ceph-users] Re: Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected on all newly added osd disks

2025-08-01 Thread Eugen Block
Can you clarify a bit more? Are you surprised that there are already OSDs deployed although you just added the new (blank) disks? In that case you might have already an osd service in place which automatically deploys OSDs as soon as available devices are added. To confirm that, please add
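A quick way to check for such a service and, if unwanted, to stop it from grabbing new disks (sketch; whether the all-available-devices service is in use is an assumption):
  ceph orch ls osd --export
  ceph orch apply osd --all-available-devices --unmanaged=true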

[ceph-users] Squid 19.2.3 rm-cluster does not zap OSDs

2025-08-01 Thread Eugen Block
Hi *, I have a VM which I use frequently to test cephadm bootstrap operations as well as upgrades, it's a single node with a few devices attached. After successfully testing the upgrade to 19.2.3, I wanted to test the bootstrap again, but removing the cluster with the --zap-osds flag does
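For context, roughly the commands involved; if --zap-osds leaves LVs behind, zapping manually is an option (fsid and device path are placeholders):
  cephadm rm-cluster --fsid <fsid> --force --zap-osds
  cephadm ceph-volume -- lvm zap --destroy /dev/sdX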

[ceph-users] Re: Pgs troubleshooting

2025-07-31 Thread Eugen Block
error occurred ________ De : Eugen Block Envoyé : jeudi 31 juillet 2025 13:27:51 À : GLE, Vivien Cc : ceph-users@ceph.io Objet : Re: [ceph-users] Re: Pgs troubleshooting Why did you look at OSD.2? According to the query output you provided I would have looked at OSD.1 (acting set). And you pa

[ceph-users] Re: Pgs troubleshooting

2025-07-31 Thread Eugen Block
th because there is nothing and this is the command I used to check bluestore ceph-objectstore-tool --data-path /var/lib/ceph/"ID"/osd.2 --op list --pgid 2.1 --no-mon-config De : GLE, Vivien Envoyé : jeudi 31 juillet 2025 09:38:25 À : Eugen Block Cc

[ceph-users] Re: Pgs troubleshooting

2025-07-30 Thread Eugen Block
;shards": "3,4,5", "objects": 2 } ], "blocked_by": [], "up_primary": 1, "acting_primary": 1, "purged_snaps": [] }, Thanks Vivien _

[ceph-users] Re: Pgs troubleshooting

2025-07-30 Thread Eugen Block
pool via rados put ? ________ De : Eugen Block Envoyé : mercredi 30 juillet 2025 13:01:14 À : GLE, Vivien Cc : ceph-users@ceph.io Objet : [ceph-users] Re: Pgs troubleshooting Not move but import as a second and third replica. Zitat von "GLE, Vivien" : Hi, did

[ceph-users] Re: Pgs troubleshooting

2025-07-30 Thread Eugen Block
"up_primary": 1, "acting_primary": 1, "purged_snaps": [] }, Thanks Vivien De : Eugen Block Envoyé : mardi 29 juillet 2025 16:48:41 À : ceph-users@ceph.io Objet : [ceph-users] Re: Pgs troubleshooting

[ceph-users] Re: Upgrade from 19.2.2 to .3 pauses on 'phantom' duplicate osd?

2025-07-30 Thread Eugen Block
Hi, I assume you see the duplicate OSD in 'ceph orch ps | grep -w osd.1' as well? Are they both supposed to run on the same host? You might have an orphaned daemon there, check 'cephadm ls --no-detail' on the host (probably noc3), maybe there's one "legacy" osd.1? If that is the case, remov
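A sketch of how that check and cleanup could look (fsid and daemon name are placeholders):
  cephadm ls --no-detail | grep osd.1
  cephadm rm-daemon --name osd.1 --fsid <fsid> --force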

[ceph-users] Re: Pgs troubleshooting

2025-07-29 Thread Eugen Block
Hi, did the two replaced OSDs fail at the sime time (before they were completely drained)? This would most likely mean that both those failed OSDs contained the other two replicas of this PG. A pg query should show which OSDs are missing. You could try with objectstore-tool to export the PG
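Roughly, with illustrative OSD and PG ids (both OSDs stopped while the tool runs):
  ceph pg 2.1 query
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
    --pgid 2.1 --op export --file /tmp/pg2.1.export
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
    --pgid 2.1 --op import --file /tmp/pg2.1.export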

[ceph-users] Re: Squid: successfully drained host can't be removed

2025-07-26 Thread Eugen Block
at was a little unexpected but I'll leave it alone. I think we can consider this thread closed as "invalid" (for now). But thanks again for your response, Adam! Zitat von Eugen Block : Thanks, Adam. Before I purged the nodes again, I looked at the current output of 'ceph or

[ceph-users] Re: Squid: successfully drained host can't be removed

2025-07-26 Thread Eugen Block
ost for osd.6 or you have a consistent way to reproduce the failed removal, I can take a look. On Fri, Jul 25, 2025 at 8:01 AM Eugen Block wrote: Hi *, an unexpected issue occurred today, at least twice, so it seems kind of reproducable. I've been preparing a demo in a (virtual) lab cluster

[ceph-users] Squid: successfully drained host can't be removed

2025-07-25 Thread Eugen Block
Hi *, an unexpected issue occurred today, at least twice, so it seems kind of reproducable. I've been preparing a demo in a (virtual) lab cluster (19.2.2) and wanted to drain multiple hosts. The first time I didn't pay much attention, but the draining seemed stuck (kind of a common issue

[ceph-users] Re: [ceph-user]ceph-ansible pacific || RGW integration with ceph dashboard

2025-07-24 Thread Eugen Block
Hi, I don't use ansible, but I just redeployed a single-node Pacific cluster with cephadm, without dashboard. Then I followed the docs you referred to until https://docs.ceph.com/en/pacific/mgr/dashboard/#enabling-the-object-gateway-management-frontend, where it says: When RGW is deploy
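The next step in those docs essentially boils down to handing the dashboard a set of RGW credentials, roughly like this (user name and key files are placeholders):
  radosgw-admin user create --uid=dashboard --display-name=dashboard --system
  ceph dashboard set-rgw-api-access-key -i rgw-access-key.txt
  ceph dashboard set-rgw-api-secret-key -i rgw-secret-key.txt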

[ceph-users] Re: Newby woes with ceph

2025-07-24 Thread Eugen Block
Hi, Zitat von Stéphane Barthes : Hello, Thank you very much to everyone for helping and giving advice, my cluster is back online with HEALTH_OK, and it looks like no data was lost. I have not been able to convince the cluster to run on 1 mon, as all ceph/cephadm comman

[ceph-users] Re: squid 19.2.2 deployed with cephadmin - no grafana data on some dashboards ( RGW, MDS)

2025-07-23 Thread Eugen Block
Hi, that hasn't been an issue for me yet. Which prometheus version has been deployed? Do you see any errors in the prometheus and/or ceph-mgr log? I'd ignore grafana for now since it only displays what prometheus is supposed to collect. To get fresh logs, I would fail the mgr and probably

[ceph-users] Re: Cache removal cause vms to crash

2025-07-22 Thread Eugen Block
Bangalore, India On Wed, Jul 16, 2025 at 3:31 PM Eugen Block wrote: No, it's definitely not safe. If you remove the overlay without flushing the dirty objects, you will face data loss. Unfortunately, the cache tier hasn't been supported for a while and even when it was, it was discou

[ceph-users] Re: Newby woes with ceph

2025-07-22 Thread Eugen Block
Hi, I agree, trying to fix a broken test cluster is absolutely helpful. I recommend reading the docs [0], especially [1] and [2]. For [2] you'll have to adapt the commands to cephadm shell since it's still written for non-cephadm clusters. But there are threads on this list that cover tho

[ceph-users] Re: ceph-volume partial failure with multiple OSDs per device

2025-07-21 Thread Eugen Block
Good morning, what Ceph version is this? Apparently it's not cephadm managed? If it is, there's no need to fiddle with ceph-volume yourself, the orchestrator can handle that for you, either by using a suitable spec file or via command line. Every now and then users on this list discuss ab

[ceph-users] Re: Reef: cephadm tries to apply specs frequently

2025-07-18 Thread Eugen Block
For now I set the service to "unmanaged" to prevent further log flooding. But I would still like to know why the cache is not updated properly. Zitat von Eugen Block : Good morning, I noticed something strange on a 18.2.7 cluster, running on Ubuntu 22.04, deployed by cephadm.

[ceph-users] Reef: cephadm tries to apply specs frequently

2025-07-16 Thread Eugen Block
Good morning, I noticed something strange on a 18.2.7 cluster, running on Ubuntu 22.04, deployed by cephadm. There are 10 hosts in total, 5 of them are all-flash and those aren't affected. The other 5 hosts are hdd-only, and only 4 of those are affected: The /var/log/ceph/{FSID}/ceph-volu

[ceph-users] Re: Cache removal cause vms to crash

2025-07-16 Thread Eugen Block
Ltd Bangalore, India On Wed, Jul 16, 2025 at 2:23 PM Eugen Block wrote: Hi (got a bounce, resending), Zitat von Vishnu Bhaskar : > Hi Eugen, > > I wanted to provide an update regarding the volumes. I've confirmed that > none of my volumes are mapped to multiple machine

[ceph-users] Re: Cache removal cause vms to crash

2025-07-16 Thread Eugen Block
askar Acceleron Labs Pvt Ltd Bangalore, India On Wed, Jul 16, 2025 at 1:00 PM Eugen Block wrote: Just because you seem to be able to write from a client perspective doesn't mean that the data is actually written onto the OSD. For example, if you attach an RBD image to two VMs simultaneousl

[ceph-users] Re: Cache removal cause vms to crash

2025-07-16 Thread Eugen Block
Pvt Ltd Bangalore, India On Tue, Jul 15, 2025 at 3:34 PM Eugen Block wrote: Hi, we've been there a couple of months ago. Since my attempts of flushing the cache objects didn't complete, I experimented a bit. I gradually decreased the target bytes of the cache which then caused aut

[ceph-users] Re: Cache removal cause vms to crash

2025-07-15 Thread Eugen Block
Hi, we've been there a couple of months ago. Since my attempts of flushing the cache objects didn't complete, I experimented a bit. I gradually decreased the target bytes of the cache which then caused automatic flushing of most of the remaining data objects. But all the header objects wer

[ceph-users] Re: CephFS: no MDS does join the filesystem

2025-07-14 Thread Eugen Block
ady exists, skipping create. Use --force-init to overwrite the existing object. Zitat von Robert Sander : Hi, Am 7/14/25 um 15:06 schrieb Eugen Block: # cephfs-table-tool secondfs:0 show session That works. But what about cephfs-data-scan? # cephfs-data-scan init Specify a filesystem wit

[ceph-users] Re: CephFS: no MDS does join the filesystem

2025-07-14 Thread Eugen Block
Or even without --rank: # cephfs-table-tool secondfs:0 show session { "0": { "data": { "sessions": [] }, "result": 0 } } Zitat von Uwe Richter : Hi, https://docs.ceph.com/en/reef/cephfs/disaster-recovery-experts/#recovery-from-missing-metadata-obje

[ceph-users] Re: squid 19.2.2 - troubleshooting pgs in active+remapped+backfill - no pictures

2025-07-11 Thread Eugen Block
Hi, changing the scheduler requires an OSD restart, and it is done in a staggered manner by default. So the command you mentioned will do that for you. https://docs.clyso.com/blog/2023/03/22/ceph-how-do-disable-mclock-scheduler/ Zitat von Anthony D'Atri : I don’t *think* OSD restarts are
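A minimal sketch of that change (the OSD service name is an assumption, use your own spec name):
  ceph config set osd osd_op_queue wpq
  ceph orch restart osd.<service_name>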

[ceph-users] Re: Experience with high mds_max_caps_per_client

2025-07-11 Thread Eugen Block
Hi Kasper, that's exactly what we usually do if we have identified some misbehavior, trying to find the right setting to mitigate the issue. If you see cache pressure messages, it might be more helpful to rather decrease mds_recall_max_caps (default: 3) than to increase it (your setting
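Checking and adjusting those values could look like this (the number is illustrative, not a recommendation):
  ceph config get mds mds_recall_max_caps
  ceph config set mds mds_recall_max_caps 20000
  ceph config get mds mds_max_caps_per_client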

[ceph-users] RBD quota per namespace

2025-07-10 Thread Eugen Block
Hi, I asked this question [0] five years ago, and I haven't noticed anything wrt rbd quotas in the last releases. Does anyone have an update on this topic? I'd appreciate it! Thanks, Eugen [0] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ZCZ6MTFS645EQ73RZDZ7AFXJEFSA3OB3/#ZCZ6MTF

[ceph-users] Re: [EXTERN] Re: pg repair starts (endless loop ?)

2025-07-10 Thread Eugen Block
". So maybe this is somehow related. Thanks Dietmar On 7/10/25 09:12, Eugen Block wrote: Hi, every thread I found so far mentioned that this resolved itself after some time. Maybe you can confirm? Zitat von Dietmar Rieder : Hi, our ceph cluster reported an inconsistent pg, so we

[ceph-users] Re: CephFS: no MDS does join the filesystem

2025-07-10 Thread Eugen Block
Hi Robert, were you able to resolve this issue? I haven't faced that error myself yet, so I can't really comment. But it would be interesting to know if and how you got out of it. Thanks, Eugen Zitat von Robert Sander : Hi, Am 6/30/25 um 16:50 schrieb Robert Sander: With marking the MD

[ceph-users] Re: pg repair starts (endless loop ?)

2025-07-10 Thread Eugen Block
Hi, every thread I found so far mentioned that this resolved itself after some time. Maybe you can confirm? Zitat von Dietmar Rieder : Hi, our ceph cluster reported an inconsistent pg, so we set it to repair: # ceph pg repair 4.b10 # ceph health detail HEALTH_ERR 1 scrub errors; Possible

[ceph-users] Re: Managing RGW container logs filling up disk space

2025-07-10 Thread Eugen Block
Hi, personally, I like to have the daemon logs as files in /var/log/ceph/{FSID}/ and propose that to every customer as well. The docs [0] have some guidance on how to do that. Regards, Eugen [0] https://docs.ceph.com/en/latest/cephadm/operations/#logging-to-files Zitat von Sinan Polat : Hi
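Those docs essentially boil down to something like this (sketch; disabling journald logging is optional):
  ceph config set global log_to_file true
  ceph config set global mon_cluster_log_to_file true
  ceph config set global log_to_journald false
  ceph config set global mon_cluster_log_to_journald false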

[ceph-users] Re: ceph squid - huge difference between capacity reported by "ceph -s" and "ceph df "

2025-07-02 Thread Eugen Block
Hi, that is correct, no need to specify wal, they will be automatically colocated on the db devices. Zitat von Steven Vacaroaia : Hello I have redeployed the cluster I am planning to use the below spec file --dry-run shows that DB partitions will be created BUT not WAL ones My understa
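A minimal spec along those lines, assuming rotational HDDs for data and flash for DB (the WAL then lands on the DB devices automatically):
  service_type: osd
  service_id: hdd_with_db
  placement:
    host_pattern: '*'
  spec:
    data_devices:
      rotational: 1
    db_devices:
      rotational: 0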

[ceph-users] Re: CephFS with Ldap

2025-07-01 Thread Eugen Block
Hi, one of our use cases for CephFS is home directories for our LDAP users. The user's VMs use kernel mount with a autofs user which has the CephFS auth caps. So we don't have each user as a client but one main CephFS client. Maybe that helps as a workaround? Regards, Eugen Zitat von Bur
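A sketch of that setup with one shared client (client name, paths and the secret file are assumptions):
  ceph fs authorize cephfs client.homes /home rw
  mount -t ceph mon1,mon2,mon3:/home /home -o name=homes,secretfile=/etc/ceph/homes.secret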

[ceph-users] Re: orchestrator behaving strangely

2025-06-28 Thread Eugen Block
Can you show the overall cluster status (ceph -s)? If there's something else going on, it might block (some?) operations. And I'd scan the mgr logs, maybe in debug mode to see why it fails to operate properly. Zitat von Holger Naundorf : On 27.06.25 14:16, Eugen Block wrote:

[ceph-users] Re: orchestrator behaving strangely

2025-06-27 Thread Eugen Block
effect now as well - or should I reissue the OSD rm command as well? Is there something in the queue (ceph orch osd rm status)? Sometimes the queue clears after a mgr restart, so it might be necessary to restart the rm command as well. Regards, Holger On 27.06.25 12:26, Eugen Block

[ceph-users] Re: orchestrator not behaving strangely

2025-06-27 Thread Eugen Block
Hi, have you retried it after restarting/failing the mgr? ceph mgr fail Quite often this (still) helps. Zitat von Holger Naundorf : Hello, we are running a ceph cluster at version: ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid (stable) and since a few weeks the orche

[ceph-users] Re: Reef: RGW logrotate "breaks" log_to_file

2025-06-26 Thread Eugen Block
aid 19.2.3 but meant the next reef unfortunately, a mistake was made in backporting some changes related to thread names and the radosgw process gets renamed to "notif-worker0" as a result. so commands like pkill expect that string instead of radosgw On Wed, Jun 25, 2025

[ceph-users] Re: Question about object maps and index rebuilding.

2025-06-26 Thread Eugen Block
Hi, we work with Openstack and Ceph as well, and we also support customers with such deployments, but in 10 years I haven't had to rebuild any object maps yet, ever. So I'm wondering what exactly you're seeing when you do have to rebuild them. One of our customers has a middle sized cloud (

[ceph-users] Re: Incomplete PG's

2025-06-25 Thread Eugen Block
Hi, in a previous thread you wrote that you had multiple simultaneous disk failures, and you replaced all of the drives. I assume that the failures happened across different hosts? And the remaining hosts and OSDs were not able to recover? I'm just trying to get a better idea of what exac

[ceph-users] Re: Reef: RGW logrotate "breaks" log_to_file

2025-06-25 Thread Eugen Block
Yes, that worked as expected. I can't see any negative impact yet. Zitat von Eugen Block : Thanks a lot, Casey. I'm still not sure why I couldn't find that myself, but thanks anyway. I have added notif-worker0 to the logrotate file in both a test cluster and one production c

[ceph-users] Re: ceph health mute behavior

2025-06-25 Thread Eugen Block
d, 25 Jun 2025 at 11:58, Eugen Block wrote: Thanks Frédéric. The customer found the sticky flag, too. I must admit, I haven't used the mute command too often yet, usually I try to get to the bottom of a warning and rather fix the underlying issue. :-D So the mute clears if the number

[ceph-users] Re: Reef: RGW logrotate "breaks" log_to_file

2025-06-25 Thread Eugen Block
ing some changes related > to thread names and the radosgw process gets renamed to > "notif-worker0" as a result. so commands like pkill expect that string > instead of radosgw > > On Wed, Jun 25, 2025 at 7:00 AM Eugen Block wrote: > > > > Interesting, it seems like

[ceph-users] Reef: RGW logrotate "breaks" log_to_file

2025-06-25 Thread Eugen Block
Hi, after upgrading multiple clusters from 18.2.4 some weeks ago, I noticed that the RGWs stop logging to file after the nightly logrotate. Other daemons don't seem to be affected, they continue logging to file. Restarting an RGW daemon helps until the next logrotate. I could reproduce

[ceph-users] Re: Reef: RGW logrotate "breaks" log_to_file

2025-06-25 Thread Eugen Block
ntil the process is restarted. Is there some workaround possible until we upgrade? Zitat von Eugen Block : Hi, after upgrading multiple clusters from 18.2.4. some weeks ago, I noticed that the RGWs stop logging to file after the nightly logrotate. Other daemons don't seem to be a

[ceph-users] Re: ceph health mute behavior

2025-06-25 Thread Eugen Block
er of affected PGs increased (which was decided to be a good reason to alert the admin). Have you tried to use the --sticky argument on the 'ceph health mute' command? Cheers, Frédéric. - Le 25 Juin 25, à 9:21, Eugen Block ebl...@nde.ag a écrit : Hi, I'm trying to und

[ceph-users] Re: Reef: RGW logrotate "breaks" log_to_file

2025-06-25 Thread Eugen Block
Hohenzollernstr. 27, 80801 Munich Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306 Eugen Block schrieb am Mi., 25. Juni 2025, 10:05: Hi, after upgrading multiple clusters from 18.2.4. some weeks ago, I noticed that the RGWs stop logging to file after the nightly logrotate. Other

[ceph-users] ceph health mute behavior

2025-06-25 Thread Eugen Block
Hi, I'm trying to understand the "ceph health mute" behavior. In this case, I'm referring to the warning PG_NOT_DEEP_SCRUBBED. If you mute it for a week and the cluster continues deep-scrubbing, the "mute" will clear at some point although there are still PGs not deep-scrubbed in time war
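For reference, a mute that survives changes in the number of affected PGs would look roughly like this:
  ceph health mute PG_NOT_DEEP_SCRUBBED 1w --sticky
  ceph health unmute PG_NOT_DEEP_SCRUBBED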

[ceph-users] Re: Debugging OSD cache thrashing

2025-06-22 Thread Eugen Block
The default OSD memory cache size is 4 GB, it’s not recommended to reduce it to such low values, especially if there’s real load on the cluster. I am not a developer, so I can’t really comment on the code. Zitat von Hector Martin : Hi all, I have a small 3-node cluster (4 HDD + 1 SSD OSD p

[ceph-users] Re: Debugging OSD cache thrashing

2025-06-22 Thread Eugen Block
Maybe you should ask this additionally on the devs mailing list. Zitat von Hector Martin : On 2025/06/23 0:21, Anthony D'Atri wrote: DIMMs are cheap. No DIMMs on Apple Macs. You’re running virtualized in VMs or containers, with OSDs, mons, mgr, and the constellation of other daemons

[ceph-users] Re: CEPH upgrade from 18.2.7 to 19.2.2 -- Hung from last 24h at 66%

2025-06-22 Thread Eugen Block
un 22, 2025, at 9:22 AM, Eugen Block wrote: The command 'ceph osd find <id>' is not the right one to query an OSD for the cluster network, it just shows the public address of an OSD (like a client would need to). Just use 'ceph osd dump' and look at the OSD output. Zi

[ceph-users] Re: CEPH upgrade from 18.2.7 to 19.2.2 -- Hung from last 24h at 66%

2025-06-22 Thread Eugen Block
The command 'ceph osd find <id>' is not the right one to query an OSD for the cluster network, it just shows the public address of an OSD (like a client would need to). Just use 'ceph osd dump' and look at the OSD output. Zitat von Devender Singh : Hello I checked on all my clusters everywh

[ceph-users] Re: OSD network Issue -- Not using cluster Network

2025-06-22 Thread Eugen Block
What's the output of ceph config dump | grep cluster_network and ceph config get osd cluster_network Is it only some OSDs or all not using cluster_network? It's not entirely clear from your question. OSDs automatically use the public_network as a fallback, so if all of them use the publi
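In addition to the config checks above, the addresses an OSD actually uses can be verified like this (the OSD id is illustrative):
  ceph osd dump | grep "^osd"
  ceph osd metadata 0 | grep -E 'back_addr|front_addr'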

[ceph-users] Re: Ceph 14.2.22 MONs keep crashing in MDSMonitor::maybe_resize_cluster (out of range)

2025-06-20 Thread Eugen Block
Cool, that's fantastic news! And a great analysis, too! I'm glad you got it back up and client operations could resume. Happy to help! Zitat von Miles Goodhew : On Thu, 19 Jun 2025, at 18:39, Eugen Block wrote: Zitat von Miles Goodhew : > On Thu, 19 Jun 2025, at 17:48, Euge

[ceph-users] Re: RadosGW: Even more large omap objects after resharding

2025-06-19 Thread Eugen Block
that pool. Now I have 167 omap objects that are not quite as big, but still too large. Sincerely Niklaus Hofer On 19/06/2025 14.48, Eugen Block wrote: Hi, the warnings about large omap objects are reported when deep-scrubs happen. So if you resharded the bucket (or Ceph did that for you), you

[ceph-users] Re: RadosGW: Even more large omap objects after resharding

2025-06-19 Thread Eugen Block
us Hofer On 19/06/2025 14.48, Eugen Block wrote: Hi, the warnings about large omap objects are reported when deep-scrubs happen. So if you resharded the bucket (or Ceph did that for you), you'll either have to wait for the deep-scrub schedule to scrub the affected PGs, or you issue a

[ceph-users] Re: Autoscale warnings depite autoscaler being off

2025-06-19 Thread Eugen Block
Default question: have you tried to fail the mgr? ;-) ceph mgr fail Zitat von Niklaus Hofer : Dear all After upgrading to Pacific, we are now getting health warnings from the auto scaler: 10 pools have too few placement groups 8 pools have too many placement groups

[ceph-users] Re: RadosGW: Even more large omap objects after resharding

2025-06-19 Thread Eugen Block
Hi, the warnings about large omap objects are reported when deep-scrubs happen. So if you resharded the bucket (or Ceph did that for you), you'll either have to wait for the deep-scrub schedule to scrub the affected PGs, or you issue a manual deep-scrub on that PG or the entire pool. Re
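Triggering that manually looks roughly like this (PG id and pool name are placeholders):
  ceph pg deep-scrub <pgid>
  ceph osd pool deep-scrub <poolname>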

[ceph-users] Re: Ceph 14.2.22 MONs keep crashing in MDSMonitor::maybe_resize_cluster (out of range)

2025-06-19 Thread Eugen Block
Zitat von Miles Goodhew : On Thu, 19 Jun 2025, at 17:48, Eugen Block wrote: Too bad. :-/ Could you increase the debug log level to 20? Maybe it gets a bit clearer where exactly it fails. I guess that's in `ceph.conf` with: [mon] debug_mon = 20 ? Correct. Good thinking: I

[ceph-users] Re: Ceph 14.2.22 MONs keep crashing in MDSMonitor::maybe_resize_cluster (out of range)

2025-06-19 Thread Eugen Block
Too bad. :-/ Could you increase the debug log level to 20? Maybe it gets a bit clearer where exactly it fails. Just to understand the current situation, you did reduce the monmap to 1 (mon3), then you tried the same with mon2. Because when you write: I'm guessing that mon2 is only "running" b

[ceph-users] Re: Ceph 14.2.22 MONs keep crashing in MDSMonitor::maybe_resize_cluster (out of range)

2025-06-18 Thread Eugen Block
mon store. But let's see how far you get before exploring this option. [1] https://heiterbiswolkig.blogs.nde.ag/2023/08/14/how-to-migrate-from-suse-enterprise-storage-to-upstream-ceph/ Zitat von Miles Goodhew : On Wed, 18 Jun 2025, at 18:09, Eugen Block wrote: That does look strange

[ceph-users] Re: Ceph 14.2.22 MONs keep crashing in MDSMonitor::maybe_resize_cluster (out of range)

2025-06-18 Thread Eugen Block
https://github.com/ceph/ceph/blob/v14.2.22/src/mon/MDSMonitor.cc#L1801 Zitat von Eugen Block : That does look strange indeed, either an upgrade went wrong or someone already fiddled with the monmap, I'd say. But anyway, I wouldn't try to deploy a 4th mon since it would want to sync the

[ceph-users] Re: Ceph 14.2.22 MONs keep crashing in MDSMonitor::maybe_resize_cluster (out of range)

2025-06-18 Thread Eugen Block
'm just in a bit of decision paralysis about which mon to take as the survivor. All can run _individually_, but only mon2 will survive a group start. mon3 was the last one working, but it has the mysterious "failed to assign global ID" errors. I'm leaning toward using mon

[ceph-users] Re: Ceph 14.2.22 MONs keep crashing in MDSMonitor::maybe_resize_cluster (out of range)

2025-06-18 Thread Eugen Block
Hi, correct, SUSE's Ceph product was Salt-based, in this case 14.2.22 was shipped with SES 6. ;-) Do you also have some mon logs from right before the crash, maybe with a higher debug level? It could make sense to stop client traffic and OSDs as well to be able to recover. But unfortunate

[ceph-users] Re: How to clear the "slow operations" warning?

2025-06-17 Thread Eugen Block
Besides Michel's response regarding the default of 24 hours after which the warning usually disappears, I wanted to mention that we also saw this warning during some network issues we had. So if the disks seem okay, I'd recommend checking the network components. Zitat von Jan Kasprzak :

[ceph-users] Re: Use local reads with rados_replica_read_policy

2025-06-17 Thread Eugen Block
80801 Munich Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306 Eugen Block schrieb am Mo., 16. Juni 2025, 16:09: I just noticed that the options crush_location and read_from_replica from the rbd man page apparently only apply to rbd mapping options. That doesn't really he

[ceph-users] Re: Use local reads with rados_replica_read_policy

2025-06-16 Thread Eugen Block
duction of rados_replica_read_policy will make those localized reads available in general. Zitat von Eugen Block : Hi Frédéric, thanks a lot for looking into that, I appreciate it. Until a year ago or so we used custom location hooks for a few OSDs, but not for clients (yet). I hav

[ceph-users] Re: Use local reads with rados_replica_read_policy

2025-06-16 Thread Eugen Block
ack1|rack:myrack2|datacenter:mydc If you happen to test rados_replica_read_policy = localize, let us know how it works. ;-) Cheers, Frédéric. [1] https://github.com/ceph/ceph/blob/main/doc/man/8/rbd.rst - Le 13 Juin 25, à 10:56, Eugen Block ebl...@nde.ag a écrit : And a follow-up quest

[ceph-users] Re: Use local reads with rados_replica_read_policy

2025-06-13 Thread Eugen Block
? I'd appreciate any insights. Zitat von Eugen Block : Hi *, I have a question regarding the upcoming feature to optimize read performance [0] by reading from the nearest OSD, especially in a stretch cluster across two sites (or more). Anthony pointed me to [1], looks like a new c

[ceph-users] Use local reads with rados_replica_read_policy

2025-06-13 Thread Eugen Block
Hi *, I have a question regarding the upcoming feature to optimize read performance [0] by reading from the nearest OSD, especially in a stretch cluster across two sites (or more). Anthony pointed me to [1], looks like a new config option will be introduced in Tentacle: rados_replica_read
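If the option lands as described, a client-side configuration might look something like this; the option name and values are taken from this thread and not verified against any release:
  [client]
  # assumed syntax, only expected in Tentacle or later
  rados_replica_read_policy = localize
  crush_location = rack:myrack1|datacenter:mydc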

[ceph-users] Re: pool create via cli ignores size parameter

2025-06-11 Thread Eugen Block
I created: https://tracker.ceph.com/issues/71635 Zitat von Eugen Block : I think this is a bug. Looking at the mon log when creating such a pool, it appears that it's parsing the crush_rule as an erasure-code profile and then selects the default rule 0 (default replicated

[ceph-users] Re: pool create via cli ignores size parameter

2025-06-11 Thread Eugen Block
ze": 4}]': finished If I use "default" for the ec profile, it (incorrectly) assumes it's a rule name: soc9-ceph:~ # ceph osd pool create temp8 4 replicated default 1 0 4 Error ENOENT: specified rule default doesn't exist Although the mon command is parsed as

[ceph-users] Re: rgw + LDAP

2025-06-10 Thread Eugen Block
Hi, I didn't read the entire thread in detail, but to get some file mapped into the containers you can utilize extra-entrypoint-args [0]. [0] https://docs.ceph.com/en/reef/cephadm/services/#extra-entrypoint-arguments Zitat von Albert Shih : Le 10/06/2025 à 16:46:28+0200, Albert Shih a écri

[ceph-users] Re: Ceph Orchestrator error

2025-06-10 Thread Eugen Block
the source code. I assume that the "workaround" for Squid is to manually deploy the certificates, right? Cheers Iztok On 10/06/25 12:31, Eugen Block wrote: I assume it's a mistake in the docs. Comparing the branches for 20.0.0 [0] and 19.2.2 [1] reveals that the generate_cert

[ceph-users] Re: mds daemon damaged

2025-06-10 Thread Eugen Block
Hi, did you only run the recover_dentries command or did you follow the entire procedure from your first message? If the cluster reports a healthy status, I assume that all is good. Zitat von b...@nocloud.ch: I think I was lucky... ```sh [root@ceph1 ~]# cephfs-journal-tool --rank=cephfs:0

[ceph-users] Re: Confuse by rgw and certificate

2025-06-10 Thread Eugen Block
Setting the config-key manually is in addition to using rgw_frontend_ssl_certificate, it's not either or. But good that it works for you that way as well. Zitat von Albert Shih : Le 06/06/2025 à 18:14:52+0000, Eugen Block a écrit Hi, I don't have a good explanation for y

[ceph-users] Re: Ceph Orchestrator error

2025-06-10 Thread Eugen Block
I assume it's a mistake in the docs. Comparing the branches for 20.0.0 [0] and 19.2.2 [1] reveals that the generate_cert parameter is not present in Squid but will be in Tentacle. [0] https://github.com/ceph/ceph/blob/v20.0.0/src/python-common/ceph/deployment/service_spec.py#L1235 [1] htt

[ceph-users] Re: Confuse by rgw and certificate

2025-06-06 Thread Eugen Block
Hi, I don't have a good explanation for you, but it should be a workaround. I've been looking into all kinds of variations with concatenated certs etc., but what works for me is to set the mentioned config-key. You can find an example in the (old-ish) SUSE docs [0]. ceph config-key set rg

[ceph-users] Re: pool create via cli ignores size parameter

2025-06-06 Thread Eugen Block
Forgot to add that it's version 19.2.2 (also tried it on 19.2.0). Zitat von Eugen Block : Hi, without having checked the tracker, does anyone have an explanation why the size parameter is not applied when creating a pool via CLI? According to the help output for 'ceph osd pool

[ceph-users] pool create via cli ignores size parameter

2025-06-05 Thread Eugen Block
Hi, without having checked the tracker, does anyone have an explanation why the size parameter is not applied when creating a pool via CLI? According to the help output for 'ceph osd pool create -h' you can specify expected_num_objects (btw. I don't understand what impact that has, all I

[ceph-users] Re: MDS Repeatedly Crashing/Restarting - Unable to get CephFS Active

2025-06-05 Thread Eugen Block
cephfs-table-tool --cluster --rank=:all reset session And then finally bring the FS back up. And lastly, the conclusion regarding my understanding of the WA on 61009 is important in order to avoid this issue in the future. From: Eugen Block Sent:

[ceph-users] Re: Force remove undeletable image on RBD pool?

2025-06-05 Thread Eugen Block
Is that image in the trash? `rbd -p pool trash ls` Zitat von Gaël THEROND : Hi folks, I've a quick question. On one of our pool we found out an image that doesn't exist anymore physically (This image doesn't exist, have no snap attached, is not parent of another image) but is still listed whe
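If it does show up there, removing it could look like this (pool and image id are placeholders):
  rbd trash ls <pool>
  rbd trash rm <pool>/<image-id>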

[ceph-users] Re: Mon Election Leader goes unresponsive with 100% cpu usage on fn_monstore

2025-06-01 Thread Eugen Block
/issues/71501#note-4 Respectfully, *Wes Dillingham* LinkedIn <http://www.linkedin.com/in/wesleydillingham> w...@wesdillingham.com On Fri, May 30, 2025 at 12:34 PM Eugen Block wrote: Okay, and a hardware issue can be ruled out, I assume? To get the cluster up again I would also consider starting on

[ceph-users] Re: Mon Election Leader goes unresponsive with 100% cpu usage on fn_monstore

2025-05-30 Thread Eugen Block
'm not sure how to do that right now. Presumably those syncing tunables you tweaked only come into play if/when a mon reaches synchronizing? Respectfully, *Wes Dillingham* LinkedIn <http://www.linkedin.com/in/wesleydillingham> w...@wesdillingham.com On Fri, May 30, 2025 at 11:15 

[ceph-users] Re: Mon Election Leader goes unresponsive with 100% cpu usage on fn_monstore

2025-05-30 Thread Eugen Block
Hi Wes, although I don't have seen this exact issue, we did investigate a mon sync issue two years ago. The customer also has 5 MONs and two of them get out of quorum regularly in addition to the long sync times. For the syncing issue we found some workarounds (paxos settings), but we nev

[ceph-users] Re: Adding OSD with separate DB via "ceph orch daemon add osd"

2025-05-28 Thread Eugen Block
Just a note on db_slots, I don’t think it has ever worked properly, and last time I checked it still wasn’t implemented (https://www.spinics.net/lists/ceph-users/msg83189.html). This option should probably be entirely removed from the docs, unless it’s coming soon. Zitat von Anthony D'Atri

[ceph-users] Re: 18.2.7: modification of osd_deep_scrub_interval in config ignored by mon

2025-05-26 Thread Eugen Block
It’s reported by the mgr, so you’ll have to apply the configuration change either globally or to both mgr and osd. You can also run ‚ceph config help {CONFIG}‘ to see which services are related to that configuration value. Zitat von Michel Jouvin : The page I checked, https://docs.ceph.com/e
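For example (the interval value here is purely illustrative):
  ceph config help osd_deep_scrub_interval
  ceph config set global osd_deep_scrub_interval 1209600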

[ceph-users] Re: 18.2.7: modification of osd_deep_scrub_interval in config ignored by mon

2025-05-26 Thread Eugen Block
e with an incredibly high number of late deep scrubs that can be worrying... Michel Le 26/05/2025 à 09:56, Eugen Block a écrit : It’s reported by the mgr, so you’ll either have to pass global or mgr and osd to the configuration change. You can also check ‚ceph config help {CONFIG}‘ to check whic

[ceph-users] Re: MDS Repeatedly Crashing/Restarting - Unable to get CephFS Active

2025-05-20 Thread Eugen Block
ntion of the devs? BR. Kasper ____ From: Eugen Block Sent: Tuesday, May 20, 2025 15:51 To: Kasper Rasmussen Cc: Alexander Patrakov ; ceph-users@ceph.io Subject: Re: [ceph-users] Re: MDS Repeatedly Crashing/Restarting - Unable to get CephFS Active In that case I would back up both journals, ju

[ceph-users] Re: MDS Repeatedly Crashing/Restarting - Unable to get CephFS Active

2025-05-20 Thread Eugen Block
rank=:all --journal=mdlog journal inspect cephfs-journal-tool --rank=:all --journal=purge_queue journal inspect return: Overall journal integrity: OK From: Kasper Rasmussen Sent: Tuesday, May 20, 2025 09:48 To: Eugen Block ; Alexander Patrakov Cc: ceph-users

[ceph-users] Re: MDS Repeatedly Crashing/Restarting - Unable to get CephFS Active

2025-05-19 Thread Eugen Block
w to use such a backup if disaster recovery fails. Do you know the procedure? On Tue, May 20, 2025 at 1:23 AM Eugen Block wrote: Hi, not sure if it was related to journal replay, but have you checked for memory issues? What's the mds memory target? Any traces of an oom killer? Next I wou

[ceph-users] Re: Help in upgrading CEPH

2025-05-19 Thread Eugen Block
Just a quick update: I set auth_allow_insecure_global_id_reclaim to false because all the client sessions we had showed either new_ok or reclaim_ok in global_id_status. No complaints so far. :-) Zitat von Eugen Block : The mon sessions dump also shows the global_id_status, this could help

[ceph-users] Re: MDS Repeatedly Crashing/Restarting - Unable to get CephFS Active

2025-05-19 Thread Eugen Block
Hi, not sure if it was related to journal replay, but have you checked for memory issues? What's the mds memory target? Any traces of an oom killer? Next I would do is inspect the journals for both purge_queue and md_log: cephfs-journal-tool journal inspect --rank= --journal=md_log cephfs-
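A sketch of those inspections plus a journal backup beforehand (filesystem name and backup path are placeholders):
  cephfs-journal-tool --rank=<fs_name>:0 --journal=mdlog journal inspect
  cephfs-journal-tool --rank=<fs_name>:0 --journal=purge_queue journal inspect
  cephfs-journal-tool --rank=<fs_name>:0 journal export /root/mdlog-backup.bin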

[ceph-users] Re: Help in upgrading CEPH

2025-05-19 Thread Eugen Block
ts for session properties without ellipsing. For this purpose, in my search I found the command "ceph daemon mon-name sessions" where I saw the "luminous" word that in my mind was "wrong" from my top post of this thread. Il 16/05/2025 14:56, Eugen Block ha

[ceph-users] Re: 19.2.1: filestore created when bluestore requested

2025-05-16 Thread Eugen Block
Hi, which Ceph version is this? It's apparently not managed by cephadm. Zitat von "Konold, Martin" : Hi, I am working on a small 3 node ceph cluster which used to work as expected. When creating a new ceph osd the ceph-volume command throws some errors and filestore instead of bluestore is
