[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Jonas Schwab
Yes, mgrs are running as intended. It just seems that the mons and OSDs don't recognize each other, because the monitor map is outdated. On 2025-04-11 07:07, Eugen Block wrote: Is at least one mgr running? PG states are reported by the mgr daemon. Zitat von Jonas Schwab : I solved the problem wit

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Eugen Block
Is at least one mgr running? PG states are reported by the mgr daemon. Zitat von Jonas Schwab : I solved the problem by executing ceph-mon. Among other things, -i mon.rgw2-06 was not the correct option, but rather -i rgw2-06. Unfortunately, that brought the next problem: The cluster now shows "100

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Jonas Schwab
I solved the problem by executing ceph-mon. Among other things, -i mon.rgw2-06 was not the correct option, but rather -i rgw2-06. Unfortunately, that brought the next problem: The cluster now shows "100.000% pgs unknown", which is probably because the monitor data is not completely up to date, but rath
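
For readers hitting the same issue, the working invocation (as described above, with the public address taken from the earlier message in this thread) looks roughly like this; adjust the ID and address to your own monitor:

$ ceph-mon -i rgw2-06 --public-addr 10.127.239.63 -f   # -i takes the bare mon ID, not the "mon."-prefixed name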

[ceph-users] Re: OSDs ignore memory limit

2025-04-10 Thread Mark Nelson
Hi Jonas, Anthony gave some good advice for some things to check.  You can also dump the mempool statistics for OSDs that you identify are over their memory target using: "ceph daemon osd.NNN dump_mempools" The osd_memory_target code basically looks at the memory usage of the process and the
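
As a hedged illustration of the checks mentioned above (osd.NNN is a placeholder; the daemon command has to run on the host, or inside the container, where that OSD's admin socket lives):

$ ceph daemon osd.NNN dump_mempools           # per-pool allocation counts and bytes for that OSD
$ ceph config get osd.NNN osd_memory_target   # the limit the cache autotuner tries to stay under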

[ceph-users] Re: Repo name bug?

2025-04-10 Thread kefu chai
On Fri, Apr 11, 2025 at 10:39 AM Alex wrote: > I created a pull request, not sure what the etiquette is on whether I can > merge it myself. First-timer here. > Hi Alex, I cannot find your pull request in https://github.com/ceph/cephadm-ansible/ . Did you create it in this project?

[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log

2025-04-10 Thread Anthony D'Atri
Link please. > On Apr 10, 2025, at 10:59 PM, Alex wrote: > > I made a Pull Request regarding cephadm.log being set to DEBUG. > Not sure if I should merge it.

[ceph-users] Re: nodes with high density of OSDs

2025-04-10 Thread Anthony D'Atri
Filestore IIRC used partitions, with cute hex GPT types for various states and roles. Udev activation was sometimes problematic, and LVM tags are more flexible and reliable than the prior approach. There no doubt is more to it but that’s what I recall. > On Apr 10, 2025, at 9:11 PM, Tim Hol

[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log

2025-04-10 Thread Alex
I made a Pull Request regarding cephadm.log being set to DEBUG. Not sure if I should merge it.

[ceph-users] Re: Repo name bug?

2025-04-10 Thread Alex
I created a pull request, not sure what the etiquette is on whether I can merge it myself. First-timer here.

[ceph-users] Re: nodes with high density of OSDs

2025-04-10 Thread Tim Holloway
Peter, I don't think udev factors in based on the original question. Firstly, because I'm not sure udev deals with permanently-attached devices (it's more for hot-swap items). Secondly, because the original complaint mentioned LVM specifically. I agree that the hosts seem overloaded, by the

[ceph-users] Re: v19.2.2 Squid released

2025-04-10 Thread Harry G Coin
19.2.2 Installed!

# ceph -s
  cluster:
    id:     ,,,
    health: HEALTH_ERR
            27 osds(s) are not reachable
...
    osd: 27 osds: 27 up (since 32m), 27 in (since 5w)
...

It's such a 'bad look', something so visible, in such an often-given command.

10/4/25 06:00 PM [ERR] osd.27's public a

[ceph-users] Ceph squid fresh install

2025-04-10 Thread quag...@bol.com.br
Hi, I just did a new Ceph installation and would like to enable the "read balancer". However, the documentation requires that the minimum client version be reef. I checked this information through "ceph features" and came across the situation of having 2 luminous clients. # ceph featur
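
A rough sketch of the checks involved; raising the requirement is only safe once the two luminous entries have been confirmed to be stale sessions or upgradable clients, so treat the second command as an assumption about that outcome:

$ ceph features                                 # lists connected clients grouped by their feature release
$ ceph osd set-require-min-compat-client reef   # refuses if clients older than reef are still connected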

[ceph-users] Re: NIH Datasets

2025-04-10 Thread Tim Holloway
Sounds like a discussion for a discord server. Or BlueSky or something that's very definitely NOT what used to be known as twitter. My viewpoint is a little different. I really didn't consider HIPAA stuff, although since technically that is info that shouldn't be accessible to anyone but autho

[ceph-users] Re: Ceph squid fresh install

2025-04-10 Thread quag...@bol.com.br

[ceph-users] Re: NIH Datasets

2025-04-10 Thread Laimis Juzeliūnas
Super cool idea - I too wanted to refer to blockchain methods to avoid data being tampered with. Ceph would need a completely different distribution coded for such storage; however, we could say that the fundamentals are already in place? Best, Laimis J. > On 7 Apr 2025, at 18:23, Tim Holloway wrot

[ceph-users] Diskprediction_local mgr module removal - Call for feedback

2025-04-10 Thread Yaarit Hatuka
Hi everyone, On today's Ceph Steering Committee call we discussed the idea of removing the diskprediction_local mgr module, as the current prediction model is obsolete and not maintained. We would like to gather feedback from the community about the usage of this module, and find out if anyone is

[ceph-users] Re: NIH Datasets

2025-04-10 Thread Tim Holloway
Hi Alex, "Cost concerns" is the fig leaf that is being used in many cases, but often a closer look indicates political motivations. The current US administration is actively engaged in the destruction of anything that would conflict with their view of the world. That includes health practic

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Jonas Schwab
I edited the monmap to include only rgw2-06 and then followed https://docs.ceph.com/en/squid/rados/operations/add-or-rm-mons/#adding-a-monitor-manual to create a new monitor. Unfortunately, `ceph-mon -i mon.rgw2-06 --public-addr 10.127.239.63 -f` crashed with the traceback seen in the attachment.

[ceph-users] v18.2.5 Reef released

2025-04-10 Thread Yuri Weinstein
We're happy to announce the 5th point release in the Reef series. We recommend that users update to this release. For detailed release notes with links & changelog please refer to the official blog entry at https://ceph.io/en/news/blog/2025/v18-2-5-reef-released/ Notable Changes --- *

[ceph-users] Re: nodes with high density of OSDs

2025-04-10 Thread Peter Grandi
> I have 4 nodes with 112 OSDs each [...] As an aside, I reckon that is not such a good idea, as Ceph was designed for one-small-OSD per small-server and lots of them, but lots of people of course know better. > Maybe you can gimme a hint how to struggle it over? That is not so much a Ceph questi

[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log

2025-04-10 Thread Eugen Block
That was my assumption, yes. Zitat von Alex : Is this bit of code responsible for hardcoding DEBUG to cephadm.log? 'loggers': { '': { 'level': 'DEBUG', 'handlers': ['console', 'log_file'], } } in /var/lib/ceph//cephadm.* ? ___

[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log

2025-04-10 Thread Alex
I think it's the same block of code Eugen found.

[ceph-users] v19.2.2 Squid released

2025-04-10 Thread Yuri Weinstein
We're happy to announce the 2nd backport release in the Squid series. https://ceph.io/en/news/blog/2025/v19-2-2-squid-released/ Notable Changes --- - This hotfix release resolves an RGW data loss bug when CopyObject is used to copy an object onto itself. S3 clients typically do this

[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log

2025-04-10 Thread Laimis Juzeliūnas
Hey all, Just confirming that the same debug level has been in Reef and Squid. We got so used to it that we just decided not to care anymore. Best, Laimis J. > On 8 Apr 2025, at 14:21, Alex wrote: > > Interesting. So it's like that for everybody? > Meaning cephadm.log logs debug messages. >

[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log

2025-04-10 Thread Alex
Is this bit of code responsible for hardcoding DEBUG to cephadm.log? 'loggers': { '': { 'level': 'DEBUG', 'handlers': ['console', 'log_file'], } } in /var/lib/ceph//cephadm.* ?

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Eugen Block
It depends a bit. Which mon do the OSDs still know about? You can check /var/lib/ceph//osd.X/config to retrieve that piece of information. I'd try to revive one of them. Do you still have the mon store.db for all of the mons or at least one of them? Just to be safe, back up all the store.db d
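
A minimal sketch of that check, assuming a cephadm layout where <fsid> is your cluster fsid and X one of the OSD IDs on that host:

$ grep mon_host /var/lib/ceph/<fsid>/osd.X/config   # shows which monitor address(es) the OSD was last configured with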

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Jonas Schwab
Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time:

$ monmaptool --print monmap1
monmaptool: monmap file monmap1
epoch 29
fsid 6d0d4ed4-0052-4eb9-9

[ceph-users] Re: Diskprediction_local mgr module removal - Call for feedback

2025-04-10 Thread Anthony D'Atri
>> anthonydatri@Mac models % pwd
>> /Users/anthonydatri/git/ceph/src/pybind/mgr/diskprediction_local/models
>> anthonydatri@Mac models % file redhat/*
>> redhat/config.json: JSON data
>> redhat/hgst_predictor.pkl: data
>> redhat/hgst_scaler.pkl: data
>> redhat/seagate_predictor

[ceph-users] Re: Diskprediction_local mgr module removal - Call for feedback

2025-04-10 Thread Lukasz Borek
+1 I wasn't aware that this module is obsolete and was trying to start it a few weeks ago. We developed a home-made solution some time ago to monitor SMART data from both HDD (uncorrected errors, grown defect list) and SSD (WLC/TBW). But keeping it up to date with non-unified disk models is a nigh

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Eugen Block
It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way
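
A hedged sketch of the monmap surgery being referred to; the file name, mon name, and address below are placeholders, and everything should be backed up before touching it:

$ monmaptool --print monmap                      # inspect the epoch and current members
$ monmaptool --add ceph2-01 <ip>:6789 monmap     # re-add the revived mon under its current address
$ ceph-mon -i ceph2-01 --inject-monmap monmap    # push the edited map into that mon's store before starting it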

[ceph-users] Re: Ceph squid fresh install

2025-04-10 Thread quag...@bol.com.br
More complete description:
1-) I formatted and installed the operating system
2-) This is "ceph installed":
curl --silent --remote-name --location https://download.ceph.com/rpm-19.2.1/el9/noarch/cephadm
chmod +x cephadm
./cephadm add-repo --release squid
./cephadm install cephadm -v bootstrap

[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log

2025-04-10 Thread Alex
I did have to add "su root root" to the logrotate config to fix the permissions issue. There's a RH KB article and Ceph GitHub pull requests to fix it.
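
For illustration, the kind of logrotate stanza meant here; this is an approximation of /etc/logrotate.d/cephadm with the added directive, not necessarily the exact shipped file:

/var/log/ceph/cephadm.log {
    rotate 7
    daily
    compress
    missingok
    notifempty
    su root root    # added so logrotate runs with the ownership the log file actually has
}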

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Jonas Schwab
Thank you very much! I now started the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Doe

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Jonas Schwab
I realized I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a

[ceph-users] Re: nodes with high density of OSDs

2025-04-10 Thread Tim Holloway
That's quite a large number of storage units per machine. My suspicion is that since you apparently have an unusually high number of LVs coming online at boot, the time it takes to linearly activate them is long enough to overlap with the point in time when Ceph starts bringing up its storage-

[ceph-users] Re: Image Live-Migration does not respond to OpenStack Glance images

2025-04-10 Thread Eugen Block
Hi, has it worked for any other glance image? The snapshot shouldn't make any difference, I just tried the same in a lab cluster. Have you checked on the client side (OpenStack) for anything in dmesg etc.? Can you query any information from that image? For example: rbd info images_meta/im

[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log

2025-04-10 Thread Alex
Thanks!

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Jonas Schwab
No, I didn't issue any commands to the OSDs. On 2025-04-10 17:28, Eugen Block wrote: Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now started the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-object

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Eugen Block
No, you have to run the objectstore-tool command within the cephadm shell: cephadm shell --name osd.x -- ceph-objectstore-tool There are plenty of examples online. I'm on my mobile phone right now. Zitat von Jonas Schwab : Thank you for the help! Does that mean stopping the container and mountin
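
A hedged sketch of one iteration of that loop, following the documented recover-mon-db-from-OSDs procedure; osd.0 and the paths are placeholders, and the OSD has to be stopped first:

$ cephadm shell --name osd.0 -- \
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --no-mon-config --op update-mon-db --mon-store-path /tmp/mon-store   # accumulates cluster maps into the store.db being rebuilt

Note that /tmp inside a cephadm shell is container-local, so the mon-store directory likely needs to be bind-mounted (cephadm shell has a --mount option) to carry the accumulated maps from one OSD to the next.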

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Šarūnas Burdulis
On 4/10/25 10:01 AM, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitors of my cluster (please don't ask how). Is there a way to recover from this disaster? Depends on how thoroughly they were “nuked.” Are there monitor directories with data still under /var/lib/ceph/ by any chance

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Eugen Block
You have to stop the OSDs in order to mount them with the objectstore tool. Zitat von Jonas Schwab : No, I didn't issue any commands to the OSDs. On 2025-04-10 17:28, Eugen Block wrote: Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now started the first step, namely

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Jonas Schwab
Thank you for the help! Does that mean stopping the container and mounting the lv? On 2025-04-10 17:38, Eugen Block wrote: You have to stop the OSDs in order to mount them with the objectstore tool. Zitat von Jonas Schwab : No, didn't issue any commands to the OSDs. On 2025-04-10 17:28, Euge

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Eugen Block
Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now started the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount faile

[ceph-users] Cannot reinstate ceph fs mirror because i destroyed the ceph fs mirror peer/ target server

2025-04-10 Thread Jan Zeinstra
Hi, This is my first post to the forum and I don't know if it's appropriate, but I'd like to express my gratitude to all the people working hard on Ceph, because I think it's a fantastic piece of software. The problem I'm having is caused by me; we had a well-working ceph fs mirror solution; let's call

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Eugen Block
Can you bring back at least one of them? In that case you could reduce the monmap to 1 mon and bring the cluster back up. If the MONs are really dead, you can recover using OSDs [0]. I've never had to use that myself, but people have reported that to work. [0] https://docs.ceph.com/en/lat

[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Robert Sander
Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitors of my cluster (please don't ask how). Is there a way to recover from this disaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/r

[ceph-users] Urgent help: I accidentally nuked all my Monitor

2025-04-10 Thread Jonas Schwab
Hello everyone, I believe I accidentally nuked all monitors of my cluster (please don't ask how). Is there a way to recover from this disaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab

[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0

2025-04-10 Thread Eugen Block
Glad I could help! I'm also waiting for 18.2.5 to upgrade our own cluster from Pacific after getting rid of our cache tier. :-D Zitat von Jeremy Hansen : This seems to have worked to get the orch back up and put me back to 16.2.15. Thank you. Debating on waiting for 18.2.5 to move forward.

[ceph-users] Re: OSDs ignore memory limit

2025-04-10 Thread Frédéric Nass
Hi Jonas, Is swap enabled on OSD nodes? I've seen OSDs using way more memory than osd_memory_target and being OOM-killed from time to time just because swap was enabled. If that's the case, please disable swap in /etc/fstab and reboot the system. Regards, Frédéric. ___
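
A short sketch of that change, to be run as root; the exact fstab line differs per system, so double-check the file before rebooting:

# swapoff -a          # stop using swap immediately
# vi /etc/fstab       # comment out or remove the swap line(s)
# reboot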

[ceph-users] Re: nodes with high density of OSDs

2025-04-10 Thread Alex from North
Hello Dominique! The OS is quite new - Ubuntu 22.04 with all the latest upgrades.

[ceph-users] Re: nodes with high density of OSDs

2025-04-10 Thread Dominique Ramaekers
Hi Alex, Which OS? I had the same problem, with LVs not being activated automatically, on an older version of Ubuntu. I never found a workaround except by upgrading to a newer release. > -Original message- > From: Alex from North > Sent: Thursday, 10 April 2025 13:17 > To: ce

[ceph-users] nodes with high density of OSDs

2025-04-10 Thread Alex from North
Hello everybody! I have 4 nodes with 112 OSDs each, running 18.2.4. Each OSD consists of a DB on SSD and data on HDD. For some reason, when I reboot a node, not all OSDs come up, because some VGs or LVs are not active. To make them alive again I manually do vgchange -ay $VG_NAME or lvchange -ay $LV_NAME. I suspect
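
For context, a hedged sketch of the manual recovery described above plus a catch-all reactivation; the names are placeholders, and in a cephadm deployment the ceph-volume step would typically run inside a cephadm shell:

# vgchange -ay $VG_NAME            # activate the inactive volume group...
# lvchange -ay $LV_NAME            # ...or just the single logical volume
# ceph-volume lvm activate --all   # have ceph-volume bring up every OSD it can discover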