[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0
Glad I could help! I'm also waiting for 18.2.5 to upgrade our own cluster from Pacific after getting rid of our cache tier. :-D Zitat von Jeremy Hansen : This seems to have worked to get the orch back up and put me back to 16.2.15. Thank you. Debating on waiting for 18.2.5 to move forward. -jeremy On Monday, Apr 07, 2025 at 1:26 AM, Eugen Block (mailto:ebl...@nde.ag)> wrote: Still no, just edit the unit.run file for the MGRs to use a different image. See Frédéric's instructions (now that I'm re-reading it, there's a little mistake with dots and hyphens): # Backup the unit.run file $ cp /var/lib/ceph/$(ceph fsid)/mgr.ceph01.eydqvm/unit.run{,.bak} # Change container image's signature. You can get the signature of the version you want to reach from https://quay.io/repository/ceph/ceph?tab=tags. It's in the URL of a version. $ sed -i 's/ceph@sha256:e40c19cd70e047d14d70f5ec3cf501da081395a670cd59ca881ff56119660c8f/ceph@sha256:d26c11e20773704382946e34f0d3d2c0b8bb0b7b37d9017faa9dc11a0196c7d9/g' /var/lib/ceph/$(ceph fsid)/mgr.ceph01.eydqvm/unit.run # Restart the container (systemctl daemon-reload not needed) $ systemctl restart ceph-$(ceph fsid)(a)mgr.ceph01.eydqvm.service # Run this command a few times and it should show the new version ceph orch ps --refresh --hostname ceph01 | grep mgr To get the image signature, you can also look into the other unit.run files, a version tag would also work. It depends on how often you need the orchestrator to maintain the cluster. If you have the time, you could wait a bit longer for other responses. If you need the orchestrator in the meantime, you can roll back the MGRs. https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/32APKOXKRAIZ7IDCNI25KVYFCCCF6RJG/ Zitat von Jeremy Hansen : > Thank you. The only thing I’m unclear on is the rollback to pacific. > > Are you referring to > > > > > > > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-manager-daemon > > Thank you. I appreciate all the help. Should I wait for Adam to > comment? At the moment, the cluster is functioning enough to > maintain running vms, so if it’s wise to wait, I can do that. > > -jeremy > > > On Monday, Apr 07, 2025 at 12:23 AM, Eugen Block > (mailto:ebl...@nde.ag)> wrote: > > I haven't tried it this way yet, and I had hoped that Adam would chime > > in, but my approach would be to remove this key (it's not present when > > no upgrade is in progress): > > > > ceph config-key rm mgr/cephadm/upgrade_state > > > > Then rollback the two newer MGRs to Pacific as described before. If > > they come up healthy, test if the orchestrator works properly first. > > For example, remove a node-exporter or crash or anything else > > uncritical and let it redeploy. > > If that works, try a staggered upgrade, starting with the MGRs only: > > > > ceph orch upgrade start --image --daemon-types mgr > > > > Since there's no need to go to Quincy, I suggest to upgrade to Reef > > 18.2.4 (or you wait until 18.2.5 is released, which should be very > > soon), so set the respective in the above command. > > > > If all three MGRs successfully upgrade, you can continue with the > > MONs, or with the entire rest. > > > > In production clusters, I usually do staggered upgrades, e. g. I limit > > the number of OSD daemons first just to see if they come up healthy, > > then I let it upgrade all other OSDs automatically. 
> > > > https://docs.ceph.com/en/latest/cephadm/upgrade/#staggered-upgrade > > > > Zitat von Jeremy Hansen : > > > > > Snipped some of the irrelevant logs to keep message size down. > > > > > > ceph config-key get mgr/cephadm/upgrade_state > > > > > > {"target_name": "quay.io/ceph/ceph:v17.2.0", "progress_id": > > > "e7e1a809-558d-43a7-842a-c6229fdc57af", "target_id": > > > "e1d6a67b021eb077ee22bf650f1a9fb1980a2cf5c36bdb9cba9eac6de8f702d9", > > > "target_digests": > > > > > ["quay.io/ceph/ceph@sha256:12a0a4f43413fd97a14a3d47a3451b2d2df50020835bb93db666209f3f77617a", "quay.io/ceph/ceph@sha256:cb4d698cb769b6aba05bf6ef04f41a7fe694160140347576e13bd9348514b667"], "target_version": "17.2.0", "fs_original_max_mds": null, "fs_original_allow_standby_replay": null, "error": null, "paused": false, "daemon_types": null, "hosts": null, "services": null, "total_count": null, > > "remaining_count": > > > null} > > > > > > What should I do next? > > > > > > Thank you! > > > -jeremy > > > > > > > On Sunday, Apr 06, 2025 at 1:38 AM, Eugen Block > > > (mailto:ebl...@nde.ag)> wrote: > > > > Can you check if you have this config-key? > > > > > > > > ceph config-key get mgr/cephadm/upgrade_state > > > > > > > > If you reset the MGRs, it might be necessary to clear this key, > > > > otherwise you might end up in some inconsistency. Just to be sure. > > > > > > > > Zitat von Jeremy Hansen : > > > > > > > > > Thanks. I’m trying to be extra careful since this cluster is > > > > > actually in use. I’ll wait for your feedback. > > > > > > > > > > -jeremy > > > > > > > > > > > On
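For reference, the staggered-upgrade sequence discussed above could look roughly like the sketch below. This is only a sketch: the 18.2.4 image tag and the mgr -> mon -> osd ordering are assumptions based on the linked staggered-upgrade documentation, not commands taken verbatim from this thread.

# Sketch of a staggered upgrade to Reef 18.2.4 (adjust the image tag as needed)
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4 --daemon-types mgr

# Watch progress and verify the MGRs come up healthy on the new version
ceph orch upgrade status
ceph orch ps --daemon-type mgr

# Continue with the MONs, then a limited number of OSDs, then everything else
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4 --daemon-types mon
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4 --daemon-types osd --limit 3
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4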
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Can you bring back at least one of them? In that case you could reduce the monmap to 1 mon and bring the cluster back up. If the MONs are really dead, you can recover using OSDs [0]. I've never had to use that myself, but people have reported that to work. [0] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Zitat von Jonas Schwab : Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Cannot reinstate ceph fs mirror because i destroyed the ceph fs mirror peer/ target server
Hi, This is my first post to the forum and I don't know if it's appropriate, but I'd like to express my gratitude to all people working hard on ceph because I think it's a fantastic piece of software. The problem I'm having is caused by me; we had a well working ceph fs mirror solution; let's call it source cluster A, and target cluster B. Source cluster A is a modest cluster consisting of 6 instances, 3 OSD instances, and 3 mon instances. The OSD instances all have 3 disks (HDD's) and 3 OSD demons, totalling 9 OSD daemons and 9 HDD's. Target cluster B is a single node system having 3 OSD daemons and 3 HDD's. Both clusters run ceph 18.2.4 reef. Both clusters use Ubuntu 22.04 as OS throughout. Both systems are installed using cephadm. I have destroyed cluster B, and have built it from the ground up (I made a mistake in PG sizing in the original cluster) Now i find i cannot create/ reinstate the mirroring between 2 ceph fs filesystems, and i suspect there is a peer left behind in the filesystem of the source, pointing to the now non-existent target cluster. When i do 'ceph fs snapshot mirror peer_list prodfs', i get: '{"f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5": {"client_name": "client.mirror_remote", "site_name": "bk-site", "fs_name": "prodfs"}}' When i try to delete it: 'ceph fs snapshot mirror peer_remove prodfs f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5', i get: 'Error EACCES: failed to remove peeraccess denied: does your client key have mgr caps? See http://docs.ceph.com/en/latest/mgr/administrator/#client-authentication', but the logging of the daemon points to the more likely reason of failure: Apr 08 12:54:26 s1mon systemd[1]: Started Ceph cephfs-mirror.s1mon.lvlkwp for d0ea284a-8a16-11ee-9232-5934f0f00ec2. Apr 08 12:54:26 s1mon cephfs-mirror[310088]: set uid:gid to 167:167 (ceph:ceph) Apr 08 12:54:26 s1mon cephfs-mirror[310088]: ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable), process cephfs-mirror, pid 2 Apr 08 12:54:26 s1mon cephfs-mirror[310088]: pidfile_write: ignore empty --pid-file Apr 08 12:54:26 s1mon cephfs-mirror[310088]: mgrc service_daemon_register cephfs-mirror.22849497 metadata {arch=x86_64,ceph_release=reef,ceph_version=ceph version 18.2.4 (e7ad5345525c7a> Apr 08 12:54:30 s1mon cephfs-mirror[310088]: cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init: remote monitor host=[v2:172.17.16.12:3300/0,v1:172.17.16.12:6789/0] Apr 08 12:54:30 s1mon conmon[310082]: 2025-04-08T10:54:30.365+ 7f57c51ba640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1] Apr 08 12:54:30 s1mon conmon[310082]: 2025-04-08T10:54:30.365+ 7f57d81e0640 -1 cephfs::mirror::Utils connect: error connecting to bk-site: (13) Permission denied Apr 08 12:54:30 s1mon cephfs-mirror[310088]: cephfs::mirror::Utils connect: error connecting to bk-site: (13) Permission denied Apr 08 12:54:30 s1mon conmon[310082]: 2025-04-08T10:54:30.365+ 7f57d81e0640 -1 cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init: error connecting to remote cl> Apr 08 12:54:30 s1mon cephfs-mirror[310088]: cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init: error connecting to remote cluster: (13) Permission denied Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0 Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.362+ 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated by 
pthread_kill(), raise(), abort(), alarm()> Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.386+ 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm()> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0 Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.430+ 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm()> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0 Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.466+ 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm()> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0 Apr 10 00:00:01 s1mon cephfs-mirror[310088]: received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0 Apr 10 00:00:01 s1mon conmon[310082]: 2025-04-09T22:00:01.767+ 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm()> Apr 10 00:00:01
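Since the original peer target no longer exists, one way forward is to re-bootstrap the peer against the rebuilt cluster B, following the cephfs-mirroring documentation. The sketch below is untested against this setup: the fs and site names are taken from the thread, while the token placeholder and the use of the admin key for the removal are assumptions.

# On the rebuilt target cluster B: create the mirror user and a bootstrap token
ceph fs authorize prodfs client.mirror_remote / rwps
ceph fs snapshot mirror peer_bootstrap create prodfs client.mirror_remote bk-site

# On the source cluster A: remove the stale peer with a key that has mgr caps
# (e.g. client.admin), then import the token printed by the command above
ceph --id admin fs snapshot mirror peer_remove prodfs f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5
ceph fs snapshot mirror peer_bootstrap import prodfs <token>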
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
No, you have to run the objectstore-tool command within the cephadm shell: cephadm shell --name osd.x -- ceph-objectstore-tool There are plenty examples online. I’m on my mobile phone right now Zitat von Jonas Schwab : Thank you for the help! Does that mean stopping the container and mounting the lv? On 2025-04-10 17:38, Eugen Block wrote: You have to stop the OSDs in order to mount them with the objectstore tool. Zitat von Jonas Schwab : No, didn't issue any commands to the OSDs. On 2025-04-10 17:28, Eugen Block wrote: Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now stated the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Does anybody know how to solve this? Best regards, Jonas On 2025-04-10 16:04, Robert Sander wrote: Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
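For the cephadm case, the "collect the map from each OSD" step from the linked guide could look roughly like the sketch below. The OSD has to be stopped first, and the mon-store directory must live on the host (cephadm shell's --mount option is used for that here); the fsid placeholder and the exact paths are assumptions.

# Stop the OSD, then run the objectstore tool inside its container
systemctl stop ceph-<fsid>@osd.11.service

mkdir -p /root/mon-store
cephadm shell --name osd.11 --mount /root/mon-store -- \
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 \
  --no-mon-config --op update-mon-db --mon-store-path /mnt

# Repeat for every OSD, accumulating into the same /root/mon-store, and rsync
# that directory from host to host as described in the recovery-using-osds docs.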
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
No, didn't issue any commands to the OSDs. On 2025-04-10 17:28, Eugen Block wrote: Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now stated the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Does anybody know how to solve this? Best regards, Jonas On 2025-04-10 16:04, Robert Sander wrote: Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now stated the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Does anybody know how to solve this? Best regards, Jonas On 2025-04-10 16:04, Robert Sander wrote: Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Thank you for the help! Does that mean stopping the container and mounting the lv? On 2025-04-10 17:38, Eugen Block wrote: You have to stop the OSDs in order to mount them with the objectstore tool. Zitat von Jonas Schwab : No, didn't issue any commands to the OSDs. On 2025-04-10 17:28, Eugen Block wrote: Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now stated the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Does anybody know how to solve this? Best regards, Jonas On 2025-04-10 16:04, Robert Sander wrote: Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
On 4/10/25 10:01 AM, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? Depends on how thoroughly they were "nuked." Are there monitor directories with data still under /var/lib/ceph/ by any chance? If so, monitors can be started simply as ceph-mon services, at least temporarily, by pointing to those directories. -- Šarūnas Burdulis Dartmouth Mathematics math.dartmouth.edu/~sarunas · https://useplaintext.email ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
You have to stop the OSDs in order to mount them with the objectstore tool. Zitat von Jonas Schwab : No, didn't issue any commands to the OSDs. On 2025-04-10 17:28, Eugen Block wrote: Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now stated the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Does anybody know how to solve this? Best regards, Jonas On 2025-04-10 16:04, Robert Sander wrote: Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: OSDs ignore memory limit
Hi Jonas, Is swap enabled on OSD nodes? I've seen OSDs using way more memory than osd_memory_target and being OOM-killed from time to time just because swap was enabled. If that's the case, please disable swap in /etc/fstab and reboot the system. Regards, Frédéric. De : Jonas Schwab Envoyé : mercredi 9 avril 2025 13:54 À : ceph-users@ceph.io Objet : [ceph-users] OSDs ignore memory limit Hello everyone, I recently have many problems with OSDs using much more memory than they are supposed to (> 10GB), leading to the node running out of memory and killing processes. Does someone have ideas why the daemons seem to completely ignore the set memory limits? See e.g. the following: $ ceph orch ps ceph2-03 NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID mon.ceph2-03 ceph2-03 running (3h) 1s ago 2y 501M 2048M 19.2.1 f2efb0401a30 d876fc30f741 node-exporter.ceph2-03 ceph2-03 *:9100 running (3h) 1s ago 17M 46.5M - 1.7.0 72c9c2088986 d32ec4d266ea osd.4 ceph2-03 running (26m) 1s ago 2y 10.2G 3310M 19.2.1 f2efb0401a30 b712a86dacb2 osd.11 ceph2-03 running (5m) 1s ago 2y 3458M 3310M 19.2.1 f2efb0401a30 f3d7705325b4 osd.13 ceph2-03 running (3h) 1s ago 6d 2059M 3310M 19.2.1 f2efb0401a30 980ee7e11252 osd.17 ceph2-03 running (114s) 1s ago 2y 3431M 3310M 19.2.1 f2efb0401a30 be7319fda00b osd.23 ceph2-03 running (30m) 1s ago 2y 10.4G 3310M 19.2.1 f2efb0401a30 9cfb86c4b34a osd.29 ceph2-03 running (8m) 1s ago 2y 4923M 3310M 19.2.1 f2efb0401a30 d764930bb557 osd.35 ceph2-03 running (14m) 1s ago 2y 7029M 3310M 19.2.1 f2efb0401a30 6a4113adca65 osd.59 ceph2-03 running (2m) 1s ago 2y 2821M 3310M 19.2.1 f2efb0401a30 8871d6d4f50a osd.61 ceph2-03 running (49s) 1s ago 2y 1090M 3310M 19.2.1 f2efb0401a30 3f7a0ed17ac2 osd.67 ceph2-03 running (7m) 1s ago 2y 4541M 3310M 19.2.1 f2efb0401a30 eea0a6bcefec osd.75 ceph2-03 running (3h) 1s ago 2y 1239M 3310M 19.2.1 f2efb0401a30 5a801902340d Best regards, Jonas -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
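If swap does turn out to be enabled, a minimal sketch of the check and the change suggested above (the sed pattern is an assumption; double-check /etc/fstab by hand before rebooting):

# Check whether swap is active on the node
swapon --show
free -h

# Disable it at runtime and comment out the swap entry in /etc/fstab
swapoff -a
sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab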
[ceph-users] Urgent help: I accidentally nuked all my Monitor
Hello everyone, I believe I accidentally nuked all monitors of my cluster (please don't ask how). Is there a way to recover from this disaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards -- Robert Sander Linux Consultant Heinlein Consulting GmbH Schwedter Str. 8/9b, 10119 Berlin https://www.heinlein-support.de Tel: +49 30 405051 - 0 Fax: +49 30 405051 - 19 Amtsgericht Berlin-Charlottenburg - HRB 220009 B Geschäftsführer: Peer Heinlein - Sitz: Berlin ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
Thanks! ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] nodes with high density of OSDs
Hello everybody! I have 4 nodes with 112 OSDs each, running 18.2.4. Each OSD consists of a DB on SSD and data on HDD. For some reason, when I reboot a node, not all OSDs come up, because some VGs or LVs are not active. To bring them back I manually run vgchange -ay $VG_NAME or lvchange -ay $LV_NAME. I suspect it is linked to the high number of VGs/LVs, but I cannot find an answer. Maybe you can give me a hint on how to work around it? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: nodes with high density of OSDs
Hi Alex, Which OS? I had the same problem regarding not automatic activation of LVM's on an older version of Ubuntu. I never found a workaround except by upgrading to a newer release. > -Oorspronkelijk bericht- > Van: Alex from North > Verzonden: donderdag 10 april 2025 13:17 > Aan: ceph-users@ceph.io > Onderwerp: [ceph-users] nodes with high density of OSDs > > Hello everybody! > I have a 4 nodes with 112 OSDs each and 18.2.4. OSD consist of db on SSD and > data on HDD For some reason, when I reboot node, not all OSDs get up > because some VG or LV are not active. > To make it alive again I manually do vgchange -ay $VG_NAME or lvchange -ay > $LV_NAME. > > I suspect it is linked to high amount of vg/lv but cannot find an answer. > > Maybe you can gimme a hint how to struggle it over? > ___ > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email > to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: nodes with high density of OSDs
Hello Dominique! The OS is quite new - Ubuntu 22.04 with all the latest upgrades. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Diskprediction_local mgr module removal - Call for feedback
+1 I wasn't aware that this module is obsolete and was trying to start it a few weeks ago. We develop a home-made solution some time ago to monitor smart data from both HDD (uncorrected errors, grown defect list) and SSD (WLC/TBW). But keeping it up to date with non-unified disk models is a nightmare. Alert : "OSD.12 is going to fail. Replace it soon" before seeing SLOW_OPS would be a game changer! Thanks! On Tue, 8 Apr 2025 at 10:00, Michal Strnad wrote: > Hi. > > From our point of view, it's important to keep disk failure prediction > tool as part of Ceph, ideally as an MGR module. In environments with > hundreds or thousands of disks, it's crucial to know whether, for > example, a significant number of them are likely to fail within a month > - which, in the best-case scenario, would mean performance degradation, > and in the worst-case, data loss. > > Some have already responded to the deprecation of diskprediction by > starting to develop their own solutions. For instance, just yesterday, > Daniel Persson published a solution [1] on his website that addresses > the same problem. > > Would it be possible to join forces and try to revive that module? > > [1] https://www.youtube.com/watch?v=Gr_GtC9dcMQ > > Thanks, > Michal > > > On 4/8/25 01:18, Yaarit Hatuka wrote: > > Hi everyone, > > > > On today's Ceph Steering Committee call we discussed the idea of removing > > the diskprediction_local mgr module, as the current prediction model is > > obsolete and not maintained. > > > > We would like to gather feedback from the community about the usage of > this > > module, and find out if anyone is interested in maintaining it. > > > > Thanks, > > Yaarit > > ___ > > ceph-users mailing list -- ceph-users@ceph.io > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > -- Łukasz Borek luk...@borek.org.pl ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Diskprediction_local mgr module removal - Call for feedback
>> anthonydatri@Mac models % pwd >> /Users/anthonydatri/git/ceph/src/pybind/mgr/diskprediction_local/models >> anthonydatri@Mac models % file redhat/* >> redhat/config.json: JSON data >> redhat/hgst_predictor.pkl:data >> redhat/hgst_scaler.pkl: data >> redhat/seagate_predictor.pkl: data >> redhat/seagate_scaler.pkl:data >> anthonydatri@Mac models % > > These are Python pickle files from 2019 containing ML models made with a > version of sklearn from 2019. Leerer Blick IMHO binaries don’t belong in git repositories and the approach kinda sounds like trying to be clever and trendy for the sake of being clever and trendy. Cf. the KISS principle. By which I mean keeping it simple, not lip-syncing when you should have retired in the 1990s. I’ve had good luck in the past with an (admittedly ugly) SMART collector that dumped harmonized metrics into the textfile_collector directory for node_exporter to pick up, then using conventional Alertmanager rules, which are easy to write, improve, and tweak for local conditions. If kept as a Manager module I could see this being yet another thing hampering scalability. Were we to implement a framework for normalizing metrics for given drive models — and honestly that’s what it takes to be useful — the community could PR the individual SKU entries over time. I would draw a line in the sand up front: no client SKUs will be accepted, no USB/Thunderbolt drives, no HBA/SAN mirages. Only physical, enterprise drive SKUs. Client drive failures are trivially predicted as simply SOON. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
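A minimal sketch of the textfile-collector approach mentioned above, assuming node_exporter reads *.prom files from /var/lib/node_exporter/textfile_collector and smartmontools is installed; the metric names are made up for illustration, and attribute names differ per vendor/SKU, which is exactly the normalization problem discussed here:

#!/bin/bash
OUT=/var/lib/node_exporter/textfile_collector/smart.prom
TMP=$(mktemp)
for dev in /dev/sd[a-z]; do
    smartctl -H "$dev" > /dev/null
    # exit status 0 means the overall health self-assessment passed
    echo "smart_device_healthy{device=\"$dev\"} $(( $? == 0 ? 1 : 0 ))" >> "$TMP"
    # raw reallocated sector count, where the drive reports that ATA attribute
    realloc=$(smartctl -A "$dev" | awk '/Reallocated_Sector_Ct/ {print $10}')
    [ -n "$realloc" ] && echo "smart_reallocated_sectors{device=\"$dev\"} $realloc" >> "$TMP"
done
mv "$TMP" "$OUT"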
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
I did have to add "su root root" to the logrotate script to fix the permissions issue. There's an RH KB article and there are Ceph GitHub pull requests to fix it. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Thank you very much! I now started the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Does anybody know how to solve this? Best regards, Jonas On 2025-04-10 16:04, Robert Sander wrote: Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Image Live-Migration does not respond to OpenStack Glance images
Hi, has it worked for any other glance image? The snapshot shouldn't make any difference, I just tried the same in a lab cluster. Have you checked on the client side (OpenStack) for anything in dmesg etc.? Can you query any information from that image? For example: rbd info images_meta/image_name rbd status images_meta/image_name Is the Ceph cluster healthy? Maybe you have inactive PGs on the glance pool? Zitat von "Yuta Kambe (Fujitsu)" : Hi everyone. I am trying Image Live-Migration but it is not working well and I would like some advice. https://docs.ceph.com/en/latest/rbd/rbd-live-migration/ I use Ceph as a backend for OpenStack Glance. I tried to migrate the Pool of Ceph used in Glance to the new Pool. Source Pool: - images_meta : metadata pool, Replication - images_data : data pool, Erasure Code Target Pool: - images_meta: metadata pool, Replication (Same as source Pool) - images_data_hdd: data pool, Erasure Code The following command I executed, but did not return a response. rbd migration prepare images_meta/image_name images_meta/image_name --data-pool images_data_hdd I checked the logs in /var/log/messages and /var/log/ceph, but no useful information was available. I would like some advice on this. - Are there any other logs I should check? - Is there a case where the rbd migration command cannot be executed? The following is supplemental information. - ceph version 17.2.8 - The migration of the OpenStack Nova image was successful with the same Pool configuration and command. - I don't know if it is related, but there is a snapshot in the image of Glance, and unprotect of the snapshot is also unresponsive. rbd snap unprotect images_meta/image_name@snap ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
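A few quick checks along the lines suggested above (just a sketch; the pool and image names are the ones from the thread). As far as I know, the prepare step also needs the source image to be closed, so open watchers could make it hang:

ceph health detail
ceph pg dump_stuck inactive
ceph df | grep images
# "rbd status" (as suggested above) lists watchers; an image still opened by a
# client would stall "rbd migration prepare"
rbd status images_meta/image_name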
[ceph-users] Re: nodes with high density of OSDs
That's quite a large number of storage units per machine. My suspicion is that since you have apparently an unusually high number of LVs coming online at boot, the time it takes to linearly activate them is long enough to overlap with the point in time that ceph starts bringing up its storage-dependent components. Likely not only OSDs, but other resources that might keep internal databases and the like. The cure for that under systemd would be to make Ceph - or at least its storage-dependent services - wait on LV availability. The fun part is figuring out how to do that. Offhand, I don't know what in systemd controls the activation of LVM resources and it's almost certainly being done asynchronously, so you'd need to provide a detector service that could determine when things were available. Then you'd have to tweak Ceph not to start until the safe time has arrived. You might be able to edit the master ceph target to add such a dependency using an /etc/systemd/system override, but admittedly that doesn't cover allowing everything to come up as soon as possible but no sooner. In particular, it would be hard to edit the individual OSDs to wait on their LVs, as the systemd components for OSDs on an administered system are constructed dynamically and do not persist when the system reboots, so it would likely require a worst-case delay. Regards, Tim On 4/10/25 07:45, Alex from North wrote: Hello Dominique! Os is quite new - Ubuntu 22.04 with all the latest upgrades. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Ceph squid fresh install
More complete description: 1-) I formatted and installed the operating system 2-) This is "ceph installed": curl --silent --remote-name --location https://download.ceph.com/rpm-19.2.1/el9/noarch/cephadm chmod +x cephadm ./cephadm add-repo --release squid ./cephadm install cephadm -v bootstrap --mon-ip 172.27.254.6 --cluster-network 172.28.254.0/24 --log-to-file cephadm install ceph-common De: "Anthony D'Atri" Enviada: 2025/04/08 10:35:22 Para: quag...@bol.com.br Cc: ebl...@nde.ag, ceph-users@ceph.io Assunto: Re: [ceph-users] Ceph squid fresh install What does “ceph installed” mean? I suspect that this description is not complete. On Apr 8, 2025, at 9:21 AM, quag...@bol.com.br wrote: What is a “storage server”? These are machines that only have the operating system and ceph installed. De: "Anthony D'Atri" Enviada: 2025/04/08 10:19:08 Para: quag...@bol.com.br Cc: ebl...@nde.ag, ceph-users@ceph.io Assunto: Re: [ceph-users] Ceph squid fresh install > On Apr 8, 2025, at 9:13 AM, quag...@bol.com.br wrote: > > These 2 IPs are from the storage servers. What is a “storage server”? > There are no user processes running on them. It only has the operating system and ceph installed. Nobody said anything about user processes. > > > Rafael. > > De: "Eugen Block" > Enviada: 2025/04/08 09:35:35 > Para: quag...@bol.com.br > Cc: ceph-users@ceph.io > Assunto: Re: [ceph-users] Ceph squid fresh install > > These are your two Luminous clients: > > ---snip--- > { > "name": "unknown.0", > "entity_name": "client.admin", > "addrs": { > "addrvec": [ > { > "type": "none", > "addr": "172.27.254.7:0", > "nonce": 443842330 > } > ] > }, > "socket_addr": { > "type": "none", > "addr": "172.27.254.7:0", > "nonce": 443842330 > }, > "con_type": "client", > "con_features": 3387146417253690110, > "con_features_hex": "2f018fb87aa4aafe", > "con_features_release": "luminous", > ... > > { > "name": "client.104098", > "entity_name": "client.admin", > "addrs": { > "addrvec": [ > { > "type": "v1", > "addr": "172.27.254.6:0", > "nonce": 2027668300 > } > ] > }, > "socket_addr": { > "type": "v1", > "addr": "172.27.254.6:0", > "nonce": 2027668300 > }, > "con_type": "client", > "con_features": 3387146417253690110, > "con_features_hex": "2f018fb87aa4aafe", > "con_features_release": "luminous", > ---snip--- > > Zitat von quag...@bol.com.br: > > > Hi Eugen! Thanks a lot! I was able to find luminous connections, > > but I still can't identify which client process. Here is the output: > > Rafael. > > ── > > De: "Eugen Block" Enviada: 2025/04/08 04:37:47 Para: ceph-users@ceph.io > > Assunto: [ceph-users] Re: Ceph squid fresh install Hi, you can query > > the MON sessions to identify your older clients with: ceph tell mon. > > sessions It will show you the IP address, con_features_release (Luminous) > > and a couple of other things. Zitat von Laura Flores : > Hi Rafael, >> I > > would not force the min_compat_client to be reef when there are still > > > luminous clients connected, as it is important for all clients to be >=Reef > >> to understand/encode the pg_upmap_primary feature in the osdmap. >> As > > for checking which processes are still luminous, I am copying @Radoslaw > > > Zarzynski who may be able to help more with that. >> Thanks, > Laura > > Flores >> On Mon, Apr 7, 2025 at 11:30 AM quag...@bol.com.br > wrote: > Hi, >> I just did a new Ceph installation and would like to enable the > > "read >> balancer". >> However, the documentation requires that the minimum > > client version >> be reef. 
I checked this information through "ceph > > features" and came across >> the situation of having 2 luminous clients. >> > > # ceph features >> { >> "mon": [ >> { >> "features": "0x3f03cffd", > >>> "release": "squid", >> "num": 2 >> } >> ], >> "mds": [ >> { >> > > "features": "0x3f03cffd", >> "release": "squid", >> "num": 2 >> } > >>> ], >> "osd": [ >> { >> "features": "0x3f03cffd", >> "release": > > "squid", >> "num": 38 >> } >> ], >> "client": [ >> { >> "features": > > "0x2f018fb87aa4aafe", >> "release": "luminous", >> "num": 2 >> }, >> { >> > > "features": "0x3f03cffd", >> "release": "squid", >> "num": 5 >> } > >>> ], >> "mgr": [ >> { >> "features": "0x3f03cffd", >> "release": > > "squid", >> "num": 2 >> } >> ] >> } I tryed to configure the minimum > > version to reef and received the >> following alert: >> # ceph osd > > set-require-min-compat-client reef >> Error EPERM: cannot set > > require_min_compat_client to reef: 2 connected >> client(s) look like > > luminous (missing 0x8000); add >> --yes-i-really-mean-it to do it > > anyway Is it ok do confirm anyway? >> Which processes are still as > > luminous? Rafael. >> ___ > >>> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an > > email to ceph-users-le...@ceph.io >>
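To narrow down which clients are reported as luminous, the session dump can be filtered, e.g. with jq; a rough sketch, assuming jq is available and <mon-id> is replaced with one of the monitor names:

ceph tell mon.<mon-id> sessions | \
  jq '.[] | select(.con_features_release == "luminous")
          | {entity_name, addr: .socket_addr.addr}'

# The address then points to the host where the old client (kernel mount,
# librbd user, standalone ceph CLI, etc.) is running.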
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
Is this bit of code responsible for hardcoding DEBUG to cephadm.log? 'loggers': { '': { 'level': 'DEBUG', 'handlers': ['console', 'log_file'], } } in /var/lib/ceph//cephadm.* ? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time: $ monmaptool --print monmap1 monmaptool: monmap file monmap1 epoch 29 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:21.203171+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (squid) election_strategy: 1 0: [v2:10.127.239.2:3300/0,v1:10.127.239.2:6789/0] mon.ceph2-02 1: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 2: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 3: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 $ monmaptool --print monmap2 monmaptool: monmap file monmap2 epoch 30 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:43.216713+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (unknown) election_strategy: 1 0: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 1: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 2: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 Would it be feasible to move the data from node1 (which still contains node2 as mon) to node2, or would that just result in even more mess? On 2025-04-10 19:57, Eugen Block wrote: It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am very grateful for all help! 
Best regards, Jonas Schwab ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
It depends a bit. Which mon do the OSDs still know about? You can check /var/lib/ceph//osd.X/config to retrieve that piece of information. I'd try to revive one of them. Do you still have the mon store.db for all of the mons or at least one of them? Just to be safe, back up all the store.db directories. Then modify a monmap to contain the one you want to revive by removing the other ones. Backup your monmap files as well. Then inject the modified monmap into the daemon and try starting it. Zitat von Jonas Schwab : Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time: $ monmaptool --print monmap1 monmaptool: monmap file monmap1 epoch 29 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:21.203171+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (squid) election_strategy: 1 0: [v2:10.127.239.2:3300/0,v1:10.127.239.2:6789/0] mon.ceph2-02 1: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 2: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 3: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 $ monmaptool --print monmap2 monmaptool: monmap file monmap2 epoch 30 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:43.216713+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (unknown) election_strategy: 1 0: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 1: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 2: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 Would it be feasible to move the data from node1 (which still contains node2 as mon) to node2, or would that just result in even more mess? On 2025-04-10 19:57, Eugen Block wrote: It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am very grateful for all help! 
Best regards, Jonas Schwab ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
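A rough sketch of that monmap surgery, using the monitor names from the monmap printed above; the fsid and path placeholders are mine, and everything should be backed up before touching it:

# Back up the mon store and the monmap files first
cp -a /var/lib/ceph/<fsid>/mon.rgw2-06 /root/mon.rgw2-06.bak

# Strip the monitors that no longer exist, keeping only the one to revive
# (use the names exactly as the monmap printout shows them, with or without
# the mon. prefix depending on how they were deployed)
monmaptool --rm ceph2-02 --rm rgw2-04 --rm rgw2-05 monmap1
monmaptool --print monmap1

# Inject the reduced map into the surviving monitor's store, then try starting it
ceph-mon -i rgw2-06 --inject-monmap monmap1 \
  --mon-data /var/lib/ceph/<fsid>/mon.rgw2-06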
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
Hey all, Just confirming that the same debug level has been in Reef and Squid. We got so used to it that just decided not to care anymore. Best, Laimis J. > On 8 Apr 2025, at 14:21, Alex wrote: > > Interesting. So it's like that for everybody? > Meaning cephadm.log logs debug messages. > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: nodes with high density of OSDs
Peter, I don't think udev factors in based on the original question. Firstly, because I'm not sure udev deals with permanently-attached devices (it's more for hot-swap items). Secondly, because the original complaint mentioned LVM specifically. I agree that the hosts seem overloaded, by the way. It sounds like large disks are being subdivided into many smaller disks, which would be bad for Ceph to do on HDDs, and while SSDs don't have the seek and rotational liabilities of HDDs, it's still questionable as to how many connections you really should be making to one physical unit that way. Ceph, for reasons I never discovered prefers that you create OSDs that either own an entire physical disk or an LVM Logical Volume, but NOT a disk partition. I find it curious, since LVs aren't necessarily contiguous space (again, more of a liability for HDDs than SSDs). unlike traditional partitions, but there you are. Incidentally, LVs are contained in Volume Groups, and the whole can end up with parts scattered over multiple Physical Volumes (PVs). When an LVM-supporting OS boots, part of the process is to run an lvscan (lvscan -ay) to locate and activate Logical Volumes, and from the information given, it's assumed that the lvscan process hasn't completed before Ceph starts up and begins trying to use them. The boot lvscan is normally pretty quick, since it would be rare to have more than a dozen or so LVs in the system. But in this case, more than 100 LVs are being configured at boot time and the systemd boot process doesn't currently account for the extra time needed to do that. If I haven't got my facts too badly scrambled, LVs end up being mapped to dm devices, but that's something I normally only pay attention to when hardware isn't behaving so I'm not really expert on that. Hope that helps, Tim On 4/10/25 16:43, Peter Grandi wrote: I have a 4 nodes with 112 OSDs each [...] As an aside I rekon that is not such a good idea as Ceph was designed for one-small-OSD per small-server and lots of them, but lots of people of course know better. Maybe you can gimme a hint how to struggle it over? That is not so much a Ceph question but a distribution question anyhow there are two possible hints that occur to me: * In most distributions the automatic activation of block devices is done by the kernel plus 'udevd' rules and/or 'systemd' units. * There are timeouts for activation of storage devices and on a system with many, depending on type etc., there may be a default setting to activate them serially instead of in parallel to prevent sudden power consumption and other surges, so some devices may not activate because of timeouts. You can start by asking the sysadmin for those machines to look at system logs (distribution dependent) for storage device activation reports to confirm whether the guesses above apply to your situation and if confirmed you can ask them to change the relevant settings for the distribution used. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
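One way to express the "wait for the LVs before Ceph" idea is a small oneshot unit ordered before ceph.target. This is only a sketch: the unit name, paths and the ceph.target dependency are assumptions, and it trades boot time for reliability by activating all volume groups up front.

# /etc/systemd/system/ceph-lvm-activate.service
[Unit]
Description=Activate all LVM logical volumes before Ceph starts
After=local-fs.target
Before=ceph.target

[Service]
Type=oneshot
ExecStart=/sbin/vgchange -ay
RemainAfterExit=yes

[Install]
WantedBy=ceph.target

# Enable it with:
systemctl daemon-reload
systemctl enable ceph-lvm-activate.service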
[ceph-users] Re: Repo name bug?
I created a pull request, but I'm not sure what the etiquette is or whether I can merge it myself. First-timer here. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Repo name bug?
On Fri, Apr 11, 2025 at 10:39 AM Alex wrote: > I created a pull request, not sure what the etiquette is if I can > merge it. First timer here. > hi Alex, I cannot find your pull request in https://github.com/ceph/cephadm-ansible/ . did you create it in this project? > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > -- Regards Kefu Chai ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: OSDs ignore memory limit
Hi Jonas, Anthony gave some good advice for some things to check. You can also dump the mempool statistics for OSDs that you identify are over their memory target using: "ceph daemon osd.NNN dump_mempools" The osd_memory_target code basically looks at the memory usage of the process and then periodically grows or shrinks the aggregate memory for caches based on how far off the process usage is from the target. It's not perfect, but generally keeps memory close to the target size. It can't do anything if there is a memory leak or other component driving the overall memory usage higher than the target though. One example of this is that in erasure coded pools, huge xattrs on objects can drive pglog memory usage extremely high and the osd_memory_autotuning may not be able to compensate for it. Having said this, I'd suggest looking at the actual targets and the mempools and see if you can figure out where the memory is going and if its truly over the target. The targets themselves can be autotuned higher up in the stack in some cases. Mark On 4/9/25 07:52, Jonas Schwab wrote: Hello everyone, I recently have many problems with OSDs using much more memory than they are supposed to (> 10GB), leading to the node running out of memory and killing processes. Does someone have ideas why the daemons seem to completely ignore the set memory limits? See e.g. the following: $ ceph orch ps ceph2-03 NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID mon.ceph2-03 ceph2-03 running (3h) 1s ago 2y 501M 2048M 19.2.1 f2efb0401a30 d876fc30f741 node-exporter.ceph2-03 ceph2-03 *:9100 running (3h) 1s ago 17M 46.5M - 1.7.0 72c9c2088986 d32ec4d266ea osd.4 ceph2-03 running (26m) 1s ago 2y 10.2G 3310M 19.2.1 f2efb0401a30 b712a86dacb2 osd.11 ceph2-03 running (5m) 1s ago 2y 3458M 3310M 19.2.1 f2efb0401a30 f3d7705325b4 osd.13 ceph2-03 running (3h) 1s ago 6d 2059M 3310M 19.2.1 f2efb0401a30 980ee7e11252 osd.17 ceph2-03 running (114s) 1s ago 2y 3431M 3310M 19.2.1 f2efb0401a30 be7319fda00b osd.23 ceph2-03 running (30m) 1s ago 2y 10.4G 3310M 19.2.1 f2efb0401a30 9cfb86c4b34a osd.29 ceph2-03 running (8m) 1s ago 2y 4923M 3310M 19.2.1 f2efb0401a30 d764930bb557 osd.35 ceph2-03 running (14m) 1s ago 2y 7029M 3310M 19.2.1 f2efb0401a30 6a4113adca65 osd.59 ceph2-03 running (2m) 1s ago 2y 2821M 3310M 19.2.1 f2efb0401a30 8871d6d4f50a osd.61 ceph2-03 running (49s) 1s ago 2y 1090M 3310M 19.2.1 f2efb0401a30 3f7a0ed17ac2 osd.67 ceph2-03 running (7m) 1s ago 2y 4541M 3310M 19.2.1 f2efb0401a30 eea0a6bcefec osd.75 ceph2-03 running (3h) 1s ago 2y 1239M 3310M 19.2.1 f2efb0401a30 5a801902340d Best regards, Jonas -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Best Regards, Mark Nelson Head of Research and Development Clyso GmbH p: +49 89 21552391 12 | a: Minnesota, USA w: https://clyso.com | e: mark.nel...@clyso.com We are hiring: https://www.clyso.com/jobs/ ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
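To see where the memory actually goes, the configured target and the mempools can be compared per OSD; a rough sketch (osd.4 is just the example from the ps output above, and the jq paths assume the usual dump_mempools JSON layout):

# What the OSD is supposed to use
ceph config get osd.4 osd_memory_target

# What its mempools actually hold, largest consumers first
cephadm shell --name osd.4 -- ceph daemon osd.4 dump_mempools | \
  jq '.mempool.by_pool | to_entries | sort_by(.value.bytes) | reverse | .[0:5]'

# pglog is a common culprit on EC pools, as mentioned above
cephadm shell --name osd.4 -- ceph daemon osd.4 dump_mempools | \
  jq '.mempool.by_pool.osd_pglog'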
[ceph-users] Ceph squid fresh install
Hi, I just did a new Ceph installation and would like to enable the "read balancer". However, the documentation requires that the minimum client version be reef. I checked this information through "ceph features" and came across the situation of having 2 luminous clients.

# ceph features
{
  "mon": [ { "features": "0x3f03cffd", "release": "squid", "num": 2 } ],
  "mds": [ { "features": "0x3f03cffd", "release": "squid", "num": 2 } ],
  "osd": [ { "features": "0x3f03cffd", "release": "squid", "num": 38 } ],
  "client": [
    { "features": "0x2f018fb87aa4aafe", "release": "luminous", "num": 2 },
    { "features": "0x3f03cffd", "release": "squid", "num": 5 }
  ],
  "mgr": [ { "features": "0x3f03cffd", "release": "squid", "num": 2 } ]
}

I tried to configure the minimum version to reef and received the following alert:

# ceph osd set-require-min-compat-client reef
Error EPERM: cannot set require_min_compat_client to reef: 2 connected client(s) look like luminous (missing 0x8000); add --yes-i-really-mean-it to do it anyway

Is it OK to confirm anyway? Which processes are still reported as luminous? Rafael. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
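One way to find out which sessions the monitors classify as luminous before forcing the flag is to list the mon sessions; a sketch with a placeholder mon ID (output fields vary by release, and kernel RBD/CephFS clients often report an old feature release even on recent kernels, which is frequently the harmless explanation):

# Show connected sessions and the feature release the mon derives for each
$ ceph tell mon.<id> sessions | grep -i luminous
# Alternatively, from the mon host itself:
$ ceph daemon mon.<id> sessions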
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
I think it's the same block of code Eugen found. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: nodes with high density of OSDs
> I have a 4 nodes with 112 OSDs each [...]

As an aside, I reckon that is not such a good idea, as Ceph was designed for one small OSD per small server and lots of them, but lots of people of course know better.

> Maybe you can gimme a hint how to struggle it over?

That is not so much a Ceph question as a distribution question. Anyhow, there are two possible hints that occur to me:

* In most distributions the automatic activation of block devices is done by the kernel plus 'udevd' rules and/or 'systemd' units.

* There are timeouts for the activation of storage devices, and on a system with many of them (depending on type etc.) there may be a default setting to activate them serially instead of in parallel, to prevent sudden power consumption and other surges, so some devices may not activate because of timeouts.

You can start by asking the sysadmin for those machines to look at the system logs (distribution dependent) for storage device activation reports, to confirm whether the guesses above apply to your situation; if confirmed, you can ask them to change the relevant settings for the distribution used (a sketch of where to look follows this message). ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
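Following up on the hints above, rough places to look on a systemd-based distribution (a sketch; unit and service names differ between distributions, so treat these as starting points):

# Boot-time activation messages and timeouts
$ journalctl -b | grep -iE 'lvm|ceph-volume|timed out'
# Units that failed or never reached "running"
$ systemctl list-units --failed
$ systemctl list-units 'ceph-*' --all | grep -v running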
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
I edited the monmap to include only rgw2-06 and then followed https://docs.ceph.com/en/squid/rados/operations/add-or-rm-mons/#adding-a-monitor-manual to create a new monitor. Unfortunately, `ceph-mon -i mon.rgw2-06 --public-addr 10.127.239.63 -f` crashed with the traceback seen in the attachment. On 2025-04-10 20:34, Eugen Block wrote: It depends a bit. Which mon do the OSDs still know about? You can check /var/lib/ceph//osd.X/config to retrieve that piece of information. I'd try to revive one of them. Do you still have the mon store.db for all of the mons or at least one of them? Just to be safe, back up all the store.db directories. Then modify a monmap to contain the one you want to revive by removing the other ones. Backup your monmap files as well. Then inject the modified monmap into the daemon and try starting it. Zitat von Jonas Schwab : Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time: $ monmaptool --print monmap1 monmaptool: monmap file monmap1 epoch 29 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:21.203171+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (squid) election_strategy: 1 0: [v2:10.127.239.2:3300/0,v1:10.127.239.2:6789/0] mon.ceph2-02 1: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 2: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 3: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 $ monmaptool --print monmap2 monmaptool: monmap file monmap2 epoch 30 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:43.216713+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (unknown) election_strategy: 1 0: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 1: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 2: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 Would it be feasible to move the data from node1 (which still contains node2 as mon) to node2, or would that just result in even more mess? On 2025-04-10 19:57, Eugen Block wrote: It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 
2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitors of my cluster (please don't ask how). Is there a way to recover from this disaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
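For context, the "messy way" referenced above boils down to extracting the monmap from a stopped mon, editing it, and injecting it back. A rough sketch with placeholder names (the mon must be stopped, and in a cephadm deployment the commands need to run inside a shell container with the mon's data directory available):

# Extract and inspect the current monmap from the stopped mon
$ ceph-mon -i <mon-id> --extract-monmap /tmp/monmap
$ monmaptool --print /tmp/monmap
# Drop the monitors that no longer exist, keeping the one to revive
$ monmaptool --rm <dead-mon-a> --rm <dead-mon-b> /tmp/monmap
# Inject the edited map and start the mon again
$ ceph-mon -i <mon-id> --inject-monmap /tmp/monmap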
[ceph-users] v18.2.5 Reef released
We're happy to announce the 5th point release in the Reef series. We recommend users update to this release. For detailed release notes with links & changelog please refer to the official blog entry at https://ceph.io/en/news/blog/2025/v18-2-5-reef-released/

Notable Changes
---
* RBD: The ``try-netlink`` mapping option for rbd-nbd has become the default and is now deprecated. If the NBD netlink interface is not supported by the kernel, then the mapping is retried using the legacy ioctl interface.
* RADOS: A new command, `ceph osd rm-pg-upmap-primary-all`, has been added that allows users to clear all pg-upmap-primary mappings in the osdmap when desired. Related trackers:
  - https://tracker.ceph.com/issues/67179
  - https://tracker.ceph.com/issues/66867

Getting Ceph
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph_18.2.5.orig.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: a5b0e13f9c96f3b45f596a95ad098f51ca0ccce1
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
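For anyone who wants to try the new RADOS command after upgrading, a short sketch (assumes the mappings are visible in the osdmap dump, as they are on recent releases):

# List any existing pg-upmap-primary mappings
$ ceph osd dump | grep pg_upmap_primary
# Clear them all at once (new in this release)
$ ceph osd rm-pg-upmap-primary-all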
[ceph-users] Re: NIH Datasets
Sounds like a discussion for a discord server. Or BlueSky or something that's very definitely NOT what used to be known as twitter. My viewpoint is a little different. I really didn't consider HIPAA stuff, although since technically that is info that shouldn't be accessible to anyone but authorized staff at NIH - and there's the rub, if the very persons/offices involved are purged. At that point, what we'd really be doing is simply hiding it until a saner regime comes along and wants it back. But it's not just NIH that's being tossed down the Memory Hole. NASA, NOAA, and other agencies are also being "cleansed". We should properly be safeguarding ALL of that. Reminds me of Isaac Asimov's Foundation - an agency to preserve human knowledge over the dark ages. Also, the idea of having fixed homes for complete documents I feel is limiting. I'm minded of how the folding@home project distributed work to random volunteers. And again, how ceph can break an object into PGs and splatter them to replicas on multiple servers. It's less important for a given document server to be 100% online as it is to have the ability for nodes to check in and out and maintain a gestalt. As for the management of all this, I'd say that the top-level domain of my theoretical namespace would be a select committee in charge of the master servers. sub-domains would be administered by grant from the top level and have their own administrators. And so forth until you have librarian administrators. Existing examples can be seen in some of the larger git archives, such as for Linux. The Wikipedia can also provide examples of how to administer tamper-resistant information. So, in short, I'm proposing a sort of world-wide web of documents. Something that can live in the background of ordinary user computers, perhaps. But most importantly, reliable, accessible and secure. Tim On 4/7/25 15:33, Linas Vepstas wrote: Thanks Šarūnai and all who responded. I guess general discussion will need to go off-list. But first: To summarize, the situation seems to be this: * As a general rule, principle investigators (PI) always have a copy of their "master dataset", which thus is "safe" as long as they don't lose control over it. * Certain data sets are popular and are commonly shared. * NCBI publishes data sets, with the goal of making access easy, transparent, fast, documented, and shoulders the burden of network costs, sysadmin, server maintenance, etc. and it is this "free, easy, managed-for-you" infrastructure that is at risk. * Unlike climate data, some of the NIH data is covered by HIPAA (e.g. cancer datasets) because it contains personal identifying information. I have no clue how this is dealt with. Encryption? Passwords? Restricted access? Who makes the decision about who is allowed, and who is not allowed to work with, access, copy or mirror the data? WTF? I'm clueless here. What are the technical problems to be solved? As long as PI's have a copy of a master dataset, the technical issues are: -- how to find it? -- what does it contain? -- is there enough network bandwidth? -- can it be copied in full? -- if it can be, where's the mirrors / backups? -- If the PI's lab is shut down, who pays for the storage and network connectivity for the backups? -- How to protect against loss of backup copies? -- How to gain access to backup copies? The above issues sit at the "library science" level: yes, technology can help, but it's also social and organizational. 
So it's not really about "how can we build a utopian decentralized data store" in some abstract way that shards data across multiple nodes (which is what IPFS seemed to want to be). Instead, it's four-fold: * How is the catalog of available data maintained? * How is the safety of backup copies ensured? * How do we cache data, improve latency, improve bandwidth? * How are the administrative burdens shared? (sysadmin, cost of servers, bandwidth) This is way far outside of the idea of "let's just harness a bunch of disks together on the internet", but it is the actual problem being faced. -- Linas On Mon, Apr 7, 2025 at 8:07 AM Šarūnas Burdulis wrote: On 4/4/25 11:39 PM, Linas Vepstas wrote: OK what you will read below might sound insane but I am obliged to ask. There are 275 petabytes of NIH data at risk of being deleted. Cancer research, medical data, HIPAA type stuff. Currently unclear where it's located, how it's managed, who has access to what, but let's ignore that for now. It's presumably splattered across data centers, cloud, AWS, supercomputing labs, who knows. Everywhere. Similar to climate research data back in 2017... It was all accessible via FTP or HTTP though. A Climate Mirror initiative was created and a distributed copy worldwide was made eventually. Essentially, a list of URLs was provided and some helper scripts to slurp multiple copies of data repositories. https://climatemirror.org/ https://github.com/climate-mirror --
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
That was my assumption, yes. Zitat von Alex : Is this bit of code responsible for hardcoding DEBUG to cephadm.log? 'loggers': { '': { 'level': 'DEBUG', 'handlers': ['console', 'log_file'], } } in /var/lib/ceph//cephadm.* ? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
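For anyone who wants to confirm where that level is hardcoded on their own hosts, a sketch (the path contains the cluster fsid, and editing the copied cephadm file would only be a stopgap, since the orchestrator may re-deploy it):

# Locate the hardcoded DEBUG level in the deployed cephadm copies
$ grep -rn "'level': 'DEBUG'" /var/lib/ceph/$(ceph fsid)/cephadm.*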
[ceph-users] Re: Ceph squid fresh install
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
I made a pull request about cephadm.log being set to DEBUG. Not sure if I should merge it. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: nodes with high density of OSDs
Filestore IIRC used partitions, with cute hex GPT types for various states and roles. Udev activation was sometimes problematic, and LVM tags are more flexible and reliable than the prior approach. There no doubt is more to it but that’s what I recall. > On Apr 10, 2025, at 9:11 PM, Tim Holloway wrote: > > Peter, > > I don't think udev factors in based on the original question. Firstly, > because I'm not sure udev deals with permanently-attached devices (it's more > for hot-swap items). Secondly, because the original complaint mentioned LVM > specifically. > > I agree that the hosts seem overloaded, by the way. It sounds like large > disks are being subdivided into many smaller disks, which would be bad for > Ceph to do on HDDs, and while SSDs don't have the seek and rotational > liabilities of HDDs, it's still questionable as to how many connections you > really should be making to one physical unit that way. > > Ceph, for reasons I never discovered prefers that you create OSDs that either > own an entire physical disk or an LVM Logical Volume, but NOT a disk > partition. I find it curious, since LVs aren't necessarily contiguous space > (again, more of a liability for HDDs than SSDs). unlike traditional > partitions, but there you are. Incidentally, LVs are contained in Volume > Groups, and the whole can end up with parts scattered over multiple Physical > Volumes (PVs). > > When an LVM-supporting OS boots, part of the process is to run an lvscan > (lvscan -ay) to locate and activate Logical Volumes, and from the information > given, it's assumed that the lvscan process hasn't completed before Ceph > starts up and begins trying to use them. The boot lvscan is normally pretty > quick, since it would be rare to have more than a dozen or so LVs in the > system. > > But in this case, more than 100 LVs are being configured at boot time and the > systemd boot process doesn't currently account for the extra time needed to > do that. > > If I haven't got my facts too badly scrambled, LVs end up being mapped to dm > devices, but that's something I normally only pay attention to when hardware > isn't behaving so I'm not really expert on that. > > Hope that helps, > >Tim > > On 4/10/25 16:43, Peter Grandi wrote: >>> I have a 4 nodes with 112 OSDs each [...] >> As an aside I rekon that is not such a good idea as Ceph was >> designed for one-small-OSD per small-server and lots of them, >> but lots of people of course know better. >> >>> Maybe you can gimme a hint how to struggle it over? >> That is not so much a Ceph question but a distribution question >> anyhow there are two possible hints that occur to me: >> >> * In most distributions the automatic activation of block >> devices is done by the kernel plus 'udevd' rules and/or >> 'systemd' units. >> >> * There are timeouts for activation of storage devices and on a >> system with many, depending on type etc., there may be a >> default setting to activate them serially instead of in >> parallel to prevent sudden power consumption and other surges, >> so some devices may not activate because of timeouts. >> >> You can start by asking the sysadmin for those machines to look >> at system logs (distribution dependent) for storage device >> activation reports to confirm whether the guesses above apply to >> your situation and if confirmed you can ask them to change the >> relevant settings for the distribution used. 
>> ___ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
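To test the LV-activation theory from this thread on such a dense node, it can help to compare what LVM knows about with what actually got activated at boot; a sketch (generic LVM/systemd commands, adjust to the distribution):

# LVs known to LVM vs. LVs currently active
$ lvs -o vg_name,lv_name,lv_active
$ lvscan | grep -c ACTIVE
# Manually activate anything that missed its window at boot
$ vgchange -ay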
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
Link please. > On Apr 10, 2025, at 10:59 PM, Alex wrote: > > I made a Pull Request for cephadm.log set DEBUG. > Not sure if I should merge it. > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
I solved the problem with executing ceph-mon. Among others, -i mon.rgw2-06 was not the correct option, but rather -i rgw2-06. Unfortunately, that brought the next problem: The cluster now shows "100.000% pgs unknown", which is probably because the monitor data is not completely up to date, but rather the state it was in before I switched over to other mons. A few minutes or so after that, the cluster crashed and I lost the mons. I guess this outdated cluster map is probably unusable? All services seem to be running fine and there are no network obstructions. Should I instead go with this: https://docs.ceph.com/en/squid/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds ? I actually already tried the latter option, but ran into the error `rocksdb: [db/db_impl/db_impl_open.cc:2086] DB::Open() failed: IO error: while open a file for lock: /var/lib/ceph/mon/ceph-ceph2-01/store.db/LOCK: Permission denied` Even though I double checked that the permission and ownership on the replacing store.db are properly set. On 2025-04-10 22:45, Jonas Schwab wrote: I edited the monmap to include only rgw2-06 and then followed https://docs.ceph.com/en/squid/rados/operations/add-or-rm-mons/#adding-a-monitor-manual to create a new monitor. Unfortunately, `ceph-mon -i mon.rgw2-06 --public-addr 10.127.239.63 -f` crashed with the traceback seen in the attachment. On 2025-04-10 20:34, Eugen Block wrote: It depends a bit. Which mon do the OSDs still know about? You can check /var/lib/ceph//osd.X/config to retrieve that piece of information. I'd try to revive one of them. Do you still have the mon store.db for all of the mons or at least one of them? Just to be safe, back up all the store.db directories. Then modify a monmap to contain the one you want to revive by removing the other ones. Backup your monmap files as well. Then inject the modified monmap into the daemon and try starting it. Zitat von Jonas Schwab : Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time: $ monmaptool --print monmap1 monmaptool: monmap file monmap1 epoch 29 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:21.203171+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (squid) election_strategy: 1 0: [v2:10.127.239.2:3300/0,v1:10.127.239.2:6789/0] mon.ceph2-02 1: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 2: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 3: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 $ monmaptool --print monmap2 monmaptool: monmap file monmap2 epoch 30 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:43.216713+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (unknown) election_strategy: 1 0: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 1: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 2: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 Would it be feasible to move the data from node1 (which still contains node2 as mon) to node2, or would that just result in even more mess? On 2025-04-10 19:57, Eugen Block wrote: It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? 
The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab __
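Regarding the rocksdb LOCK "Permission denied" above: in a containerized (cephadm) deployment the mon runs as the container's ceph user, so ownership has to be correct from the container's point of view, not just the host's. A sketch, assuming the stock upstream images where that user is uid/gid 167 (verify first, and also rule out SELinux/AppArmor denials); the path is taken from the error message:

# Check numeric ownership as the container will interpret it
$ ls -ln /var/lib/ceph/mon/ceph-ceph2-01/store.db | head
# Re-own the store for the in-container ceph user (uid/gid 167 on upstream images)
$ chown -R 167:167 /var/lib/ceph/mon/ceph-ceph2-01/store.db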
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Is at least one mgr running? PG states are reported by the mgr daemon. Zitat von Jonas Schwab : I solved the problem with executing ceph-mon. Among others, -i mon.rgw2-06 was not the correct option, but rather -i rgw2-06. Unfortunately, that brought the next problem: The cluster now shows "100.000% pgs unknown", which is probably because the monitor data is not complete up to date, but rather the state it was in before I switched over to other mons. A few minutes or s after that, the cluster crashed and I lust the mons. I guess this outdated cluster map is probably unusable? All services seem to be running fine and there are not network obstructions. Should I instead go with this: https://docs.ceph.com/en/squid/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds ? I actually already tried the latter option, but ran into the error `rocksdb: [db/db_impl/db_impl_open.cc:2086] DB::Open() failed: IO error: while open a file for lock: /var/lib/ceph/mon/ceph-ceph2-01/store.db/LOCK: Permission denied` Even though I double checked that the permission and ownership on the replacing store.db are properly set. On 2025-04-10 22:45, Jonas Schwab wrote: I edited the monmap to include only rgw2-06 and then followed https://docs.ceph.com/en/squid/rados/operations/add-or-rm-mons/#adding-a-monitor-manual to create a new monitor. Unfortunately, `ceph-mon -i mon.rgw2-06 --public-addr 10.127.239.63 -f` crashed with the traceback seen in the attachment. On 2025-04-10 20:34, Eugen Block wrote: It depends a bit. Which mon do the OSDs still know about? You can check /var/lib/ceph//osd.X/config to retrieve that piece of information. I'd try to revive one of them. Do you still have the mon store.db for all of the mons or at least one of them? Just to be safe, back up all the store.db directories. Then modify a monmap to contain the one you want to revive by removing the other ones. Backup your monmap files as well. Then inject the modified monmap into the daemon and try starting it. Zitat von Jonas Schwab : Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time: $ monmaptool --print monmap1 monmaptool: monmap file monmap1 epoch 29 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:21.203171+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (squid) election_strategy: 1 0: [v2:10.127.239.2:3300/0,v1:10.127.239.2:6789/0] mon.ceph2-02 1: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 2: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 3: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 $ monmaptool --print monmap2 monmaptool: monmap file monmap2 epoch 30 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:43.216713+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (unknown) election_strategy: 1 0: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 1: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 2: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 Would it be feasible to move the data from node1 (which still contains node2 as mon) to node2, or would that just result in even more mess? On 2025-04-10 19:57, Eugen Block wrote: It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. 
Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am ver
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Yes mgrs are running as intended. It just seems that mons and osd don't recongnize each other, because the monitors map is outdated. On 2025-04-11 07:07, Eugen Block wrote: Is at least one mgr running? PG states are reported by the mgr daemon. Zitat von Jonas Schwab : I solved the problem with executing ceph-mon. Among others, -i mon.rgw2-06 was not the correct option, but rather -i rgw2-06. Unfortunately, that brought the next problem: The cluster now shows "100.000% pgs unknown", which is probably because the monitor data is not complete up to date, but rather the state it was in before I switched over to other mons. A few minutes or s after that, the cluster crashed and I lust the mons. I guess this outdated cluster map is probably unusable? All services seem to be running fine and there are not network obstructions. Should I instead go with this: https://docs.ceph.com/en/squid/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds ? I actually already tried the latter option, but ran into the error `rocksdb: [db/db_impl/db_impl_open.cc:2086] DB::Open() failed: IO error: while open a file for lock: /var/lib/ceph/mon/ceph-ceph2-01/store.db/LOCK: Permission denied` Even though I double checked that the permission and ownership on the replacing store.db are properly set. On 2025-04-10 22:45, Jonas Schwab wrote: I edited the monmap to include only rgw2-06 and then followed https://docs.ceph.com/en/squid/rados/operations/add-or-rm-mons/#adding-a-monitor-manual to create a new monitor. Unfortunately, `ceph-mon -i mon.rgw2-06 --public-addr 10.127.239.63 -f` crashed with the traceback seen in the attachment. On 2025-04-10 20:34, Eugen Block wrote: It depends a bit. Which mon do the OSDs still know about? You can check /var/lib/ceph//osd.X/config to retrieve that piece of information. I'd try to revive one of them. Do you still have the mon store.db for all of the mons or at least one of them? Just to be safe, back up all the store.db directories. Then modify a monmap to contain the one you want to revive by removing the other ones. Backup your monmap files as well. Then inject the modified monmap into the daemon and try starting it. Zitat von Jonas Schwab : Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time: $ monmaptool --print monmap1 monmaptool: monmap file monmap1 epoch 29 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:21.203171+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (squid) election_strategy: 1 0: [v2:10.127.239.2:3300/0,v1:10.127.239.2:6789/0] mon.ceph2-02 1: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 2: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 3: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 $ monmaptool --print monmap2 monmaptool: monmap file monmap2 epoch 30 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:43.216713+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (unknown) election_strategy: 1 0: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 1: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 2: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 Would it be feasible to move the data from node1 (which still contains node2 as mon) to node2, or would that just result in even more mess? 
On 2025-04-10 19:57, Eugen Block wrote: It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote:
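For completeness, the recovery-using-OSDs procedure linked earlier in this thread rebuilds the mon store from the cluster-map copies every OSD carries. A heavily simplified single-host sketch (the OSDs on the host must be stopped, the keyring path is a placeholder, and in a cephadm setup the tools need to run where the OSD data paths are mounted); see the linked documentation for the full multi-host loop:

# Collect cluster map updates from each stopped OSD on this host
$ ms=/tmp/mon-store; mkdir -p $ms
$ for osd in /var/lib/ceph/osd/ceph-*; do
>   ceph-objectstore-tool --data-path "$osd" --no-mon-config \
>       --op update-mon-db --mon-store-path "$ms"
> done
# Rebuild a mon store from the collected maps (needs the admin keyring)
$ ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring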
[ceph-users] Re: NIH Datasets
Hi Alex, "Cost concerns" is the fig leaf that is being used in many cases, but often a closer look indicates political motivations. The current US administration is actively engaged in the destruction of anything that would conflict with their view of the world. That includes health practices - especially regarding vaccination, climate data, the role of women and non-white people in history, and whatever else offends their fragile minds. For example, here in the Free State of Florida, the governor has been promoting the idea that slavery was not a bad thing because it gave forcibly-imported black people "useful job skills", textbooks must now refer to the "Gulf of America", fluoridation of water is a Bad Thing, and much more. Famous non-white people are being scrubbed from military websites and even national park webpages - and non-white, non-male people being fired from top-level government/military positions. Some are even joking that Harriet Tubman be re-classified as a "human trafficer" (this is in reference to the Underground Railroad). Which is why I don't think we should stop at just saving NIH data. Virtually all government-controlled data is at risk. And by the way, DOGE has just bragged that they saved the US government a whole million dollars by getting rid of records on magnetic tape (where they put the data afterwards wasn't said). So forget magtape archives inside the government itself. The science-fiction novel "A Canticle for Liebowitz" by Walter M. Miller outlines a post-nuclear future where the survivors rebel against knowledge, proudly bragging of being "simpletons" and burning books (and I think also educated people). Their USA counterpart is MAGA, who inherited a long history of "I don't need no librul education, I got's comun since!". Or, as Isaac Asimov put it, the idea that "my ignorance is just as good as your knowledge". This is not a concept unique to the USA, but the monkeys are firmly in charge of the zoo at this point so protecting everything we can is really important. Tim On 4/8/25 09:28, Alex Gorbachev wrote: Hi Linas, Is the intent of purging of this data mainly due to just cost concerns? If the goal is purely preservation of data, the likely cheapest and least maintenance intensive way of doing this is a large scale tape archive. Such archives (purely based on a google search) exist at LLNL and OU, and there is a TAPAS service from SpectraLogic. I would imagine questions would arise about custody of the data, legal implications etc. The easiest is for the organization already hosting the data to just preserve it by archiving, and thereby claim a significant cost reduction. -- Alex Gorbachev On Sun, Apr 6, 2025 at 11:08 PM Linas Vepstas wrote: OK what you will read below might sound insane but I am obliged to ask. There are 275 petabytes of NIH data at risk of being deleted. Cancer research, medical data, HIPAA type stuff. Currently unclear where it's located, how it's managed, who has access to what, but lets ignore that for now. It's presumably splattered across data centers, cloud, AWS, supercomputing labs, who knows. Everywhere. I'm talking to a biomed person in Australias that uses NCBI data daily, she's in talks w/ Australian govt to copy and preserve the datasets they use. Some multi-petabytes of stuff. I don't know. While bouncing around tech ideas, IPFS and Ceph came up. My experience with IPFS is that it's not a serious contender for anything. My experience with Ceph is that it's more-or-less A-list. OK. 
So here's the question: is it possible to (has anyone tried) set up an internet-wide Ceph cluster? Ticking off the typical checkboxes for "decentralized storage"? Stuff, like: internet connections need to be encrypted. Connections go down, come back up. Slow. Sure, national labs may have multi-terabit fiber, but little itty-bitty participants trying to contribute a small collection of disks to a large pool might only have a gigabit connection, of which maybe 10% is "usable". Barely. So, a hostile networking environment. Is this like, totally insane, run away now, can't do that, it won't work idea, or is there some glimmer of hope? Am I misunderstanding something about IPFS that merits taking a second look at it? Is there any other way of getting scalable reliable "decentralized" internet-wide storage? I mean, yes, of course, the conventional answer is that it could be copied to AWS or some national lab or two somewhere in the EU or Aus or UK or where-ever, That's the "obvious" answer. I'm looking for a non-obvious answer, an IPFS-like thing, but one that actually works. Could it work? -- Linas -- Patrick: Are they laughing at us? Sponge Bob: No, Patrick, they are laughing next to us. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users ma
[ceph-users] Diskprediction_local mgr module removal - Call for feedback
Hi everyone, On today's Ceph Steering Committee call we discussed the idea of removing the diskprediction_local mgr module, as the current prediction model is obsolete and not maintained. We would like to gather feedback from the community about the usage of this module, and find out if anyone is interested in maintaining it. Thanks, Yaarit ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
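For anyone checking whether they are affected, a quick way to see if the module is enabled and whether any device health data is being collected (a sketch; output formats differ slightly between releases):

# Is the module enabled on this cluster?
$ ceph mgr module ls | grep -i diskprediction
# Devices and any life-expectancy predictions recorded so far
$ ceph device ls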
[ceph-users] Re: NIH Datasets
Super cool idea - I too wanted to refer to blockchain methods to avoid data being tampered. Ceph would need a completely different distribution coded for such storage, however we could say that the fundamentals are already in place? Best, Laimis J. > On 7 Apr 2025, at 18:23, Tim Holloway wrote: > > Additional features. > > * No "master server". No Single Point of Failure. > > * Resource location. A small number of master servers kept in sync like DNS > with tiers of secondary resources. I think blockchains also have a similar > setup? > > * Resource identification. A scheme like LDAP. For example: > > cn=library,catalog=dewey,filing=504.3,... > > cn=library,country=us,catalog=libraryofcongress,... > > country=us,agency=nih,department=... > > cn=upc,isbn=... > > A document/document set should have a canonical name, but allow alternate > names for ease of location, such as author searches, general topics and the > like. > > I considered OIDs as an alternative, but LDAP names are more human-friendly > and easier to add sub-domains to without petitioning a master registrar. Also > there's a better option for adding attributes to the entry description. > > On 4/7/25 09:39, Tim Holloway wrote: >> Yeah, Ceph in its current form doesn't seem like a good fit. >> >> I think that what we need to support the world's knowledge in the face of >> enstupidification is some sort of distributed holographic datastore. so, >> like Ceph's PG replication, a torrent-like ability to pull from multiple >> unreliable sources, a good indexing mechanism and, protections against >> tampering. Probably with a touch of git as well. >> >> I'm sure there's more, but those are items that immediately occur to me. >> >>Tim >> >> On 4/7/25 09:10, Alex Buie wrote: >>> MooseFS is the way to go here. >>> >>> >>> I have it working on android SD cards and of course normal Linux servers >>> over the internet and over Yggdrasil-network. >>> >>> One of my in-progress anarchy projects is a global hard drive for all of >>> humanity’s knowledge. >>> >>> I would LOVE to get involved with this preservation project technically in >>> a volunteer capacity. I can build a cutting edge resilient distributed >>> storage system for cheaper than anything currently on the market. >>> >>> Please reach out or pass along my email. >>> >>> Alex >>> >>> >>> On Sun, Apr 6, 2025 at 11:08 PM Linas Vepstas >>> wrote: >>> OK what you will read below might sound insane but I am obliged to ask. There are 275 petabytes of NIH data at risk of being deleted. Cancer research, medical data, HIPAA type stuff. Currently unclear where it's located, how it's managed, who has access to what, but lets ignore that for now. It's presumably splattered across data centers, cloud, AWS, supercomputing labs, who knows. Everywhere. I'm talking to a biomed person in Australias that uses NCBI data daily, she's in talks w/ Australian govt to copy and preserve the datasets they use. Some multi-petabytes of stuff. I don't know. While bouncing around tech ideas, IPFS and Ceph came up. My experience with IPFS is that it's not a serious contender for anything. My experience with Ceph is that it's more-or-less A-list. OK. So here's the question: is it possible to (has anyone tried) set up an internet-wide Ceph cluster? Ticking off the typical checkboxes for "decentralized storage"? Stuff, like: internet connections need to be encrypted. Connections go down, come back up. Slow. 
Sure, national labs may have multi-terabit fiber, but little itty-bitty participants trying to contribute a small collection of disks to a large pool might only have a gigabit connection, of which maybe 10% is "usable". Barely. So, a hostile networking environment. Is this like, totally insane, run away now, can't do that, it won't work idea, or is there some glimmer of hope? Am I misunderstanding something about IPFS that merits taking a second look at it? Is there any other way of getting scalable reliable "decentralized" internet-wide storage? I mean, yes, of course, the conventional answer is that it could be copied to AWS or some national lab or two somewhere in the EU or Aus or UK or where-ever, That's the "obvious" answer. I'm looking for a non-obvious answer, an IPFS-like thing, but one that actually works. Could it work? -- Linas -- Patrick: Are they laughing at us? Sponge Bob: No, Patrick, they are laughing next to us. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io >>> ___ >>> ceph-users mailing list -- ceph-users@ceph.io >>> To unsubscribe send an email to ceph-users-le...
[ceph-users] v19.2.2 Squid released
We're happy to announce the 2nd backport release in the Squid series. https://ceph.io/en/news/blog/2025/v19-2-2-squid-released/

Notable Changes
---
- This hotfix release resolves an RGW data loss bug when CopyObject is used to copy an object onto itself. S3 clients typically do this when they want to change the metadata of an existing object. Due to a regression caused by an earlier fix for https://tracker.ceph.com/issues/66286, any tail objects associated with such objects are erroneously marked for garbage collection. RGW deployments on Squid are encouraged to upgrade as soon as possible to minimize the damage. The experimental rgw-gap-list tool can help to identify damaged objects.

Getting Ceph
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-19.2.2.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: 0eceb0defba60152a8182f7bd87d164b639885b8
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
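For cephadm-managed clusters, picking up the hotfix is a single orchestrator call; a sketch (staggered upgrades, as discussed elsewhere on this list, work as well):

$ ceph orch upgrade start --image quay.io/ceph/ceph:v19.2.2
$ ceph orch upgrade status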
[ceph-users] Re: v19.2.2 Squid released
19.2.2 Installed!

# ceph -s
  cluster:
    id: ,,,
    health: HEALTH_ERR
            27 osds(s) are not reachable
...
    osd: 27 osds: 27 up (since 32m), 27 in (since 5w)
...

It's such a 'bad look': something so visible, in such an often-used command.

10/4/25 06:00 PM [ERR] osd.27's public address is not in 'fc00:1002:c7::/64' subnet

But:

# ceph config get osd.27
..
global basic public_network fc00:1002:c7::/64
...

ifconfig of osd.27:
...
inet6 fc00:1002:c7::43/64 scope global
    valid_lft forever preferred_lft forever
...

...similar for all the other OSDs, although of course on different hosts.

On 4/10/25 15:08, Yuri Weinstein wrote: We're happy to announce the 2nd backport release in the Squid series. https://ceph.io/en/news/blog/2025/v19-2-2-squid-released/ Notable Changes --- - This hotfix release resolves an RGW data loss bug when CopyObject is used to copy an object onto itself. S3 clients typically do this when they want to change the metadata of an existing object. Due to a regression caused by an earlier fix for https://tracker.ceph.com/issues/66286, any tail objects associated with such objects are erroneously marked for garbage collection. RGW deployments on Squid are encouraged to upgrade as soon as possible to minimize the damage. The experimental rgw-gap-list tool can help to identify damaged objects. Getting Ceph * Git at git://github.com/ceph/ceph.git * Tarball at https://download.ceph.com/tarballs/ceph-19.2.2.tar.gz * Containers at https://quay.io/repository/ceph/ceph * For packages, see https://docs.ceph.com/en/latest/install/get-packages/ * Release git sha1: 0eceb0defba60152a8182f7bd87d164b639885b8 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
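When chasing that warning, it can help to compare the address the OSD actually registered in the cluster map with the configured subnet; a sketch using osd.27 from the log line above (field names may differ slightly between releases):

# Address(es) the OSD registered with the cluster
$ ceph osd find 27
# Front/back addresses from the OSD's metadata
$ ceph osd metadata 27 | grep -E 'front_addr|back_addr'
# The subnet the health check compares against
$ ceph config get osd.27 public_network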