[ceph-users] Re: Cephadm upgrade from 16.2.15 -> 17.2.0
Glad I could help! I'm also waiting for 18.2.5 to upgrade our own cluster from Pacific after getting rid of our cache tier. :-D Zitat von Jeremy Hansen : This seems to have worked to get the orch back up and put me back to 16.2.15. Thank you. Debating on waiting for 18.2.5 to move forward. -jeremy On Monday, Apr 07, 2025 at 1:26 AM, Eugen Block (mailto:ebl...@nde.ag)> wrote: Still no, just edit the unit.run file for the MGRs to use a different image. See Frédéric's instructions (now that I'm re-reading it, there's a little mistake with dots and hyphens): # Backup the unit.run file $ cp /var/lib/ceph/$(ceph fsid)/mgr.ceph01.eydqvm/unit.run{,.bak} # Change container image's signature. You can get the signature of the version you want to reach from https://quay.io/repository/ceph/ceph?tab=tags. It's in the URL of a version. $ sed -i 's/ceph@sha256:e40c19cd70e047d14d70f5ec3cf501da081395a670cd59ca881ff56119660c8f/ceph@sha256:d26c11e20773704382946e34f0d3d2c0b8bb0b7b37d9017faa9dc11a0196c7d9/g' /var/lib/ceph/$(ceph fsid)/mgr.ceph01.eydqvm/unit.run # Restart the container (systemctl daemon-reload not needed) $ systemctl restart ceph-$(ceph fsid)(a)mgr.ceph01.eydqvm.service # Run this command a few times and it should show the new version ceph orch ps --refresh --hostname ceph01 | grep mgr To get the image signature, you can also look into the other unit.run files, a version tag would also work. It depends on how often you need the orchestrator to maintain the cluster. If you have the time, you could wait a bit longer for other responses. If you need the orchestrator in the meantime, you can roll back the MGRs. https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/32APKOXKRAIZ7IDCNI25KVYFCCCF6RJG/ Zitat von Jeremy Hansen : > Thank you. The only thing I’m unclear on is the rollback to pacific. > > Are you referring to > > > > > > > https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-manager-daemon > > Thank you. I appreciate all the help. Should I wait for Adam to > comment? At the moment, the cluster is functioning enough to > maintain running vms, so if it’s wise to wait, I can do that. > > -jeremy > > > On Monday, Apr 07, 2025 at 12:23 AM, Eugen Block > (mailto:ebl...@nde.ag)> wrote: > > I haven't tried it this way yet, and I had hoped that Adam would chime > > in, but my approach would be to remove this key (it's not present when > > no upgrade is in progress): > > > > ceph config-key rm mgr/cephadm/upgrade_state > > > > Then rollback the two newer MGRs to Pacific as described before. If > > they come up healthy, test if the orchestrator works properly first. > > For example, remove a node-exporter or crash or anything else > > uncritical and let it redeploy. > > If that works, try a staggered upgrade, starting with the MGRs only: > > > > ceph orch upgrade start --image --daemon-types mgr > > > > Since there's no need to go to Quincy, I suggest to upgrade to Reef > > 18.2.4 (or you wait until 18.2.5 is released, which should be very > > soon), so set the respective in the above command. > > > > If all three MGRs successfully upgrade, you can continue with the > > MONs, or with the entire rest. > > > > In production clusters, I usually do staggered upgrades, e. g. I limit > > the number of OSD daemons first just to see if they come up healthy, > > then I let it upgrade all other OSDs automatically. 
> > > > https://docs.ceph.com/en/latest/cephadm/upgrade/#staggered-upgrade > > > > Zitat von Jeremy Hansen : > > > > > Snipped some of the irrelevant logs to keep message size down. > > > > > > ceph config-key get mgr/cephadm/upgrade_state > > > > > > {"target_name": "quay.io/ceph/ceph:v17.2.0", "progress_id": > > > "e7e1a809-558d-43a7-842a-c6229fdc57af", "target_id": > > > "e1d6a67b021eb077ee22bf650f1a9fb1980a2cf5c36bdb9cba9eac6de8f702d9", > > > "target_digests": > > > > > ["quay.io/ceph/ceph@sha256:12a0a4f43413fd97a14a3d47a3451b2d2df50020835bb93db666209f3f77617a", "quay.io/ceph/ceph@sha256:cb4d698cb769b6aba05bf6ef04f41a7fe694160140347576e13bd9348514b667"], "target_version": "17.2.0", "fs_original_max_mds": null, "fs_original_allow_standby_replay": null, "error": null, "paused": false, "daemon_types": null, "hosts": null, "services": null, "total_count": null, > > "remaining_count": > > > null} > > > > > > What should I do next? > > > > > > Thank you! > > > -jeremy > > > > > > > On Sunday, Apr 06, 2025 at 1:38 AM, Eugen Block > > > (mailto:ebl...@nde.ag)> wrote: > > > > Can you check if you have this config-key? > > > > > > > > ceph config-key get mgr/cephadm/upgrade_state > > > > > > > > If you reset the MGRs, it might be necessary to clear this key, > > > > otherwise you might end up in some inconsistency. Just to be sure. > > > > > > > > Zitat von Jeremy Hansen : > > > > > > > > > Thanks. I’m trying to be extra careful since this cluster is > > > > > actually in use. I’ll wait for your feedback. > > > > > > > > > > -jeremy > > > > > > > > > > > On
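For reference, the staggered-upgrade sequence discussed above could look roughly like the sketch below. This is only a sketch: the 18.2.4 image tag and the mgr -> mon -> osd ordering are assumptions based on the linked staggered-upgrade documentation, not commands taken verbatim from this thread.

# Sketch of a staggered upgrade to Reef 18.2.4 (adjust the image tag as needed)
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4 --daemon-types mgr

# Watch progress and verify the MGRs come up healthy on the new version
ceph orch upgrade status
ceph orch ps --daemon-type mgr

# Continue with the MONs, then a limited number of OSDs, then everything else
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4 --daemon-types mon
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4 --daemon-types osd --limit 3
ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4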
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Can you bring back at least one of them? In that case you could reduce the monmap to 1 mon and bring the cluster back up. If the MONs are really dead, you can recover using OSDs [0]. I've never had to use that myself, but people have reported that to work. [0] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Zitat von Jonas Schwab : Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Cannot reinstate ceph fs mirror because i destroyed the ceph fs mirror peer/ target server
Hi, This is my first post to the forum and I don't know if it's appropriate, but I'd like to express my gratitude to all people working hard on ceph because I think it's a fantastic piece of software. The problem I'm having is caused by me; we had a well working ceph fs mirror solution; let's call it source cluster A, and target cluster B. Source cluster A is a modest cluster consisting of 6 instances, 3 OSD instances, and 3 mon instances. The OSD instances all have 3 disks (HDD's) and 3 OSD demons, totalling 9 OSD daemons and 9 HDD's. Target cluster B is a single node system having 3 OSD daemons and 3 HDD's. Both clusters run ceph 18.2.4 reef. Both clusters use Ubuntu 22.04 as OS throughout. Both systems are installed using cephadm. I have destroyed cluster B, and have built it from the ground up (I made a mistake in PG sizing in the original cluster) Now i find i cannot create/ reinstate the mirroring between 2 ceph fs filesystems, and i suspect there is a peer left behind in the filesystem of the source, pointing to the now non-existent target cluster. When i do 'ceph fs snapshot mirror peer_list prodfs', i get: '{"f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5": {"client_name": "client.mirror_remote", "site_name": "bk-site", "fs_name": "prodfs"}}' When i try to delete it: 'ceph fs snapshot mirror peer_remove prodfs f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5', i get: 'Error EACCES: failed to remove peeraccess denied: does your client key have mgr caps? See http://docs.ceph.com/en/latest/mgr/administrator/#client-authentication', but the logging of the daemon points to the more likely reason of failure: Apr 08 12:54:26 s1mon systemd[1]: Started Ceph cephfs-mirror.s1mon.lvlkwp for d0ea284a-8a16-11ee-9232-5934f0f00ec2. Apr 08 12:54:26 s1mon cephfs-mirror[310088]: set uid:gid to 167:167 (ceph:ceph) Apr 08 12:54:26 s1mon cephfs-mirror[310088]: ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable), process cephfs-mirror, pid 2 Apr 08 12:54:26 s1mon cephfs-mirror[310088]: pidfile_write: ignore empty --pid-file Apr 08 12:54:26 s1mon cephfs-mirror[310088]: mgrc service_daemon_register cephfs-mirror.22849497 metadata {arch=x86_64,ceph_release=reef,ceph_version=ceph version 18.2.4 (e7ad5345525c7a> Apr 08 12:54:30 s1mon cephfs-mirror[310088]: cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init: remote monitor host=[v2:172.17.16.12:3300/0,v1:172.17.16.12:6789/0] Apr 08 12:54:30 s1mon conmon[310082]: 2025-04-08T10:54:30.365+ 7f57c51ba640 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1] Apr 08 12:54:30 s1mon conmon[310082]: 2025-04-08T10:54:30.365+ 7f57d81e0640 -1 cephfs::mirror::Utils connect: error connecting to bk-site: (13) Permission denied Apr 08 12:54:30 s1mon cephfs-mirror[310088]: cephfs::mirror::Utils connect: error connecting to bk-site: (13) Permission denied Apr 08 12:54:30 s1mon conmon[310082]: 2025-04-08T10:54:30.365+ 7f57d81e0640 -1 cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init: error connecting to remote cl> Apr 08 12:54:30 s1mon cephfs-mirror[310088]: cephfs::mirror::PeerReplayer(f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5) init: error connecting to remote cluster: (13) Permission denied Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0 Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.362+ 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated by 
pthread_kill(), raise(), abort(), alarm()> Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.386+ 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm()> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0 Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.430+ 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm()> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0 Apr 09 00:00:16 s1mon conmon[310082]: 2025-04-08T22:00:16.466+ 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm()> Apr 09 00:00:16 s1mon cephfs-mirror[310088]: received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0 Apr 10 00:00:01 s1mon cephfs-mirror[310088]: received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0 Apr 10 00:00:01 s1mon conmon[310082]: 2025-04-09T22:00:01.767+ 7f57d99e3640 -1 received signal: Hangup from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm()> Apr 10 00:00:01
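Since the original peer target no longer exists, one way forward is to re-bootstrap the peer against the rebuilt cluster B, following the cephfs-mirroring documentation. The sketch below is untested against this setup: the fs and site names are taken from the thread, while the token placeholder and the use of the admin key for the removal are assumptions.

# On the rebuilt target cluster B: create the mirror user and a bootstrap token
ceph fs authorize prodfs client.mirror_remote / rwps
ceph fs snapshot mirror peer_bootstrap create prodfs client.mirror_remote bk-site

# On the source cluster A: remove the stale peer with a key that has mgr caps
# (e.g. client.admin), then import the token printed by the command above
ceph --id admin fs snapshot mirror peer_remove prodfs f3ea4e15-6d77-4f28-aacb-9afbfe8cc1c5
ceph fs snapshot mirror peer_bootstrap import prodfs <token>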
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
No, you have to run the objectstore-tool command within the cephadm shell: cephadm shell --name osd.x -- ceph-objectstore-tool There are plenty examples online. I’m on my mobile phone right now Zitat von Jonas Schwab : Thank you for the help! Does that mean stopping the container and mounting the lv? On 2025-04-10 17:38, Eugen Block wrote: You have to stop the OSDs in order to mount them with the objectstore tool. Zitat von Jonas Schwab : No, didn't issue any commands to the OSDs. On 2025-04-10 17:28, Eugen Block wrote: Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now stated the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Does anybody know how to solve this? Best regards, Jonas On 2025-04-10 16:04, Robert Sander wrote: Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
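For the cephadm case, the "collect the map from each OSD" step from the linked guide could look roughly like the sketch below. The OSD has to be stopped first, and the mon-store directory must live on the host (cephadm shell's --mount option is used for that here); the fsid placeholder and the exact paths are assumptions.

# Stop the OSD, then run the objectstore tool inside its container
systemctl stop ceph-<fsid>@osd.11.service

mkdir -p /root/mon-store
cephadm shell --name osd.11 --mount /root/mon-store -- \
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-11 \
  --no-mon-config --op update-mon-db --mon-store-path /mnt

# Repeat for every OSD, accumulating into the same /root/mon-store, and rsync
# that directory from host to host as described in the recovery-using-osds docs.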
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
No, didn't issue any commands to the OSDs. On 2025-04-10 17:28, Eugen Block wrote: Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now stated the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Does anybody know how to solve this? Best regards, Jonas On 2025-04-10 16:04, Robert Sander wrote: Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now stated the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Does anybody know how to solve this? Best regards, Jonas On 2025-04-10 16:04, Robert Sander wrote: Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Thank you for the help! Does that mean stopping the container and mounting the lv? On 2025-04-10 17:38, Eugen Block wrote: You have to stop the OSDs in order to mount them with the objectstore tool. Zitat von Jonas Schwab : No, didn't issue any commands to the OSDs. On 2025-04-10 17:28, Eugen Block wrote: Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now stated the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Does anybody know how to solve this? Best regards, Jonas On 2025-04-10 16:04, Robert Sander wrote: Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
On 4/10/25 10:01 AM, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? Depends on how thoroughly they were "nuked." Are there monitor directories with data still under /var/lib/ceph/ by any chance? If so, monitors can be started simply as ceph-mon services, at least temporarily, by pointing to those directories. -- Šarūnas Burdulis Dartmouth Mathematics math.dartmouth.edu/~sarunas · https://useplaintext.email ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
You have to stop the OSDs in order to mount them with the objectstore tool. Zitat von Jonas Schwab : No, didn't issue any commands to the OSDs. On 2025-04-10 17:28, Eugen Block wrote: Did you stop the OSDs? Zitat von Jonas Schwab : Thank you very much! I now stated the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Does anybody know how to solve this? Best regards, Jonas On 2025-04-10 16:04, Robert Sander wrote: Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: OSDs ignore memory limit
Hi Jonas, Is swap enabled on OSD nodes? I've seen OSDs using way more memory than osd_memory_target and being OOM-killed from time to time just because swap was enabled. If that's the case, please disable swap in /etc/fstab and reboot the system. Regards, Frédéric. De : Jonas Schwab Envoyé : mercredi 9 avril 2025 13:54 À : ceph-users@ceph.io Objet : [ceph-users] OSDs ignore memory limit Hello everyone, I recently have many problems with OSDs using much more memory than they are supposed to (> 10GB), leading to the node running out of memory and killing processes. Does someone have ideas why the daemons seem to completely ignore the set memory limits? See e.g. the following: $ ceph orch ps ceph2-03 NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID mon.ceph2-03 ceph2-03 running (3h) 1s ago 2y 501M 2048M 19.2.1 f2efb0401a30 d876fc30f741 node-exporter.ceph2-03 ceph2-03 *:9100 running (3h) 1s ago 17M 46.5M - 1.7.0 72c9c2088986 d32ec4d266ea osd.4 ceph2-03 running (26m) 1s ago 2y 10.2G 3310M 19.2.1 f2efb0401a30 b712a86dacb2 osd.11 ceph2-03 running (5m) 1s ago 2y 3458M 3310M 19.2.1 f2efb0401a30 f3d7705325b4 osd.13 ceph2-03 running (3h) 1s ago 6d 2059M 3310M 19.2.1 f2efb0401a30 980ee7e11252 osd.17 ceph2-03 running (114s) 1s ago 2y 3431M 3310M 19.2.1 f2efb0401a30 be7319fda00b osd.23 ceph2-03 running (30m) 1s ago 2y 10.4G 3310M 19.2.1 f2efb0401a30 9cfb86c4b34a osd.29 ceph2-03 running (8m) 1s ago 2y 4923M 3310M 19.2.1 f2efb0401a30 d764930bb557 osd.35 ceph2-03 running (14m) 1s ago 2y 7029M 3310M 19.2.1 f2efb0401a30 6a4113adca65 osd.59 ceph2-03 running (2m) 1s ago 2y 2821M 3310M 19.2.1 f2efb0401a30 8871d6d4f50a osd.61 ceph2-03 running (49s) 1s ago 2y 1090M 3310M 19.2.1 f2efb0401a30 3f7a0ed17ac2 osd.67 ceph2-03 running (7m) 1s ago 2y 4541M 3310M 19.2.1 f2efb0401a30 eea0a6bcefec osd.75 ceph2-03 running (3h) 1s ago 2y 1239M 3310M 19.2.1 f2efb0401a30 5a801902340d Best regards, Jonas -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
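If swap does turn out to be enabled, a minimal sketch of the check and the change suggested above (the sed pattern is an assumption; double-check /etc/fstab by hand before rebooting):

# Check whether swap is active on the node
swapon --show
free -h

# Disable it at runtime and comment out the swap entry in /etc/fstab
swapoff -a
sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab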
[ceph-users] Urgent help: I accidentally nuked all my Monitor
Hello everyone, I believe I accidentally nuked all monitors of my cluster (please don't ask how). Is there a way to recover from this disaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards -- Robert Sander Linux Consultant Heinlein Consulting GmbH Schwedter Str. 8/9b, 10119 Berlin https://www.heinlein-support.de Tel: +49 30 405051 - 0 Fax: +49 30 405051 - 19 Amtsgericht Berlin-Charlottenburg - HRB 220009 B Geschäftsführer: Peer Heinlein - Sitz: Berlin ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
Thanks! ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] nodes with high density of OSDs
Hello everybody! I have 4 nodes with 112 OSDs each, running 18.2.4. Each OSD consists of a DB on SSD and data on HDD. For some reason, when I reboot a node, not all OSDs come up, because some VGs or LVs are not active. To bring them back I manually run vgchange -ay $VG_NAME or lvchange -ay $LV_NAME. I suspect it is linked to the high number of VGs/LVs, but I cannot find an answer. Maybe you can give me a hint on how to work around it? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: nodes with high density of OSDs
Hi Alex, Which OS? I had the same problem regarding not automatic activation of LVM's on an older version of Ubuntu. I never found a workaround except by upgrading to a newer release. > -Oorspronkelijk bericht- > Van: Alex from North > Verzonden: donderdag 10 april 2025 13:17 > Aan: ceph-users@ceph.io > Onderwerp: [ceph-users] nodes with high density of OSDs > > Hello everybody! > I have a 4 nodes with 112 OSDs each and 18.2.4. OSD consist of db on SSD and > data on HDD For some reason, when I reboot node, not all OSDs get up > because some VG or LV are not active. > To make it alive again I manually do vgchange -ay $VG_NAME or lvchange -ay > $LV_NAME. > > I suspect it is linked to high amount of vg/lv but cannot find an answer. > > Maybe you can gimme a hint how to struggle it over? > ___ > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email > to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: nodes with high density of OSDs
Hello Dominique! The OS is quite new - Ubuntu 22.04 with all the latest upgrades. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Diskprediction_local mgr module removal - Call for feedback
+1 I wasn't aware that this module is obsolete and was trying to start it a few weeks ago. We develop a home-made solution some time ago to monitor smart data from both HDD (uncorrected errors, grown defect list) and SSD (WLC/TBW). But keeping it up to date with non-unified disk models is a nightmare. Alert : "OSD.12 is going to fail. Replace it soon" before seeing SLOW_OPS would be a game changer! Thanks! On Tue, 8 Apr 2025 at 10:00, Michal Strnad wrote: > Hi. > > From our point of view, it's important to keep disk failure prediction > tool as part of Ceph, ideally as an MGR module. In environments with > hundreds or thousands of disks, it's crucial to know whether, for > example, a significant number of them are likely to fail within a month > - which, in the best-case scenario, would mean performance degradation, > and in the worst-case, data loss. > > Some have already responded to the deprecation of diskprediction by > starting to develop their own solutions. For instance, just yesterday, > Daniel Persson published a solution [1] on his website that addresses > the same problem. > > Would it be possible to join forces and try to revive that module? > > [1] https://www.youtube.com/watch?v=Gr_GtC9dcMQ > > Thanks, > Michal > > > On 4/8/25 01:18, Yaarit Hatuka wrote: > > Hi everyone, > > > > On today's Ceph Steering Committee call we discussed the idea of removing > > the diskprediction_local mgr module, as the current prediction model is > > obsolete and not maintained. > > > > We would like to gather feedback from the community about the usage of > this > > module, and find out if anyone is interested in maintaining it. > > > > Thanks, > > Yaarit > > ___ > > ceph-users mailing list -- ceph-users@ceph.io > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > -- Łukasz Borek luk...@borek.org.pl ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Diskprediction_local mgr module removal - Call for feedback
>> anthonydatri@Mac models % pwd >> /Users/anthonydatri/git/ceph/src/pybind/mgr/diskprediction_local/models >> anthonydatri@Mac models % file redhat/* >> redhat/config.json: JSON data >> redhat/hgst_predictor.pkl:data >> redhat/hgst_scaler.pkl: data >> redhat/seagate_predictor.pkl: data >> redhat/seagate_scaler.pkl:data >> anthonydatri@Mac models % > > These are Python pickle files from 2019 containing ML models made with a > version of sklearn from 2019. Leerer Blick IMHO binaries don’t belong in git repositories and the approach kinda sounds like trying to be clever and trendy for the sake of being clever and trendy. Cf. the KISS principle. By which I mean keeping it simple, not lip-syncing when you should have retired in the 1990s. I’ve had good luck in the past with an (admittedly ugly) SMART collector that dumped harmonized metrics into the textfile_collector directory for node_exporter to pick up, then using conventional Alertmanager rules, which are easy to write, improve, and tweak for local conditions. If kept as a Manager module I could see this being yet another thing hampering scalability. Were we to implement a framework for normalizing metrics for given drive models — and honestly that’s what it takes to be useful — the community could PR the individual SKU entries over time. I would draw a line in the sand up front: no client SKUs will be accepted, no USB/Thunderbolt drives, no HBA/SAN mirages. Only physical, enterprise drive SKUs. Client drive failures are trivially predicted as simply SOON. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
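A minimal sketch of the textfile-collector approach mentioned above, assuming node_exporter reads *.prom files from /var/lib/node_exporter/textfile_collector and smartmontools is installed; the metric names are made up for illustration, and attribute names differ per vendor/SKU, which is exactly the normalization problem discussed here:

#!/bin/bash
OUT=/var/lib/node_exporter/textfile_collector/smart.prom
TMP=$(mktemp)
for dev in /dev/sd[a-z]; do
    smartctl -H "$dev" > /dev/null
    # exit status 0 means the overall health self-assessment passed
    echo "smart_device_healthy{device=\"$dev\"} $(( $? == 0 ? 1 : 0 ))" >> "$TMP"
    # raw reallocated sector count, where the drive reports that ATA attribute
    realloc=$(smartctl -A "$dev" | awk '/Reallocated_Sector_Ct/ {print $10}')
    [ -n "$realloc" ] && echo "smart_reallocated_sectors{device=\"$dev\"} $realloc" >> "$TMP"
done
mv "$TMP" "$OUT"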
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
I did have to add "su root root" to the logrotate script to fix the permissions issue. There's an RH KB article and there are Ceph GitHub pull requests to fix it. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Thank you very much! I now started the first step, namely "Collect the map from each OSD host". As I have a cephadm deployment, I will have to execute ceph-objectstore-tool within each container. Unfortunately, this produces the error "Mount failed with '(11) Resource temporarily unavailable'". Does anybody know how to solve this? Best regards, Jonas On 2025-04-10 16:04, Robert Sander wrote: Hi Jonas, Am 4/10/25 um 16:01 schrieb Jonas Schwab: I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. There is a procedure to recover the MON-DB from the OSDs: https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds Regards ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Image Live-Migration does not respond to OpenStack Glance images
Hi, has it worked for any other glance image? The snapshot shouldn't make any difference, I just tried the same in a lab cluster. Have you checked on the client side (OpenStack) for anything in dmesg etc.? Can you query any information from that image? For example: rbd info images_meta/image_name rbd status images_meta/image_name Is the Ceph cluster healthy? Maybe you have inactive PGs on the glance pool? Zitat von "Yuta Kambe (Fujitsu)" : Hi everyone. I am trying Image Live-Migration but it is not working well and I would like some advice. https://docs.ceph.com/en/latest/rbd/rbd-live-migration/ I use Ceph as a backend for OpenStack Glance. I tried to migrate the Pool of Ceph used in Glance to the new Pool. Source Pool: - images_meta : metadata pool, Replication - images_data : data pool, Erasure Code Target Pool: - images_meta: metadata pool, Replication (Same as source Pool) - images_data_hdd: data pool, Erasure Code The following command I executed, but did not return a response. rbd migration prepare images_meta/image_name images_meta/image_name --data-pool images_data_hdd I checked the logs in /var/log/messages and /var/log/ceph, but no useful information was available. I would like some advice on this. - Are there any other logs I should check? - Is there a case where the rbd migration command cannot be executed? The following is supplemental information. - ceph version 17.2.8 - The migration of the OpenStack Nova image was successful with the same Pool configuration and command. - I don't know if it is related, but there is a snapshot in the image of Glance, and unprotect of the snapshot is also unresponsive. rbd snap unprotect images_meta/image_name@snap ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
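A few quick checks along the lines suggested above (just a sketch; the pool and image names are the ones from the thread). As far as I know, the prepare step also needs the source image to be closed, so open watchers could make it hang:

ceph health detail
ceph pg dump_stuck inactive
ceph df | grep images
# "rbd status" (as suggested above) lists watchers; an image still opened by a
# client would stall "rbd migration prepare"
rbd status images_meta/image_name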
[ceph-users] Re: nodes with high density of OSDs
That's quite a large number of storage units per machine. My suspicion is that since you have apparently an unusually high number of LVs coming online at boot, the time it takes to linearly activate them is long enough to overlap with the point in time that ceph starts bringing up its storage-dependent components. Likely not only OSDs, but other resources that might keep internal databases and the like. The cure for that under systemd would be to make Ceph - or at least its storage-dependent services - wait on LV availability. The fun part is figuring out how to do that. Offhand, I don't know what in systemd controls the activation of LVM resources and it's almost certainly being done asynchronously, so you'd need to provide a detector service that could determine when things were available. Then you'd have to tweak Ceph not to start until the safe time has arrived. You might be able to edit the master ceph target to add such a dependency using an /etc/systemd/system override, but admittedly that doesn't cover allowing everything to come up as soon as possible but no sooner. In particular, it would be hard to edit the individual OSDs to wait on their LVs, as the systemd components for OSDs on an administered system are constructed dynamically and do not persist when the system reboots, so it would likely require a worst-case delay. Regards, Tim On 4/10/25 07:45, Alex from North wrote: Hello Dominique! Os is quite new - Ubuntu 22.04 with all the latest upgrades. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Ceph squid fresh install
More complete description: 1-) I formatted and installed the operating system 2-) This is "ceph installed": curl --silent --remote-name --location https://download.ceph.com/rpm-19.2.1/el9/noarch/cephadm chmod +x cephadm ./cephadm add-repo --release squid ./cephadm install cephadm -v bootstrap --mon-ip 172.27.254.6 --cluster-network 172.28.254.0/24 --log-to-file cephadm install ceph-common De: "Anthony D'Atri" Enviada: 2025/04/08 10:35:22 Para: quag...@bol.com.br Cc: ebl...@nde.ag, ceph-users@ceph.io Assunto: Re: [ceph-users] Ceph squid fresh install What does “ceph installed” mean? I suspect that this description is not complete. On Apr 8, 2025, at 9:21 AM, quag...@bol.com.br wrote: What is a “storage server”? These are machines that only have the operating system and ceph installed. De: "Anthony D'Atri" Enviada: 2025/04/08 10:19:08 Para: quag...@bol.com.br Cc: ebl...@nde.ag, ceph-users@ceph.io Assunto: Re: [ceph-users] Ceph squid fresh install > On Apr 8, 2025, at 9:13 AM, quag...@bol.com.br wrote: > > These 2 IPs are from the storage servers. What is a “storage server”? > There are no user processes running on them. It only has the operating system and ceph installed. Nobody said anything about user processes. > > > Rafael. > > De: "Eugen Block" > Enviada: 2025/04/08 09:35:35 > Para: quag...@bol.com.br > Cc: ceph-users@ceph.io > Assunto: Re: [ceph-users] Ceph squid fresh install > > These are your two Luminous clients: > > ---snip--- > { > "name": "unknown.0", > "entity_name": "client.admin", > "addrs": { > "addrvec": [ > { > "type": "none", > "addr": "172.27.254.7:0", > "nonce": 443842330 > } > ] > }, > "socket_addr": { > "type": "none", > "addr": "172.27.254.7:0", > "nonce": 443842330 > }, > "con_type": "client", > "con_features": 3387146417253690110, > "con_features_hex": "2f018fb87aa4aafe", > "con_features_release": "luminous", > ... > > { > "name": "client.104098", > "entity_name": "client.admin", > "addrs": { > "addrvec": [ > { > "type": "v1", > "addr": "172.27.254.6:0", > "nonce": 2027668300 > } > ] > }, > "socket_addr": { > "type": "v1", > "addr": "172.27.254.6:0", > "nonce": 2027668300 > }, > "con_type": "client", > "con_features": 3387146417253690110, > "con_features_hex": "2f018fb87aa4aafe", > "con_features_release": "luminous", > ---snip--- > > Zitat von quag...@bol.com.br: > > > Hi Eugen! Thanks a lot! I was able to find luminous connections, > > but I still can't identify which client process. Here is the output: > > Rafael. > > ── > > De: "Eugen Block" Enviada: 2025/04/08 04:37:47 Para: ceph-users@ceph.io > > Assunto: [ceph-users] Re: Ceph squid fresh install Hi, you can query > > the MON sessions to identify your older clients with: ceph tell mon. > > sessions It will show you the IP address, con_features_release (Luminous) > > and a couple of other things. Zitat von Laura Flores : > Hi Rafael, >> I > > would not force the min_compat_client to be reef when there are still > > > luminous clients connected, as it is important for all clients to be >=Reef > >> to understand/encode the pg_upmap_primary feature in the osdmap. >> As > > for checking which processes are still luminous, I am copying @Radoslaw > > > Zarzynski who may be able to help more with that. >> Thanks, > Laura > > Flores >> On Mon, Apr 7, 2025 at 11:30 AM quag...@bol.com.br > wrote: > Hi, >> I just did a new Ceph installation and would like to enable the > > "read >> balancer". >> However, the documentation requires that the minimum > > client version >> be reef. 
I checked this information through "ceph > > features" and came across >> the situation of having 2 luminous clients. >> > > # ceph features >> { >> "mon": [ >> { >> "features": "0x3f03cffd", > >>> "release": "squid", >> "num": 2 >> } >> ], >> "mds": [ >> { >> > > "features": "0x3f03cffd", >> "release": "squid", >> "num": 2 >> } > >>> ], >> "osd": [ >> { >> "features": "0x3f03cffd", >> "release": > > "squid", >> "num": 38 >> } >> ], >> "client": [ >> { >> "features": > > "0x2f018fb87aa4aafe", >> "release": "luminous", >> "num": 2 >> }, >> { >> > > "features": "0x3f03cffd", >> "release": "squid", >> "num": 5 >> } > >>> ], >> "mgr": [ >> { >> "features": "0x3f03cffd", >> "release": > > "squid", >> "num": 2 >> } >> ] >> } I tryed to configure the minimum > > version to reef and received the >> following alert: >> # ceph osd > > set-require-min-compat-client reef >> Error EPERM: cannot set > > require_min_compat_client to reef: 2 connected >> client(s) look like > > luminous (missing 0x8000); add >> --yes-i-really-mean-it to do it > > anyway Is it ok do confirm anyway? >> Which processes are still as > > luminous? Rafael. >> ___ > >>> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an > > email to ceph-users-le...@ceph.io >>
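To narrow down which clients are reported as luminous, the session dump can be filtered, e.g. with jq; a rough sketch, assuming jq is available and <mon-id> is replaced with one of the monitor names:

ceph tell mon.<mon-id> sessions | \
  jq '.[] | select(.con_features_release == "luminous")
          | {entity_name, addr: .socket_addr.addr}'

# The address then points to the host where the old client (kernel mount,
# librbd user, standalone ceph CLI, etc.) is running.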
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
Is this bit of code responsible for hardcoding DEBUG to cephadm.log? 'loggers': { '': { 'level': 'DEBUG', 'handlers': ['console', 'log_file'], } } in /var/lib/ceph//cephadm.* ? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time: $ monmaptool --print monmap1 monmaptool: monmap file monmap1 epoch 29 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:21.203171+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (squid) election_strategy: 1 0: [v2:10.127.239.2:3300/0,v1:10.127.239.2:6789/0] mon.ceph2-02 1: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 2: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 3: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 $ monmaptool --print monmap2 monmaptool: monmap file monmap2 epoch 30 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:43.216713+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (unknown) election_strategy: 1 0: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 1: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 2: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 Would it be feasible to move the data from node1 (which still contains node2 as mon) to node2, or would that just result in even more mess? On 2025-04-10 19:57, Eugen Block wrote: It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am very grateful for all help! 
Best regards, Jonas Schwab ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
It depends a bit. Which mon do the OSDs still know about? You can check /var/lib/ceph//osd.X/config to retrieve that piece of information. I'd try to revive one of them. Do you still have the mon store.db for all of the mons or at least one of them? Just to be safe, back up all the store.db directories. Then modify a monmap to contain the one you want to revive by removing the other ones. Backup your monmap files as well. Then inject the modified monmap into the daemon and try starting it. Zitat von Jonas Schwab : Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time: $ monmaptool --print monmap1 monmaptool: monmap file monmap1 epoch 29 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:21.203171+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (squid) election_strategy: 1 0: [v2:10.127.239.2:3300/0,v1:10.127.239.2:6789/0] mon.ceph2-02 1: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 2: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 3: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 $ monmaptool --print monmap2 monmaptool: monmap file monmap2 epoch 30 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:43.216713+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (unknown) election_strategy: 1 0: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 1: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 2: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 Would it be feasible to move the data from node1 (which still contains node2 as mon) to node2, or would that just result in even more mess? On 2025-04-10 19:57, Eugen Block wrote: It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am very grateful for all help! 
Best regards, Jonas Schwab ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
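A rough sketch of that monmap surgery, using the monitor names from the monmap printed above; the fsid and path placeholders are mine, and everything should be backed up before touching it:

# Back up the mon store and the monmap files first
cp -a /var/lib/ceph/<fsid>/mon.rgw2-06 /root/mon.rgw2-06.bak

# Strip the monitors that no longer exist, keeping only the one to revive
# (use the names exactly as the monmap printout shows them, with or without
# the mon. prefix depending on how they were deployed)
monmaptool --rm ceph2-02 --rm rgw2-04 --rm rgw2-05 monmap1
monmaptool --print monmap1

# Inject the reduced map into the surviving monitor's store, then try starting it
ceph-mon -i rgw2-06 --inject-monmap monmap1 \
  --mon-data /var/lib/ceph/<fsid>/mon.rgw2-06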
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
Hey all, Just confirming that the same debug level has been in Reef and Squid. We got so used to it that just decided not to care anymore. Best, Laimis J. > On 8 Apr 2025, at 14:21, Alex wrote: > > Interesting. So it's like that for everybody? > Meaning cephadm.log logs debug messages. > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: nodes with high density of OSDs
Peter, I don't think udev factors in based on the original question. Firstly, because I'm not sure udev deals with permanently-attached devices (it's more for hot-swap items). Secondly, because the original complaint mentioned LVM specifically. I agree that the hosts seem overloaded, by the way. It sounds like large disks are being subdivided into many smaller disks, which would be bad for Ceph to do on HDDs, and while SSDs don't have the seek and rotational liabilities of HDDs, it's still questionable as to how many connections you really should be making to one physical unit that way. Ceph, for reasons I never discovered prefers that you create OSDs that either own an entire physical disk or an LVM Logical Volume, but NOT a disk partition. I find it curious, since LVs aren't necessarily contiguous space (again, more of a liability for HDDs than SSDs). unlike traditional partitions, but there you are. Incidentally, LVs are contained in Volume Groups, and the whole can end up with parts scattered over multiple Physical Volumes (PVs). When an LVM-supporting OS boots, part of the process is to run an lvscan (lvscan -ay) to locate and activate Logical Volumes, and from the information given, it's assumed that the lvscan process hasn't completed before Ceph starts up and begins trying to use them. The boot lvscan is normally pretty quick, since it would be rare to have more than a dozen or so LVs in the system. But in this case, more than 100 LVs are being configured at boot time and the systemd boot process doesn't currently account for the extra time needed to do that. If I haven't got my facts too badly scrambled, LVs end up being mapped to dm devices, but that's something I normally only pay attention to when hardware isn't behaving so I'm not really expert on that. Hope that helps, Tim On 4/10/25 16:43, Peter Grandi wrote: I have a 4 nodes with 112 OSDs each [...] As an aside I rekon that is not such a good idea as Ceph was designed for one-small-OSD per small-server and lots of them, but lots of people of course know better. Maybe you can gimme a hint how to struggle it over? That is not so much a Ceph question but a distribution question anyhow there are two possible hints that occur to me: * In most distributions the automatic activation of block devices is done by the kernel plus 'udevd' rules and/or 'systemd' units. * There are timeouts for activation of storage devices and on a system with many, depending on type etc., there may be a default setting to activate them serially instead of in parallel to prevent sudden power consumption and other surges, so some devices may not activate because of timeouts. You can start by asking the sysadmin for those machines to look at system logs (distribution dependent) for storage device activation reports to confirm whether the guesses above apply to your situation and if confirmed you can ask them to change the relevant settings for the distribution used. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
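One way to express the "wait for the LVs before Ceph" idea is a small oneshot unit ordered before ceph.target. This is only a sketch: the unit name, paths and the ceph.target dependency are assumptions, and it trades boot time for reliability by activating all volume groups up front.

# /etc/systemd/system/ceph-lvm-activate.service
[Unit]
Description=Activate all LVM logical volumes before Ceph starts
After=local-fs.target
Before=ceph.target

[Service]
Type=oneshot
ExecStart=/sbin/vgchange -ay
RemainAfterExit=yes

[Install]
WantedBy=ceph.target

# Enable it with:
systemctl daemon-reload
systemctl enable ceph-lvm-activate.service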
[ceph-users] Re: Repo name bug?
I created a pull request, but I'm not sure what the etiquette is or whether I can merge it myself. First-timer here. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Repo name bug?
On Fri, Apr 11, 2025 at 10:39 AM Alex wrote: > I created a pull request, not sure what the etiquette is if I can > merge it. First timer here. > hi Alex, I cannot find your pull request in https://github.com/ceph/cephadm-ansible/ . did you create it in this project? > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > -- Regards Kefu Chai ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: OSDs ignore memory limit
Hi Jonas, Anthony gave some good advice for some things to check. You can also dump the mempool statistics for OSDs that you identify are over their memory target using: "ceph daemon osd.NNN dump_mempools" The osd_memory_target code basically looks at the memory usage of the process and then periodically grows or shrinks the aggregate memory for caches based on how far off the process usage is from the target. It's not perfect, but generally keeps memory close to the target size. It can't do anything if there is a memory leak or other component driving the overall memory usage higher than the target though. One example of this is that in erasure coded pools, huge xattrs on objects can drive pglog memory usage extremely high and the osd_memory_autotuning may not be able to compensate for it. Having said this, I'd suggest looking at the actual targets and the mempools and see if you can figure out where the memory is going and if its truly over the target. The targets themselves can be autotuned higher up in the stack in some cases. Mark On 4/9/25 07:52, Jonas Schwab wrote: Hello everyone, I recently have many problems with OSDs using much more memory than they are supposed to (> 10GB), leading to the node running out of memory and killing processes. Does someone have ideas why the daemons seem to completely ignore the set memory limits? See e.g. the following: $ ceph orch ps ceph2-03 NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID mon.ceph2-03 ceph2-03 running (3h) 1s ago 2y 501M 2048M 19.2.1 f2efb0401a30 d876fc30f741 node-exporter.ceph2-03 ceph2-03 *:9100 running (3h) 1s ago 17M 46.5M - 1.7.0 72c9c2088986 d32ec4d266ea osd.4 ceph2-03 running (26m) 1s ago 2y 10.2G 3310M 19.2.1 f2efb0401a30 b712a86dacb2 osd.11 ceph2-03 running (5m) 1s ago 2y 3458M 3310M 19.2.1 f2efb0401a30 f3d7705325b4 osd.13 ceph2-03 running (3h) 1s ago 6d 2059M 3310M 19.2.1 f2efb0401a30 980ee7e11252 osd.17 ceph2-03 running (114s) 1s ago 2y 3431M 3310M 19.2.1 f2efb0401a30 be7319fda00b osd.23 ceph2-03 running (30m) 1s ago 2y 10.4G 3310M 19.2.1 f2efb0401a30 9cfb86c4b34a osd.29 ceph2-03 running (8m) 1s ago 2y 4923M 3310M 19.2.1 f2efb0401a30 d764930bb557 osd.35 ceph2-03 running (14m) 1s ago 2y 7029M 3310M 19.2.1 f2efb0401a30 6a4113adca65 osd.59 ceph2-03 running (2m) 1s ago 2y 2821M 3310M 19.2.1 f2efb0401a30 8871d6d4f50a osd.61 ceph2-03 running (49s) 1s ago 2y 1090M 3310M 19.2.1 f2efb0401a30 3f7a0ed17ac2 osd.67 ceph2-03 running (7m) 1s ago 2y 4541M 3310M 19.2.1 f2efb0401a30 eea0a6bcefec osd.75 ceph2-03 running (3h) 1s ago 2y 1239M 3310M 19.2.1 f2efb0401a30 5a801902340d Best regards, Jonas -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io -- Best Regards, Mark Nelson Head of Research and Development Clyso GmbH p: +49 89 21552391 12 | a: Minnesota, USA w: https://clyso.com | e: mark.nel...@clyso.com We are hiring: https://www.clyso.com/jobs/ ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
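To see where the memory actually goes, the configured target and the mempools can be compared per OSD; a rough sketch (osd.4 is just the example from the ps output above, and the jq paths assume the usual dump_mempools JSON layout):

# What the OSD is supposed to use
ceph config get osd.4 osd_memory_target

# What its mempools actually hold, largest consumers first
cephadm shell --name osd.4 -- ceph daemon osd.4 dump_mempools | \
  jq '.mempool.by_pool | to_entries | sort_by(.value.bytes) | reverse | .[0:5]'

# pglog is a common culprit on EC pools, as mentioned above
cephadm shell --name osd.4 -- ceph daemon osd.4 dump_mempools | \
  jq '.mempool.by_pool.osd_pglog'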
[ceph-users] Ceph squid fresh install
Hi, I just did a new Ceph installation and would like to enable the "read balancer". However, the documentation requires that the minimum client version be reef. I checked this information through "ceph features" and came across the situation of having 2 luminous clients.

# ceph features
{
  "mon": [ { "features": "0x3f03cffd", "release": "squid", "num": 2 } ],
  "mds": [ { "features": "0x3f03cffd", "release": "squid", "num": 2 } ],
  "osd": [ { "features": "0x3f03cffd", "release": "squid", "num": 38 } ],
  "client": [
    { "features": "0x2f018fb87aa4aafe", "release": "luminous", "num": 2 },
    { "features": "0x3f03cffd", "release": "squid", "num": 5 }
  ],
  "mgr": [ { "features": "0x3f03cffd", "release": "squid", "num": 2 } ]
}

I tried to configure the minimum version to reef and received the following alert:

# ceph osd set-require-min-compat-client reef
Error EPERM: cannot set require_min_compat_client to reef: 2 connected client(s) look like luminous (missing 0x8000); add --yes-i-really-mean-it to do it anyway

Is it OK to confirm anyway? Which processes are still reported as luminous? Rafael. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
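One way to find out which sessions the monitors classify as luminous before forcing the flag is to list the mon sessions; a sketch with a placeholder mon ID (output fields vary by release, and kernel RBD/CephFS clients often report an old feature release even on recent kernels, which is frequently the harmless explanation):

# Show connected sessions and the feature release the mon derives for each
$ ceph tell mon.<id> sessions | grep -i luminous
# Alternatively, from the mon host itself:
$ ceph daemon mon.<id> sessions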
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
I think it's the same block of code Eugen found. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: nodes with high density of OSDs
> I have a 4 nodes with 112 OSDs each [...]

As an aside, I reckon that is not such a good idea, as Ceph was designed for one small OSD per small server and lots of them, but lots of people of course know better.

> Maybe you can gimme a hint how to struggle it over?

That is not so much a Ceph question as a distribution question. Anyhow, there are two possible hints that occur to me:

* In most distributions the automatic activation of block devices is done by the kernel plus 'udevd' rules and/or 'systemd' units.

* There are timeouts for the activation of storage devices, and on a system with many of them (depending on type etc.) there may be a default setting to activate them serially instead of in parallel, to prevent sudden power consumption and other surges, so some devices may not activate because of timeouts.

You can start by asking the sysadmin for those machines to look at the system logs (distribution dependent) for storage device activation reports, to confirm whether the guesses above apply to your situation; if confirmed, you can ask them to change the relevant settings for the distribution used (a sketch of where to look follows this message). ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
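Following up on the hints above, rough places to look on a systemd-based distribution (a sketch; unit and service names differ between distributions, so treat these as starting points):

# Boot-time activation messages and timeouts
$ journalctl -b | grep -iE 'lvm|ceph-volume|timed out'
# Units that failed or never reached "running"
$ systemctl list-units --failed
$ systemctl list-units 'ceph-*' --all | grep -v running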
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
I edited the monmap to include only rgw2-06 and then followed https://docs.ceph.com/en/squid/rados/operations/add-or-rm-mons/#adding-a-monitor-manual to create a new monitor. Unfortunately, `ceph-mon -i mon.rgw2-06 --public-addr 10.127.239.63 -f` crashed with the traceback seen in the attachment. On 2025-04-10 20:34, Eugen Block wrote: It depends a bit. Which mon do the OSDs still know about? You can check /var/lib/ceph//osd.X/config to retrieve that piece of information. I'd try to revive one of them. Do you still have the mon store.db for all of the mons or at least one of them? Just to be safe, back up all the store.db directories. Then modify a monmap to contain the one you want to revive by removing the other ones. Backup your monmap files as well. Then inject the modified monmap into the daemon and try starting it. Zitat von Jonas Schwab : Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time: $ monmaptool --print monmap1 monmaptool: monmap file monmap1 epoch 29 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:21.203171+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (squid) election_strategy: 1 0: [v2:10.127.239.2:3300/0,v1:10.127.239.2:6789/0] mon.ceph2-02 1: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 2: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 3: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 $ monmaptool --print monmap2 monmaptool: monmap file monmap2 epoch 30 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:43.216713+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (unknown) election_strategy: 1 0: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 1: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 2: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 Would it be feasible to move the data from node1 (which still contains node2 as mon) to node2, or would that just result in even more mess? On 2025-04-10 19:57, Eugen Block wrote: It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 
2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitors of my cluster (please don't ask how). Is there a way to recover from this disaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab -- Jonas Schwab Research Data Management, Cluster of Excellence ct.qmat https://data.ctqmat.de | datamanagement.ct.q...@listserv.dfn.de Email: jonas.sch...@uni-wuerzburg.de Tel: +49 931 31-84460 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
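For context, the "messy way" referenced above boils down to extracting the monmap from a stopped mon, editing it, and injecting it back. A rough sketch with placeholder names (the mon must be stopped, and in a cephadm deployment the commands need to run inside a shell container with the mon's data directory available):

# Extract and inspect the current monmap from the stopped mon
$ ceph-mon -i <mon-id> --extract-monmap /tmp/monmap
$ monmaptool --print /tmp/monmap
# Drop the monitors that no longer exist, keeping the one to revive
$ monmaptool --rm <dead-mon-a> --rm <dead-mon-b> /tmp/monmap
# Inject the edited map and start the mon again
$ ceph-mon -i <mon-id> --inject-monmap /tmp/monmap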
[ceph-users] v18.2.5 Reef released
We're happy to announce the 5th point release in the Reef series. We recommend users update to this release. For detailed release notes with links & changelog please refer to the official blog entry at https://ceph.io/en/news/blog/2025/v18-2-5-reef-released/

Notable Changes
---
* RBD: The ``try-netlink`` mapping option for rbd-nbd has become the default and is now deprecated. If the NBD netlink interface is not supported by the kernel, then the mapping is retried using the legacy ioctl interface.
* RADOS: A new command, `ceph osd rm-pg-upmap-primary-all`, has been added that allows users to clear all pg-upmap-primary mappings in the osdmap when desired. Related trackers:
  - https://tracker.ceph.com/issues/67179
  - https://tracker.ceph.com/issues/66867

Getting Ceph
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph_18.2.5.orig.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: a5b0e13f9c96f3b45f596a95ad098f51ca0ccce1
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
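For anyone who wants to try the new RADOS command after upgrading, a short sketch (assumes the mappings are visible in the osdmap dump, as they are on recent releases):

# List any existing pg-upmap-primary mappings
$ ceph osd dump | grep pg_upmap_primary
# Clear them all at once (new in this release)
$ ceph osd rm-pg-upmap-primary-all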
[ceph-users] Re: NIH Datasets
Sounds like a discussion for a discord server. Or BlueSky or something that's very definitely NOT what used to be known as twitter. My viewpoint is a little different. I really didn't consider HIPAA stuff, although since technically that is info that shouldn't be accessible to anyone but authorized staff at NIH - and there's the rub, if the very persons/offices involved are purged. At that point, what we'd really be doing is simply hiding it until a saner regime comes along and wants it back. But it's not just NIH that's being tossed down the Memory Hole. NASA, NOAA, and other agencies are also being "cleansed". We should properly be safeguarding ALL of that. Reminds me of Isaac Asimov's Foundation - an agency to preserve human knowledge over the dark ages. Also, the idea of having fixed homes for complete documents I feel is limiting. I'm minded of how the folding@home project distributed work to random volunteers. And again, how ceph can break an object into PGs and splatter them to replicas on multiple servers. It's less important for a given document server to be 100% online as it is to have the ability for nodes to check in and out and maintain a gestalt. As for the management of all this, I'd say that the top-level domain of my theoretical namespace would be a select committee in charge of the master servers. sub-domains would be administered by grant from the top level and have their own administrators. And so forth until you have librarian administrators. Existing examples can be seen in some of the larger git archives, such as for Linux. The Wikipedia can also provide examples of how to administer tamper-resistant information. So, in short, I'm proposing a sort of world-wide web of documents. Something that can live in the background of ordinary user computers, perhaps. But most importantly, reliable, accessible and secure. Tim On 4/7/25 15:33, Linas Vepstas wrote: Thanks Šarūnai and all who responded. I guess general discussion will need to go off-list. But first: To summarize, the situation seems to be this: * As a general rule, principle investigators (PI) always have a copy of their "master dataset", which thus is "safe" as long as they don't lose control over it. * Certain data sets are popular and are commonly shared. * NCBI publishes data sets, with the goal of making access easy, transparent, fast, documented, and shoulders the burden of network costs, sysadmin, server maintenance, etc. and it is this "free, easy, managed-for-you" infrastructure that is at risk. * Unlike climate data, some of the NIH data is covered by HIPAA (e.g. cancer datasets) because it contains personal identifying information. I have no clue how this is dealt with. Encryption? Passwords? Restricted access? Who makes the decision about who is allowed, and who is not allowed to work with, access, copy or mirror the data? WTF? I'm clueless here. What are the technical problems to be solved? As long as PI's have a copy of a master dataset, the technical issues are: -- how to find it? -- what does it contain? -- is there enough network bandwidth? -- can it be copied in full? -- if it can be, where's the mirrors / backups? -- If the PI's lab is shut down, who pays for the storage and network connectivity for the backups? -- How to protect against loss of backup copies? -- How to gain access to backup copies? The above issues sit at the "library science" level: yes, technology can help, but it's also social and organizational. 
So it's not really about "how can we build a utopian decentralized data store" in some abstract way that shards data across multiple nodes (which is what IPFS seemed to want to be). Instead, it's four-fold: * How is the catalog of available data maintained? * How is the safety of backup copies ensured? * How do we cache data, improve latency, improve bandwidth? * How are the administrative burdens shared? (sysadmin, cost of servers, bandwidth) This is way far outside of the idea of "let's just harness a bunch of disks together on the internet", but it is the actual problem being faced. -- Linas On Mon, Apr 7, 2025 at 8:07 AM Šarūnas Burdulis wrote: On 4/4/25 11:39 PM, Linas Vepstas wrote: OK what you will read below might sound insane but I am obliged to ask. There are 275 petabytes of NIH data at risk of being deleted. Cancer research, medical data, HIPAA type stuff. Currently unclear where it's located, how it's managed, who has access to what, but let's ignore that for now. It's presumably splattered across data centers, cloud, AWS, supercomputing labs, who knows. Everywhere. Similar to climate research data back in 2017... It was all accessible via FTP or HTTP though. A Climate Mirror initiative was created and a distributed copy worldwide was made eventually. Essentially, a list of URLs was provided and some helper scripts to slurp multiple copies of data repositories. https://climatemirror.org/ https://github.com/climate-mirror --
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
That was my assumption, yes. Zitat von Alex : Is this bit of code responsible for hardcoding DEBUG to cephadm.log? 'loggers': { '': { 'level': 'DEBUG', 'handlers': ['console', 'log_file'], } } in /var/lib/ceph//cephadm.* ? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
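For anyone who wants to confirm where that level is hardcoded on their own hosts, a sketch (the path contains the cluster fsid, and editing the copied cephadm file would only be a stopgap, since the orchestrator may re-deploy it):

# Locate the hardcoded DEBUG level in the deployed cephadm copies
$ grep -rn "'level': 'DEBUG'" /var/lib/ceph/$(ceph fsid)/cephadm.*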
[ceph-users] Re: Ceph squid fresh install
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
I made a pull request about cephadm.log being set to DEBUG. Not sure if I should merge it. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: nodes with high density of OSDs
Filestore IIRC used partitions, with cute hex GPT types for various states and roles. Udev activation was sometimes problematic, and LVM tags are more flexible and reliable than the prior approach. There no doubt is more to it but that’s what I recall. > On Apr 10, 2025, at 9:11 PM, Tim Holloway wrote: > > Peter, > > I don't think udev factors in based on the original question. Firstly, > because I'm not sure udev deals with permanently-attached devices (it's more > for hot-swap items). Secondly, because the original complaint mentioned LVM > specifically. > > I agree that the hosts seem overloaded, by the way. It sounds like large > disks are being subdivided into many smaller disks, which would be bad for > Ceph to do on HDDs, and while SSDs don't have the seek and rotational > liabilities of HDDs, it's still questionable as to how many connections you > really should be making to one physical unit that way. > > Ceph, for reasons I never discovered prefers that you create OSDs that either > own an entire physical disk or an LVM Logical Volume, but NOT a disk > partition. I find it curious, since LVs aren't necessarily contiguous space > (again, more of a liability for HDDs than SSDs). unlike traditional > partitions, but there you are. Incidentally, LVs are contained in Volume > Groups, and the whole can end up with parts scattered over multiple Physical > Volumes (PVs). > > When an LVM-supporting OS boots, part of the process is to run an lvscan > (lvscan -ay) to locate and activate Logical Volumes, and from the information > given, it's assumed that the lvscan process hasn't completed before Ceph > starts up and begins trying to use them. The boot lvscan is normally pretty > quick, since it would be rare to have more than a dozen or so LVs in the > system. > > But in this case, more than 100 LVs are being configured at boot time and the > systemd boot process doesn't currently account for the extra time needed to > do that. > > If I haven't got my facts too badly scrambled, LVs end up being mapped to dm > devices, but that's something I normally only pay attention to when hardware > isn't behaving so I'm not really expert on that. > > Hope that helps, > >Tim > > On 4/10/25 16:43, Peter Grandi wrote: >>> I have a 4 nodes with 112 OSDs each [...] >> As an aside I rekon that is not such a good idea as Ceph was >> designed for one-small-OSD per small-server and lots of them, >> but lots of people of course know better. >> >>> Maybe you can gimme a hint how to struggle it over? >> That is not so much a Ceph question but a distribution question >> anyhow there are two possible hints that occur to me: >> >> * In most distributions the automatic activation of block >> devices is done by the kernel plus 'udevd' rules and/or >> 'systemd' units. >> >> * There are timeouts for activation of storage devices and on a >> system with many, depending on type etc., there may be a >> default setting to activate them serially instead of in >> parallel to prevent sudden power consumption and other surges, >> so some devices may not activate because of timeouts. >> >> You can start by asking the sysadmin for those machines to look >> at system logs (distribution dependent) for storage device >> activation reports to confirm whether the guesses above apply to >> your situation and if confirmed you can ask them to change the >> relevant settings for the distribution used. 
>> ___ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
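To test the LV-activation theory from this thread on such a dense node, it can help to compare what LVM knows about with what actually got activated at boot; a sketch (generic LVM/systemd commands, adjust to the distribution):

# LVs known to LVM vs. LVs currently active
$ lvs -o vg_name,lv_name,lv_active
$ lvscan | grep -c ACTIVE
# Manually activate anything that missed its window at boot
$ vgchange -ay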
[ceph-users] Re: Cephadm flooding /var/log/ceph/cephadm.log
Link please. > On Apr 10, 2025, at 10:59 PM, Alex wrote: > > I made a Pull Request for cephadm.log set DEBUG. > Not sure if I should merge it. > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
I solved the problem with executing ceph-mon. Among others, -i mon.rgw2-06 was not the correct option, but rather -i rgw2-06. Unfortunately, that brought the next problem: The cluster now shows "100.000% pgs unknown", which is probably because the monitor data is not completely up to date, but rather the state it was in before I switched over to other mons. A few minutes or so after that, the cluster crashed and I lost the mons. I guess this outdated cluster map is probably unusable? All services seem to be running fine and there are no network obstructions. Should I instead go with this: https://docs.ceph.com/en/squid/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds ? I actually already tried the latter option, but ran into the error `rocksdb: [db/db_impl/db_impl_open.cc:2086] DB::Open() failed: IO error: while open a file for lock: /var/lib/ceph/mon/ceph-ceph2-01/store.db/LOCK: Permission denied` Even though I double checked that the permission and ownership on the replacing store.db are properly set. On 2025-04-10 22:45, Jonas Schwab wrote: I edited the monmap to include only rgw2-06 and then followed https://docs.ceph.com/en/squid/rados/operations/add-or-rm-mons/#adding-a-monitor-manual to create a new monitor. Unfortunately, `ceph-mon -i mon.rgw2-06 --public-addr 10.127.239.63 -f` crashed with the traceback seen in the attachment. On 2025-04-10 20:34, Eugen Block wrote: It depends a bit. Which mon do the OSDs still know about? You can check /var/lib/ceph//osd.X/config to retrieve that piece of information. I'd try to revive one of them. Do you still have the mon store.db for all of the mons or at least one of them? Just to be safe, back up all the store.db directories. Then modify a monmap to contain the one you want to revive by removing the other ones. Backup your monmap files as well. Then inject the modified monmap into the daemon and try starting it. Zitat von Jonas Schwab : Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time: $ monmaptool --print monmap1 monmaptool: monmap file monmap1 epoch 29 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:21.203171+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (squid) election_strategy: 1 0: [v2:10.127.239.2:3300/0,v1:10.127.239.2:6789/0] mon.ceph2-02 1: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 2: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 3: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 $ monmaptool --print monmap2 monmaptool: monmap file monmap2 epoch 30 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:43.216713+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (unknown) election_strategy: 1 0: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 1: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 2: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 Would it be feasible to move the data from node1 (which still contains node2 as mon) to node2, or would that just result in even more mess? On 2025-04-10 19:57, Eugen Block wrote: It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? 
The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am very grateful for all help! Best regards, Jonas Schwab __
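Regarding the rocksdb LOCK "Permission denied" above: in a containerized (cephadm) deployment the mon runs as the container's ceph user, so ownership has to be correct from the container's point of view, not just the host's. A sketch, assuming the stock upstream images where that user is uid/gid 167 (verify first, and also rule out SELinux/AppArmor denials); the path is taken from the error message:

# Check numeric ownership as the container will interpret it
$ ls -ln /var/lib/ceph/mon/ceph-ceph2-01/store.db | head
# Re-own the store for the in-container ceph user (uid/gid 167 on upstream images)
$ chown -R 167:167 /var/lib/ceph/mon/ceph-ceph2-01/store.db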
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Is at least one mgr running? PG states are reported by the mgr daemon. Zitat von Jonas Schwab : I solved the problem with executing ceph-mon. Among others, -i mon.rgw2-06 was not the correct option, but rather -i rgw2-06. Unfortunately, that brought the next problem: The cluster now shows "100.000% pgs unknown", which is probably because the monitor data is not complete up to date, but rather the state it was in before I switched over to other mons. A few minutes or s after that, the cluster crashed and I lust the mons. I guess this outdated cluster map is probably unusable? All services seem to be running fine and there are not network obstructions. Should I instead go with this: https://docs.ceph.com/en/squid/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds ? I actually already tried the latter option, but ran into the error `rocksdb: [db/db_impl/db_impl_open.cc:2086] DB::Open() failed: IO error: while open a file for lock: /var/lib/ceph/mon/ceph-ceph2-01/store.db/LOCK: Permission denied` Even though I double checked that the permission and ownership on the replacing store.db are properly set. On 2025-04-10 22:45, Jonas Schwab wrote: I edited the monmap to include only rgw2-06 and then followed https://docs.ceph.com/en/squid/rados/operations/add-or-rm-mons/#adding-a-monitor-manual to create a new monitor. Unfortunately, `ceph-mon -i mon.rgw2-06 --public-addr 10.127.239.63 -f` crashed with the traceback seen in the attachment. On 2025-04-10 20:34, Eugen Block wrote: It depends a bit. Which mon do the OSDs still know about? You can check /var/lib/ceph//osd.X/config to retrieve that piece of information. I'd try to revive one of them. Do you still have the mon store.db for all of the mons or at least one of them? Just to be safe, back up all the store.db directories. Then modify a monmap to contain the one you want to revive by removing the other ones. Backup your monmap files as well. Then inject the modified monmap into the daemon and try starting it. Zitat von Jonas Schwab : Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time: $ monmaptool --print monmap1 monmaptool: monmap file monmap1 epoch 29 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:21.203171+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (squid) election_strategy: 1 0: [v2:10.127.239.2:3300/0,v1:10.127.239.2:6789/0] mon.ceph2-02 1: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 2: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 3: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 $ monmaptool --print monmap2 monmaptool: monmap file monmap2 epoch 30 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:43.216713+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (unknown) election_strategy: 1 0: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 1: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 2: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 Would it be feasible to move the data from node1 (which still contains node2 as mon) to node2, or would that just result in even more mess? On 2025-04-10 19:57, Eugen Block wrote: It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. 
Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote: Hello everyone, I believe I accidentally nuked all monitor of my cluster (please don't ask how). Is there a way to recover from this desaster? I have a cephadm setup. I am ver
[ceph-users] Re: Urgent help: I accidentally nuked all my Monitor
Yes mgrs are running as intended. It just seems that mons and osd don't recongnize each other, because the monitors map is outdated. On 2025-04-11 07:07, Eugen Block wrote: Is at least one mgr running? PG states are reported by the mgr daemon. Zitat von Jonas Schwab : I solved the problem with executing ceph-mon. Among others, -i mon.rgw2-06 was not the correct option, but rather -i rgw2-06. Unfortunately, that brought the next problem: The cluster now shows "100.000% pgs unknown", which is probably because the monitor data is not complete up to date, but rather the state it was in before I switched over to other mons. A few minutes or s after that, the cluster crashed and I lust the mons. I guess this outdated cluster map is probably unusable? All services seem to be running fine and there are not network obstructions. Should I instead go with this: https://docs.ceph.com/en/squid/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds ? I actually already tried the latter option, but ran into the error `rocksdb: [db/db_impl/db_impl_open.cc:2086] DB::Open() failed: IO error: while open a file for lock: /var/lib/ceph/mon/ceph-ceph2-01/store.db/LOCK: Permission denied` Even though I double checked that the permission and ownership on the replacing store.db are properly set. On 2025-04-10 22:45, Jonas Schwab wrote: I edited the monmap to include only rgw2-06 and then followed https://docs.ceph.com/en/squid/rados/operations/add-or-rm-mons/#adding-a-monitor-manual to create a new monitor. Unfortunately, `ceph-mon -i mon.rgw2-06 --public-addr 10.127.239.63 -f` crashed with the traceback seen in the attachment. On 2025-04-10 20:34, Eugen Block wrote: It depends a bit. Which mon do the OSDs still know about? You can check /var/lib/ceph//osd.X/config to retrieve that piece of information. I'd try to revive one of them. Do you still have the mon store.db for all of the mons or at least one of them? Just to be safe, back up all the store.db directories. Then modify a monmap to contain the one you want to revive by removing the other ones. Backup your monmap files as well. Then inject the modified monmap into the daemon and try starting it. Zitat von Jonas Schwab : Again, thank you very much for your help! The container is not there any more, but I discovered that the "old" mon data still exists. I have the same situation for two mons I removed at the same time: $ monmaptool --print monmap1 monmaptool: monmap file monmap1 epoch 29 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:21.203171+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (squid) election_strategy: 1 0: [v2:10.127.239.2:3300/0,v1:10.127.239.2:6789/0] mon.ceph2-02 1: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 2: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 3: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 $ monmaptool --print monmap2 monmaptool: monmap file monmap2 epoch 30 fsid 6d0d4ed4-0052-4eb9-9d9d-e6872ba7ee96 last_changed 2025-04-10T14:16:43.216713+0200 created 2021-02-26T14:02:29.522695+0100 min_mon_release 19 (unknown) election_strategy: 1 0: [v2:10.127.239.61:3300/0,v1:10.127.239.61:6789/0] mon.rgw2-04 1: [v2:10.127.239.63:3300/0,v1:10.127.239.63:6789/0] mon.rgw2-06 2: [v2:10.127.239.62:3300/0,v1:10.127.239.62:6789/0] mon.rgw2-05 Would it be feasible to move the data from node1 (which still contains node2 as mon) to node2, or would that just result in even more mess? 
On 2025-04-10 19:57, Eugen Block wrote: It can work, but it might be necessary to modify the monmap first, since it's complaining that it has been removed from it. Are you familiar with the monmap-tool (https://docs.ceph.com/en/latest/man/8/monmaptool/)? The procedure is similar to changing a monitor's IP address the "messy way" (https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/#changing-a-monitor-s-ip-address-advanced-method). I also wrote a blog post how to do it with cephadm: https://heiterbiswolkig.blogs.nde.ag/2020/12/18/cephadm-changing-a-monitors-ip-address/ But before changing anything, I'd inspect first what the current status is. You can get the current monmap from within the mon container (is it still there?): cephadm shell --name mon. ceph-monstore-tool /var/lib/ceph/mon/ get monmap -- --out monmap monmaptool --print monmap You can paste the output here, if you want. Zitat von Jonas Schwab : I realized, I have access to a data directory of a monitor I removed just before the oopsie happened. Can I launch a ceph-mon from that? If I try just to launch ceph-mon, it commits suicide: 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 not in monmap and have been in a quorum before; must have been removed 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 mon.mon.ceph2-01@-1(???) e29 commit suicide! 2025-04-10T19:32:32.174+0200 7fec628c5e00 -1 failed to initialize On 2025-04-10 16:01, Jonas Schwab wrote:
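For completeness, the recovery-using-OSDs procedure linked earlier in this thread rebuilds the mon store from the cluster-map copies every OSD carries. A heavily simplified single-host sketch (the OSDs on the host must be stopped, the keyring path is a placeholder, and in a cephadm setup the tools need to run where the OSD data paths are mounted); see the linked documentation for the full multi-host loop:

# Collect cluster map updates from each stopped OSD on this host
$ ms=/tmp/mon-store; mkdir -p $ms
$ for osd in /var/lib/ceph/osd/ceph-*; do
>   ceph-objectstore-tool --data-path "$osd" --no-mon-config \
>       --op update-mon-db --mon-store-path "$ms"
> done
# Rebuild a mon store from the collected maps (needs the admin keyring)
$ ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring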
[ceph-users] Re: NIH Datasets
Hi Alex, "Cost concerns" is the fig leaf that is being used in many cases, but often a closer look indicates political motivations. The current US administration is actively engaged in the destruction of anything that would conflict with their view of the world. That includes health practices - especially regarding vaccination, climate data, the role of women and non-white people in history, and whatever else offends their fragile minds. For example, here in the Free State of Florida, the governor has been promoting the idea that slavery was not a bad thing because it gave forcibly-imported black people "useful job skills", textbooks must now refer to the "Gulf of America", fluoridation of water is a Bad Thing, and much more. Famous non-white people are being scrubbed from military websites and even national park webpages - and non-white, non-male people being fired from top-level government/military positions. Some are even joking that Harriet Tubman be re-classified as a "human trafficer" (this is in reference to the Underground Railroad). Which is why I don't think we should stop at just saving NIH data. Virtually all government-controlled data is at risk. And by the way, DOGE has just bragged that they saved the US government a whole million dollars by getting rid of records on magnetic tape (where they put the data afterwards wasn't said). So forget magtape archives inside the government itself. The science-fiction novel "A Canticle for Liebowitz" by Walter M. Miller outlines a post-nuclear future where the survivors rebel against knowledge, proudly bragging of being "simpletons" and burning books (and I think also educated people). Their USA counterpart is MAGA, who inherited a long history of "I don't need no librul education, I got's comun since!". Or, as Isaac Asimov put it, the idea that "my ignorance is just as good as your knowledge". This is not a concept unique to the USA, but the monkeys are firmly in charge of the zoo at this point so protecting everything we can is really important. Tim On 4/8/25 09:28, Alex Gorbachev wrote: Hi Linas, Is the intent of purging of this data mainly due to just cost concerns? If the goal is purely preservation of data, the likely cheapest and least maintenance intensive way of doing this is a large scale tape archive. Such archives (purely based on a google search) exist at LLNL and OU, and there is a TAPAS service from SpectraLogic. I would imagine questions would arise about custody of the data, legal implications etc. The easiest is for the organization already hosting the data to just preserve it by archiving, and thereby claim a significant cost reduction. -- Alex Gorbachev On Sun, Apr 6, 2025 at 11:08 PM Linas Vepstas wrote: OK what you will read below might sound insane but I am obliged to ask. There are 275 petabytes of NIH data at risk of being deleted. Cancer research, medical data, HIPAA type stuff. Currently unclear where it's located, how it's managed, who has access to what, but lets ignore that for now. It's presumably splattered across data centers, cloud, AWS, supercomputing labs, who knows. Everywhere. I'm talking to a biomed person in Australias that uses NCBI data daily, she's in talks w/ Australian govt to copy and preserve the datasets they use. Some multi-petabytes of stuff. I don't know. While bouncing around tech ideas, IPFS and Ceph came up. My experience with IPFS is that it's not a serious contender for anything. My experience with Ceph is that it's more-or-less A-list. OK. 
So here's the question: is it possible to (has anyone tried) set up an internet-wide Ceph cluster? Ticking off the typical checkboxes for "decentralized storage"? Stuff, like: internet connections need to be encrypted. Connections go down, come back up. Slow. Sure, national labs may have multi-terabit fiber, but little itty-bitty participants trying to contribute a small collection of disks to a large pool might only have a gigabit connection, of which maybe 10% is "usable". Barely. So, a hostile networking environment. Is this like, totally insane, run away now, can't do that, it won't work idea, or is there some glimmer of hope? Am I misunderstanding something about IPFS that merits taking a second look at it? Is there any other way of getting scalable reliable "decentralized" internet-wide storage? I mean, yes, of course, the conventional answer is that it could be copied to AWS or some national lab or two somewhere in the EU or Aus or UK or where-ever, That's the "obvious" answer. I'm looking for a non-obvious answer, an IPFS-like thing, but one that actually works. Could it work? -- Linas -- Patrick: Are they laughing at us? Sponge Bob: No, Patrick, they are laughing next to us. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users ma
[ceph-users] Diskprediction_local mgr module removal - Call for feedback
Hi everyone, On today's Ceph Steering Committee call we discussed the idea of removing the diskprediction_local mgr module, as the current prediction model is obsolete and not maintained. We would like to gather feedback from the community about the usage of this module, and find out if anyone is interested in maintaining it. Thanks, Yaarit ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
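For anyone checking whether they are affected, a quick way to see if the module is enabled and whether any device health data is being collected (a sketch; output formats differ slightly between releases):

# Is the module enabled on this cluster?
$ ceph mgr module ls | grep -i diskprediction
# Devices and any life-expectancy predictions recorded so far
$ ceph device ls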
[ceph-users] Re: NIH Datasets
Super cool idea - I too wanted to refer to blockchain methods to avoid data being tampered. Ceph would need a completely different distribution coded for such storage, however we could say that the fundamentals are already in place? Best, Laimis J. > On 7 Apr 2025, at 18:23, Tim Holloway wrote: > > Additional features. > > * No "master server". No Single Point of Failure. > > * Resource location. A small number of master servers kept in sync like DNS > with tiers of secondary resources. I think blockchains also have a similar > setup? > > * Resource identification. A scheme like LDAP. For example: > > cn=library,catalog=dewey,filing=504.3,... > > cn=library,country=us,catalog=libraryofcongress,... > > country=us,agency=nih,department=... > > cn=upc,isbn=... > > A document/document set should have a canonical name, but allow alternate > names for ease of location, such as author searches, general topics and the > like. > > I considered OIDs as an alternative, but LDAP names are more human-friendly > and easier to add sub-domains to without petitioning a master registrar. Also > there's a better option for adding attributes to the entry description. > > On 4/7/25 09:39, Tim Holloway wrote: >> Yeah, Ceph in its current form doesn't seem like a good fit. >> >> I think that what we need to support the world's knowledge in the face of >> enstupidification is some sort of distributed holographic datastore. so, >> like Ceph's PG replication, a torrent-like ability to pull from multiple >> unreliable sources, a good indexing mechanism and, protections against >> tampering. Probably with a touch of git as well. >> >> I'm sure there's more, but those are items that immediately occur to me. >> >>Tim >> >> On 4/7/25 09:10, Alex Buie wrote: >>> MooseFS is the way to go here. >>> >>> >>> I have it working on android SD cards and of course normal Linux servers >>> over the internet and over Yggdrasil-network. >>> >>> One of my in-progress anarchy projects is a global hard drive for all of >>> humanity’s knowledge. >>> >>> I would LOVE to get involved with this preservation project technically in >>> a volunteer capacity. I can build a cutting edge resilient distributed >>> storage system for cheaper than anything currently on the market. >>> >>> Please reach out or pass along my email. >>> >>> Alex >>> >>> >>> On Sun, Apr 6, 2025 at 11:08 PM Linas Vepstas >>> wrote: >>> OK what you will read below might sound insane but I am obliged to ask. There are 275 petabytes of NIH data at risk of being deleted. Cancer research, medical data, HIPAA type stuff. Currently unclear where it's located, how it's managed, who has access to what, but lets ignore that for now. It's presumably splattered across data centers, cloud, AWS, supercomputing labs, who knows. Everywhere. I'm talking to a biomed person in Australias that uses NCBI data daily, she's in talks w/ Australian govt to copy and preserve the datasets they use. Some multi-petabytes of stuff. I don't know. While bouncing around tech ideas, IPFS and Ceph came up. My experience with IPFS is that it's not a serious contender for anything. My experience with Ceph is that it's more-or-less A-list. OK. So here's the question: is it possible to (has anyone tried) set up an internet-wide Ceph cluster? Ticking off the typical checkboxes for "decentralized storage"? Stuff, like: internet connections need to be encrypted. Connections go down, come back up. Slow. 
Sure, national labs may have multi-terabit fiber, but little itty-bitty participants trying to contribute a small collection of disks to a large pool might only have a gigabit connection, of which maybe 10% is "usable". Barely. So, a hostile networking environment. Is this like, totally insane, run away now, can't do that, it won't work idea, or is there some glimmer of hope? Am I misunderstanding something about IPFS that merits taking a second look at it? Is there any other way of getting scalable reliable "decentralized" internet-wide storage? I mean, yes, of course, the conventional answer is that it could be copied to AWS or some national lab or two somewhere in the EU or Aus or UK or where-ever, That's the "obvious" answer. I'm looking for a non-obvious answer, an IPFS-like thing, but one that actually works. Could it work? -- Linas -- Patrick: Are they laughing at us? Sponge Bob: No, Patrick, they are laughing next to us. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io >>> ___ >>> ceph-users mailing list -- ceph-users@ceph.io >>> To unsubscribe send an email to ceph-users-le...
[ceph-users] v19.2.2 Squid released
We're happy to announce the 2nd backport release in the Squid series. https://ceph.io/en/news/blog/2025/v19-2-2-squid-released/

Notable Changes
---
- This hotfix release resolves an RGW data loss bug when CopyObject is used to copy an object onto itself. S3 clients typically do this when they want to change the metadata of an existing object. Due to a regression caused by an earlier fix for https://tracker.ceph.com/issues/66286, any tail objects associated with such objects are erroneously marked for garbage collection. RGW deployments on Squid are encouraged to upgrade as soon as possible to minimize the damage. The experimental rgw-gap-list tool can help to identify damaged objects.

Getting Ceph
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-19.2.2.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: 0eceb0defba60152a8182f7bd87d164b639885b8
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
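For cephadm-managed clusters, picking up the hotfix is a single orchestrator call; a sketch (staggered upgrades, as discussed elsewhere on this list, work as well):

$ ceph orch upgrade start --image quay.io/ceph/ceph:v19.2.2
$ ceph orch upgrade status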
[ceph-users] Re: v19.2.2 Squid released
19.2.2 Installed!

# ceph -s
  cluster:
    id: ,,,
    health: HEALTH_ERR
            27 osds(s) are not reachable
...
    osd: 27 osds: 27 up (since 32m), 27 in (since 5w)
...

It's such a 'bad look': something so visible, in such an often-used command.

10/4/25 06:00 PM [ERR] osd.27's public address is not in 'fc00:1002:c7::/64' subnet

But:

# ceph config get osd.27
..
global basic public_network fc00:1002:c7::/64
...

ifconfig of osd.27:
...
inet6 fc00:1002:c7::43/64 scope global
    valid_lft forever preferred_lft forever
...

...similar for all the other OSDs, although of course on different hosts.

On 4/10/25 15:08, Yuri Weinstein wrote: We're happy to announce the 2nd backport release in the Squid series. https://ceph.io/en/news/blog/2025/v19-2-2-squid-released/ Notable Changes --- - This hotfix release resolves an RGW data loss bug when CopyObject is used to copy an object onto itself. S3 clients typically do this when they want to change the metadata of an existing object. Due to a regression caused by an earlier fix for https://tracker.ceph.com/issues/66286, any tail objects associated with such objects are erroneously marked for garbage collection. RGW deployments on Squid are encouraged to upgrade as soon as possible to minimize the damage. The experimental rgw-gap-list tool can help to identify damaged objects. Getting Ceph * Git at git://github.com/ceph/ceph.git * Tarball at https://download.ceph.com/tarballs/ceph-19.2.2.tar.gz * Containers at https://quay.io/repository/ceph/ceph * For packages, see https://docs.ceph.com/en/latest/install/get-packages/ * Release git sha1: 0eceb0defba60152a8182f7bd87d164b639885b8 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
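When chasing that warning, it can help to compare the address the OSD actually registered in the cluster map with the configured subnet; a sketch using osd.27 from the log line above (field names may differ slightly between releases):

# Address(es) the OSD registered with the cluster
$ ceph osd find 27
# Front/back addresses from the OSD's metadata
$ ceph osd metadata 27 | grep -E 'front_addr|back_addr'
# The subnet the health check compares against
$ ceph config get osd.27 public_network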