[ceph-users] Ceph octopus version cluster not starting

2024-09-16 Thread Amudhan P
Hi, I recently added one disk to the Ceph cluster using "ceph-volume lvm create --data /dev/sdX", but the new OSD didn't start. After some time, the OSD services on the rest of the nodes also stopped. So I restarted all nodes in the cluster; now, after the restart, the MON, MDS, MGR and OSD services are not starting. Could

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Eugen Block
Hi, I would focus on the MONs first. If they don't start, your cluster is not usable. It doesn't look like you use cephadm, but please confirm. Check whether the nodes are running out of disk space; maybe that's why they don't log anything and fail to start. Quoting Amudhan P : Hi, Recen
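
The disk-space check suggested above could be scripted roughly as follows. This is a minimal sketch, not a Ceph tool: the paths /var/lib/ceph and /var/log/ceph are assumed default locations, and the 5% threshold is an arbitrary illustration; adjust both for the actual deployment.

```python
import os
import shutil

# Assumed default locations (not confirmed in the thread); adjust as needed.
CEPH_PATHS = ["/var/lib/ceph", "/var/log/ceph"]


def free_percent(path):
    """Return the percentage of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # pseudo-filesystems can report a zero size
    return 100.0 * usage.free / usage.total


def low_space_paths(paths, min_free_percent=5.0):
    """Return (path, free%) pairs for existing paths below the threshold."""
    low = []
    for path in paths:
        if not os.path.exists(path):
            continue  # skip paths that do not exist on this host
        pct = free_percent(path)
        if pct < min_free_percent:
            low.append((path, pct))
    return low


if __name__ == "__main__":
    for path, pct in low_space_paths(CEPH_PATHS):
        print(f"{path}: only {pct:.1f}% free")
```

Running it on each node prints only the paths that fall below the threshold, so no output means no path was flagged.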

[ceph-users] telemetry

2024-09-16 Thread Marc
maybe add filter for types as well? https://telemetry-public.ceph.com/d/x1_ISxiMz_01/models-per-vendor?orgId=1 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Amudhan P
No, I don't use cephadm, and I have enough space for log storage. When I try to start the mon service on any of the nodes, it just keeps waiting to complete without any error message on stdout or in the log file. On Mon, Sep 16, 2024 at 1:21 PM Eugen Block wrote: > Hi, > > I would focus on the MONs first. I

[ceph-users] Re: Ceph RBD w/erasure coding

2024-09-16 Thread Frédéric Nass
As a reminder, there's this one waiting ;-) https://tracker.ceph.com/issues/66641 Frédéric. PS: For the record, Andre's problem was related to the 'caps' (https://www.reddit.com/r/ceph/comments/1ffzfjc/ceph_rbd_werasure_coding/) - On 15 Sep 24, at 18:02, Anthony D'Atri anthony.da...@gmail.c

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Eugen Block
Have you tried to start it with a higher debug level? Is the ceph.conf still correct? Is a keyring present in /var/lib/ceph/mon? Is the mon store in good shape? Can you run something like this? ceph-monstore-tool /var/lib/ceph/mon/ceph-{MON}/ get monmap -- --out monmap monmaptool --print mo

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Frank Schilder
Hi. When I have issues like this, what sometimes helps is to start a daemon manually (not systemctl or anything like that). Make sure no ceph-mon is running on the host: ps -eo cmd | grep ceph-mon and start a ceph-mon manually with a command like this (make sure the binary is the correct versi

[ceph-users] Re: Metric or any information about disk (block) fragmentation

2024-09-16 Thread Frédéric Nass
Hey, Yes, to get a live OSD's bluestore fragmentation you can use either of these commands, depending on whether or not you are using containers: ceph daemon osd.0 bluestore allocator score block or cephadm shell ceph daemon osd.0 bluestore allocator score block ... { "fragmentation_rating":
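
To collect the score from several OSDs, the JSON printed by the command above can be parsed in a script. The following is a minimal sketch under assumptions: it assumes the output contains a single JSON object with a "fragmentation_rating" key, as the snippet suggests, and the sample value 0.25 is hypothetical, not taken from the thread.

```python
import json


def fragmentation_rating(output):
    """Extract "fragmentation_rating" from the command's JSON output.

    Any text before the first "{" and after the JSON object is ignored.
    """
    start = output.find("{")
    if start == -1:
        raise ValueError("no JSON object found in output")
    # raw_decode parses one JSON value and tolerates trailing text.
    data, _end = json.JSONDecoder().raw_decode(output, start)
    return float(data["fragmentation_rating"])


if __name__ == "__main__":
    sample = '{ "fragmentation_rating": 0.25 }'  # hypothetical value
    print(fragmentation_rating(sample))
```

The function takes the command output as a string, so it works the same whether the text came from a bare-metal host or from inside a cephadm shell.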

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Amudhan P
Yes, the mon folder has all keys. ceph-monstore-tool /var/lib/ceph/mon/ceph-{MON}/ get monmap -- --out monmap > I could see some output when running the above command, and it looks like the DB is not corrupt. On Mon, Sep 16, 2024 at 2:18 PM Eugen Block wrote: > Have you tried to start it with a hig

[ceph-users] Re: [EXTERNAL] Re: Bucket Notifications v2 & Multisite Redundancy

2024-09-16 Thread Alex Hussein-Kershaw (HE/HIM)
Following up on this, I've run into another issue during my prototyping. I have two Ceph clusters, with a zone each, sharing a zonegroup and realm. Each has a local application that needs to be informed of replication changes, so I've created a topic and notification per site. Given the notific

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Amudhan P
Frank, with the manual command I was able to start the mon and see logs in the log file, and I don't find any issue in the logs except the lines below. Should I stop the manual command and try to start the mon service from systemd, or follow the same approach on all mon nodes? 2024-09-16T15:36:54.620+0530 7f5783d1e5c0

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Frank Schilder
I think this output is normal and I guess the MON is up? If so, I would start another mon in the same way on another host. If the monmap is correct with network etc. they should start talking to each other. If you have 3 mons in the cluster, you should get quorum. On the host where the mon is r

[ceph-users] Re: [EXTERNAL] Re: Bucket Notifications v2 & Multisite Redundancy

2024-09-16 Thread Yuval Lifshitz
Just to check that I got it right: you are asking to disable topic and notification replication, as you want to send to different topics based on the zone that got the update to the bucket? * One option is to disable "notifications_v2" on the zonegroup, but this is probably not a good idea - t

[ceph-users] Re: [EXTERNAL] Re: Bucket Notifications v2 & Multisite Redundancy

2024-09-16 Thread Alex Hussein-Kershaw (HE/HIM)
Yes - that's correct. Thanks for the suggestions. I think the metadata suggestion probably does work; however, it doesn't come easy for me, as it seems not possible to do a negative match filter. What I would really like to do is set up a filter for objects where "x-amz-metadata-site" != "local-si

[ceph-users] Re: [EXTERNAL] Re: Bucket Notifications v2 & Multisite Redundancy

2024-09-16 Thread Yuval Lifshitz
inline On Mon, Sep 16, 2024 at 5:13 PM Alex Hussein-Kershaw (HE/HIM) < alex...@microsoft.com> wrote: > Yes - that's correct. Thanks for the suggestions. > > I think the metadata suggestion probably does work, however it doesn't > come easy for me as it seems not possible to do a negative match fi

[ceph-users] Re: [EXTERNAL] Re: Bucket Notifications v2 & Multisite Redundancy

2024-09-16 Thread Alex Hussein-Kershaw (HE/HIM)
This is really an application level problem, but it's not trivial for me to determine the name of the remote site. So while I could add a metadata header and include the site name, then use "x-amz-metadata-site" == remote_site_name as my filter, it would be much more practical to say "x-amz-meta

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Amudhan P
Thanks Frank. I figured out that the issue was NTP: the nodes were not able to reach the NTP server, which caused the NTP service to fail. It looks like the Ceph systemd service depends on the status of the NTP service. On Mon, Sep 16, 2024 at 4:12 PM Frank Schilder wrote: > I think this output is normal and I guess the
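
One way to test the suspected dependency is to look at the ordering and requirement properties systemd reports for the unit. The following is a sketch under assumptions: the unit name passed on the command line (for example a ceph-mon@<hostname>.service instance) must be looked up on the host, the script relies on `systemctl show -p After -p Wants -p Requires <unit>` printing `Key=value` lines, and the set of time-related unit names is illustrative, not exhaustive.

```python
import subprocess
import sys

# Illustrative, non-exhaustive names of time-synchronisation units.
TIME_UNITS = {
    "time-sync.target", "ntpd.service", "ntp.service",
    "chronyd.service", "chrony.service", "systemd-timesyncd.service",
}


def time_dependencies(show_output):
    """Map each property (After, Wants, ...) to the time-related units it lists."""
    found = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a Key=value line
        hits = sorted(TIME_UNITS.intersection(value.split()))
        if hits:
            found[key] = hits
    return found


def show_unit(unit):
    """Return the After/Wants/Requires properties of `unit` as text."""
    result = subprocess.run(
        ["systemctl", "show", "-p", "After", "-p", "Wants",
         "-p", "Requires", unit],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The unit name is supplied by the user; none is assumed here.
    print(time_dependencies(show_unit(sys.argv[1])))
```

An empty result means none of the listed time units appear in those properties; it does not rule out an indirect dependency through another unit.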

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Frank Schilder
Hi Amudhan, great that you figured that out. Does systemd not output an error in that case?? I would expect an error message. On our systems systemd is quite chatty when a unit fails. You probably still need to figure out why your new OSD took everything down over time. Maybe create a new case

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Amudhan P
No, there wasn't any error message from systemd; it just stayed silent, even for an hour. On Mon, Sep 16, 2024 at 10:02 PM Frank Schilder wrote: > Hi Amudhan, > > great that you figured that out. Does systemd not output an error in that > case?? I would expect an error message. On our systems systemd is qui