[ceph-users] Re: Failed to probe daemons or devices

2022-10-25 Thread Sake Paulusma
I've created an issue: https://tracker.ceph.com/issues/57918. What more can I do to get this issue fixed? And the output of the requested commands: [cephadm@mdshost2 ~]$ sudo lvs -a LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert lv_home vg_sys -wi-ao 256.00m lv_opt vg

[ceph-users] Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread Martin Johansen
Hi, Two questions: 1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring several times a day? 2) Why was it turned into a directory? It contains one file "ceph.client.admin.keyring.new". This then causes an error in the ceph logs when ceph tries to remove the file: "rm: cannot remove '/etc/

[ceph-users] Re: Failed to probe daemons or devices

2022-10-25 Thread Sake Paulusma
I fixed the issue by removing the blank (unlabeled) disk. It is still a bug, so hopefully it can get fixed for someone else who can't easily remove a disk :)

[ceph-users] Re: Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread Marc
> > 1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring several > times a > day? > > 2) Why was it turned into a directory? It contains one file > "ceph.client.admin.keyring.new". This then causes an error in the ceph > logs > when ceph tries to remove the file: "rm: cannot remove > '/etc

[ceph-users] Re: Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread Martin Johansen
Yes, we are using the ceph-csi driver in a kubernetes cluster. Is that what is causing this? Best Regards, Martin Johansen On Tue, Oct 25, 2022 at 9:44 AM Marc wrote: > > > > 1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring several > > times a > > day? > > > > 2) Why was it turn

[ceph-users] Re: Temporary shutdown of subcluster and cephfs

2022-10-25 Thread Frank Schilder
Hi Patrick, thanks for your answer. This is exactly the behaviour we need. For future reference some more background: We need to prepare a quite large installation for planned power outages. Even though they are called planned, we will not be able to handle these manually in good time for reas

[ceph-users] MGR failures and pg autoscaler

2022-10-25 Thread Lo Re Giuseppe
Hi, A few weeks ago we started to use pg autoscale on our pools. We run version 16.2.7. Maybe a coincidence, maybe not, but around the same time we started to experience mgr progress module failures: “”” [root@naret-monitor01 ~]# ceph -s cluster: id: 63334166-d991-11eb-99de-40a6b72108d0

[ceph-users] Re: Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread Martin Johansen
How should we fix it? Should we remove the directory and add back the keyring file? Best Regards, Martin Johansen On Tue, Oct 25, 2022 at 9:45 AM Martin Johansen wrote: > Yes, we are using the ceph-csi driver in a kubernetes cluster. Is it that > that is causing this? > > Best Regards, > > Ma
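A minimal recovery sketch, assuming the directory only contains the stray ceph.client.admin.keyring.new file and that another node (or a "cephadm shell") still has working admin credentials; the paths and ordering are assumptions, not something confirmed in this thread:

    # on a node with working admin credentials, re-export the key
    ceph auth get client.admin -o ceph.client.admin.keyring
    # on the affected node, replace the directory with the file again
    rm -rf /etc/ceph/ceph.client.admin.keyring
    cp ceph.client.admin.keyring /etc/ceph/ceph.client.admin.keyring
    chmod 600 /etc/ceph/ceph.client.admin.keyring

Whatever recreated the directory (the CSI deployment in this case) will likely do it again until its mount configuration is fixed.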

[ceph-users] ceph status does not report IO any more

2022-10-25 Thread Frank Schilder
Hi all, I have a strange problem. I just completed an increase of pg_num on a pool and since then "ceph status" does not report aggregated client/recovery IO any more. It just looks like this now: # ceph status cluster: id: health: HEALTH_OK services: mon: 5 daemons, quoru
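Not an answer from this thread, but since the io: section of "ceph status" is aggregated by the active mgr, a hedged first thing to try is a mgr failover:

    # the active mgr name is shown in "ceph -s" under services: mgr: ...
    ceph mgr fail <active-mgr-name>
    # after a standby takes over, check whether the io: lines reappear
    ceph status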

[ceph-users] Re: rgw multisite octopus - bucket can not be resharded after cancelling prior reshard process

2022-10-25 Thread Boris Behrens
Opened a bug on the tracker for it: https://tracker.ceph.com/issues/57919 On Fri, 7 Oct 2022 at 11:30, Boris Behrens wrote: > Hi, > I just wanted to reshard a bucket but mistyped the number of shards. In a > reflex I hit ctrl-c and waited. It looked like the resharding did not > finish so
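For anyone hitting the same state, the usual commands for inspecting a stuck reshard look roughly like this (the bucket name is a placeholder; whether they resolve this particular multisite case is exactly what the tracker issue is about):

    radosgw-admin reshard list
    radosgw-admin reshard status --bucket=<bucket>
    # stale reshard instances left behind by an interrupted run
    radosgw-admin reshard stale-instances list
    # verify the bucket index; --fix attempts a repair
    radosgw-admin bucket check --bucket=<bucket> --fix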

[ceph-users] Re: Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread Marc
Wtf, unbelievable that it is still like this. You can't fix it, I had to fork and patch it because these @#$@#$@ ignored it. I don't know much about kubernetes; I am running mesos. Can't you set/configure kubernetes to launch the driver in a container mode? > > How should we fix it? Should we

[ceph-users] Re: MGR failures and pg autoscaler

2022-10-25 Thread Lo Re Giuseppe
I have found the logs showing the progress module failure: debug 2022-10-25T05:06:08.877+ 7f40868e7700 0 [rbd_support INFO root] execute_trash_remove: task={"sequence": 150, "id": "fcc864a0-9bde-4512-9f84-be10976613db", "message": "Removing image fulen-hdd/f3f237d2f7e304 from trash", "refs

[ceph-users] changing alerts in cephadm (pacific) installed prometheus/alertmanager

2022-10-25 Thread Lasse Aagren
Hello, Is it possible to change/remove any of the provided alerts? The only way we've found so far is to change ceph_alerts.yml in the running containers, which doesn't persist across redeploys. Best regards, Lasse Aagren

[ceph-users] Re: Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread Martin Johansen
Could you explain? I have deployed Ceph CSI just as the docs specified. What mode is it running in, if not container mode? Best Regards, Martin Johansen On Tue, Oct 25, 2022 at 10:56 AM Marc wrote: > Wtf, unbelievable that it is still like this. You can't fix it, I had to > fork and pat

[ceph-users] RGW/S3 after a cluster is/was full

2022-10-25 Thread Ulrich Klein
Hi, I have a problem with a full cluster and getting it back to a healthy state. Fortunately it's a small test cluster with no valuable data in it. It is used exclusively for RGW/S3, running 17.2.3. I intentionally filled it up via rclone/S3 until it got into HEALTH_ERR, to see what would happen
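Not from this thread, but the commonly described way out of an RGW-only full condition is roughly the following; the ratio values are assumptions and must be lowered again afterwards, since letting OSDs fill up completely can make things much worse:

    ceph df                              # confirm which pools are consuming the space
    ceph osd set-full-ratio 0.97         # temporarily allow deletes to proceed
    # delete objects via S3, then force RGW garbage collection to reclaim space
    radosgw-admin gc process --include-all
    ceph osd set-full-ratio 0.95         # restore the default once usage drops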

[ceph-users] Re: Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread E Taka
Question 1) makes me wonder too. This results in errors: 2022-10-25T11:20:00.000109+0200 mon.ceph00 [INF] overall HEALTH_OK 2022-10-25T11:21:05.422793+0200 mon.ceph00 [WRN] Health check failed: failed to probe daemons or devices (CEPHADM_REFRESH_FAILED) 2022-10-25T11:22:06.037456+0200 mon.ceph00

[ceph-users] setting unique labels in cephadm installed (pacific) prometheus.yml

2022-10-25 Thread Lasse Aagren
Hello, In our cephadm-installed cluster (pacific) we are running two instances of prometheus. Through altering the prometheus.yml.j2 template (ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml ...) we set the Prometheus instances to remote-write to a corporate setup for long-term retention
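For reference, the override mechanism mentioned above looks roughly like this; the open question is only how to get per-host values into the rendered result:

    # store a customised Jinja2 template for prometheus.yml
    ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml -i prometheus.yml.j2
    # redeploy so cephadm renders the new template into the containers
    ceph orch redeploy prometheus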

[ceph-users] Re: setting unique labels in cephadm installed (pacific) prometheus.yml

2022-10-25 Thread Lasse Aagren
The context provided when parsing the template (https://github.com/ceph/ceph/blob/v16.2.10/src/pybind/mgr/cephadm/services/monitoring.py#L319-L331) doesn't seem to provide any per-host uniqueness. On Tue, Oct 25, 2022 at 12:35 PM Lasse Aagren wrote: > Hello, > > In our cephadm installed cluster

[ceph-users] Re: setting unique labels in cephadm installed (pacific) prometheus.yml

2022-10-25 Thread Redouane Kachach Elhichou
Currently the generated template is the same for all the hosts and there's no way to have a dedicated template for a specific host AFAIK. On Tue, Oct 25, 2022 at 12:45 PM Lasse Aagren wrote: > The context provided, when parsing the template: > > > https://github.com/ceph/ceph/blob/v16.2.10/src/p

[ceph-users] Re: Using multiple SSDs as DB

2022-10-25 Thread Christian
Thank you! Robert Sander wrote on Fri, 21 Oct 2022: > This is a bug in certain versions of ceph-volume: > > https://tracker.ceph.com/issues/56031 > > It should be fixed in the latest releases. For completeness' sake: The cluster is on 16.2.10. The issue is resolved and marked as backported. 16.

[ceph-users] Cephadm container configurations

2022-10-25 Thread Mikhail Sidorov
Hello! I am planning a ceph cluster and evaluating cephadm as an orchestration tool. My cluster is going to be relatively small at the start, so I am planning to run monitor daemons on the same nodes as OSDs. But I wanted to provide some QoS on memory and cpu resources, so I am wondering if it is pos

[ceph-users] Re: Cephadm container configurations

2022-10-25 Thread Robert Gallop
Host networking is used by default as the network layer (no ip forwarding requirement), so if your OS is using jumbo frames, your containers are too. As for the resources, I'll let someone more knowledgeable answer that, but you can certainly run mons and OSDs on the same box assuming you have enough CPU and memory. I ha

[ceph-users] 1 pg stale, 1 pg undersized

2022-10-25 Thread Alexander Fiedler
Hello, we run a ceph cluster with the following error, which came up suddenly without any maintenance/changes: HEALTH_WARN Reduced data availability: 1 pg stale; Degraded data redundancy: 1 pg undersized The PG in question is PG 25. Output of ceph pg dump_stuck stale: PG_STAT STATE
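Generic first diagnostics for a stale PG (not specific advice for this cluster):

    ceph health detail       # shows the full PG id, e.g. 25.x
    ceph pg map <pgid>       # which OSDs are supposed to serve it
    ceph osd tree            # are those OSDs up and in?
    ceph pg <pgid> query     # may not respond while the PG is stale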

[ceph-users] Re: Temporary shutdown of subcluster and cephfs

2022-10-25 Thread Patrick Donnelly
On Tue, Oct 25, 2022 at 3:48 AM Frank Schilder wrote: > > Hi Patrick, > > thanks for your answer. This is exactly the behaviour we need. > > For future reference some more background: > > We need to prepare a quite large installation for planned power outages. Even > though they are called planne

[ceph-users] Re: Cephadm container configurations

2022-10-25 Thread Adam King
If you're using a fairly recent cephadm version, there is the ability to provide miscellaneous container arguments in the service spec https://docs.ceph.com/en/quincy/cephadm/services/#extra-container-arguments. This means you can have cephadm deploy each container in that service with, for example
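As a sketch of that mechanism (the service type, placement, flags and values here are placeholders, not a recommendation):

    cat > mon-spec.yaml <<'EOF'
    service_type: mon
    placement:
      label: mon
    extra_container_args:
      - "--cpus=2"
      - "--memory=8g"
    EOF
    ceph orch apply -i mon-spec.yaml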

[ceph-users] Re: Temporary shutdown of subcluster and cephfs

2022-10-25 Thread Frank Schilder
Hi Patrick. > To be clear in case there is any confusion: once you do `fs fail`, the > MDS are removed from the cluster and they will respawn. They are not > given any time to flush remaining I/O. This is fine, there is not enough time to flush anything. As long as they leave the meta-data- and
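A rough sketch of the resulting shutdown/startup order, with the file system name as a placeholder; the exact set of OSD flags is an assumption rather than something settled in this thread:

    # shutdown
    ceph osd set noout
    ceph osd set norebalance
    ceph fs fail <fsname>          # MDS are removed from the cluster and respawn as standbys
    # ... power down the hosts ...

    # startup, once the cluster is back and healthy
    ceph fs set <fsname> joinable true
    ceph osd unset norebalance
    ceph osd unset noout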

[ceph-users] Large OMAP Objects & Pubsub

2022-10-25 Thread Alex Hussein-Kershaw (HE/HIM)
Hi All, Looking to get some advice on an issue my clusters have been suffering from. I realize there is a lot of text below; thanks in advance for your consideration. The cluster has a health warning of "32 large omap objects". It has persisted for several months. It appears functional and there a
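Not from this thread, but generic ways to find out which objects are involved; whether they point at pubsub queues or at bucket indexes is the open question here:

    ceph health detail                 # names the pool(s) holding the large omap objects
    # the OSD logs record the exact object names: grep them for "Large omap object found"
    radosgw-admin bucket limit check   # per-bucket index shard fill levels
    radosgw-admin reshard list         # anything already queued for resharding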

[ceph-users] Re: MGR failures and pg autoscaler

2022-10-25 Thread Andreas Haupt
Hi Giuseppe, On Tue, 2022-10-25 at 07:54 +, Lo Re Giuseppe wrote: > “”” > > In the mgr logs I see: > “”” > > debug 2022-10-20T23:09:03.859+ 7fba5f300700 0 [pg_autoscaler ERROR root] > pool 2 has overlapping roots: {-60, -1} This is unrelated, I asked the same question some days ago:
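The usual way to see where the overlap comes from (generic commands, not a fix for this particular cluster):

    ceph osd crush tree --show-shadow   # device-class shadow roots such as -60 vs the default root -1
    ceph osd crush rule ls
    ceph osd crush rule dump            # which root/device class each rule selects
    ceph osd pool ls detail             # which rule each pool uses
    # the autoscaler warning typically clears once every pool uses a rule bound to a single root/device class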

[ceph-users] Re: MGR process regularly not responding

2022-10-25 Thread Eugen Block
Hi, I see the same on different Nautilus clusters, I was pointed to this tracker issue: https://tracker.ceph.com/issues/39264 In one cluster disabling the prometheus module seemed to have stopped the failing MGRs. But they happen so rarely that it might be something different and we just di

[ceph-users] post-mortem of a ceph disruption

2022-10-25 Thread Simon Oosthoek
Dear list, recently we experienced a short outage of our ceph storage. It had a surprising cause and probably indicates a subtle misconfiguration on our part; I'm hoping for a useful suggestion ;-) We are running a 3PB cluster with 21 osd nodes (spread across 3 datacenters), 3 mon/mgrs and

[ceph-users] Re: What is the use case of your Ceph cluster? Developers want to know!

2022-10-25 Thread Laura Flores
Reminder that the survey will close this Friday, October 28th at 10:00pm UTC. Thanks to everyone who has submitted responses! On Thu, Oct 20, 2022 at 10:25 AM Laura Flores wrote: > Dear Ceph Users, > > Ceph developers and doc writers are looking for responses from people in > the user/dev commun

[ceph-users] Re: MGR process regularly not responding

2022-10-25 Thread Gilles Mocellin
Thank you. Indeed, the last messages I have in the logs are Prometheus access. But I do use Prometheus (for the Ceph Dashboard and other Grafana dashboards), so I can't really deactivate it for days... One thing I can try instead of restarting the MGR is to disable/enable the Prometheus module, and
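For reference, the module bounce is just the following; if the mgr still hangs, failing over to a standby is the heavier alternative:

    ceph mgr module disable prometheus
    ceph mgr module enable prometheus
    # or, instead:
    ceph mgr fail <active-mgr-name>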

[ceph-users] Re: MGR failures and pg autoscaler

2022-10-25 Thread Satoru Takeuchi
Hi Lo, On Tue, 25 Oct 2022 at 18:01, Lo Re Giuseppe wrote: > > I have found the logs showing the progress module failure: > > debug 2022-10-25T05:06:08.877+ 7f40868e7700 0 [rbd_support INFO root] > execute_trash_remove: task={"sequence": 150, "id": > "fcc864a0-9bde-4512-9f84-be10976613db", "message": "Re

[ceph-users] Statefull set usage with ceph storage class

2022-10-25 Thread Oğuz Yarımtepe
Hi, I am wondering whether there are good or bad experiences using Ceph with applications that also do their own replication, like Mongo or Kafka, on Kubernetes. Is it a bad idea to define separate pools and storage classes for such applications and set a replication factor of 1? Regards. -- Oğuz Yarımtepe http
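A sketch of what a size-1 pool for such self-replicating workloads would involve on recent releases; the names are placeholders, and size 1 means Ceph itself provides no redundancy at all for that data:

    ceph osd pool create kafka-data 32 32 replicated
    # size 1 has to be allowed explicitly
    ceph config set global mon_allow_pool_size_one true
    ceph osd pool set kafka-data size 1 --yes-i-really-mean-it
    ceph osd pool application enable kafka-data rbd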