I've created an issue: https://tracker.ceph.com/issues/57918
What more can I do to help get this issue fixed?
And here is the output of the requested commands:
[cephadm@mdshost2 ~]$ sudo lvs -a
LV      VG      Attr    LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
lv_home vg_sys  -wi-ao  256.00m
lv_opt  vg
Hi
Two questions:
1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring several times a
day?
2) Why was it turned into a directory? It contains one file
"ceph.client.admin.keyring.new". This then causes an error in the ceph logs
when ceph tries to remove the file: "rm: cannot remove
'/etc/
I fixed the issue by removing the blank/unlabeled disk. It is still a bug,
so hopefully it can get fixed for someone else who can't easily remove a disk :)
>
> 1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring several
> times a
> day?
>
> 2) Why was it turned into a directory? It contains one file
> "ceph.client.admin.keyring.new". This then causes an error in the ceph
> logs
> when ceph tries to remove the file: "rm: cannot remove
> '/etc
Yes, we are using the ceph-csi driver in a kubernetes cluster. Is that what
is causing this?
Best Regards,
Martin Johansen
On Tue, Oct 25, 2022 at 9:44 AM Marc wrote:
> >
> > 1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring several
> > times a
> > day?
> >
> > 2) Why was it turn
Hi Patrick,
thanks for your answer. This is exactly the behaviour we need.
For future reference, some more background:
We need to prepare quite a large installation for planned power outages. Even
though they are called planned, we will not be able to handle these manually in
good time for reas
Hi,
A few weeks ago we started to use pg autoscale on our pools.
We run version 16.2.7.
Maybe a coincidence, maybe not, but for a few weeks now we have been
experiencing mgr progress module failures:
“””
[root@naret-monitor01 ~]# ceph -s
cluster:
id: 63334166-d991-11eb-99de-40a6b72108d0
How should we fix it? Should we remove the directory and add back the
keyring file?
Best Regards,
Martin Johansen
On Tue, Oct 25, 2022 at 9:45 AM Martin Johansen wrote:
> Yes, we are using the ceph-csi driver in a kubernetes cluster. Is it that
> that is causing this?
>
> Best Regards,
>
> Ma
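For reference, a possible recovery sketch, assuming the cluster is cephadm-managed and
still reachable from that host, and that the stray directory can simply be removed:

sudo rm -rf /etc/ceph/ceph.client.admin.keyring        # remove the directory that replaced the file
sudo cephadm shell -- ceph auth get client.admin \
    | sudo tee /etc/ceph/ceph.client.admin.keyring >/dev/null
sudo chmod 600 /etc/ceph/ceph.client.admin.keyring     # keep the keyring private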
Hi all,
I have a strange problem. I just completed an increase of pg_num on a pool and
since then "ceph status" does not report aggregated client/recovery IO any
more. It just looks like this now:
# ceph status
cluster:
id:
health: HEALTH_OK
services:
mon: 5 daemons, quoru
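If it is only the mgr's statistics that have gone stale (an assumption, but a common
cause), failing over to a standby mgr is a quick first test:

ceph mgr fail     # fail the active mgr; on older releases pass the active mgr's name
ceph status       # the client/recovery io lines should reappear after a short while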
Opened a bug on the tracker for it: https://tracker.ceph.com/issues/57919
On Fri, Oct 7, 2022 at 11:30 Boris Behrens wrote:
> Hi,
> I just wanted to reshard a bucket but mistyped the number of shards. In a
> reflex I hit Ctrl-C and waited. It looked like the resharding did not
> finish so
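A rough sketch of the radosgw-admin commands relevant to a stuck or mistyped reshard
(bucket name and shard count are placeholders):

radosgw-admin reshard list                        # queued/ongoing reshard entries
radosgw-admin reshard status --bucket=<bucket>    # per-shard state for this bucket
radosgw-admin reshard cancel --bucket=<bucket>    # drop the stuck entry
radosgw-admin bucket reshard --bucket=<bucket> --num-shards=<n>   # redo with the intended count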
Wtf, unbelievable that it is still like this. You can't fix it; I had to fork
and patch it because these @#$@#$@ ignored it. I don't know much about
kubernetes, I am running mesos. Can't you set/configure kubernetes to launch the
driver in a container mode?
>
> How should we fix it? Should we
I have found the logs showing the progress module failure:
debug 2022-10-25T05:06:08.877+ 7f40868e7700 0 [rbd_support INFO root]
execute_trash_remove: task={"sequence": 150, "id":
"fcc864a0-9bde-4512-9f84-be10976613db", "message": "Removing image
fulen-hdd/f3f237d2f7e304 from trash", "refs
Hello,
Is it possible to change/remove any of the provided alerts? The only way we've
found so far is to change ceph_alerts.yml in the running containers, which
doesn't persist across redeploys.
Best regards,
Lasse Aagren
Could you explain? I have deployed Ceph CSI just as the docs
specified. What mode is it running in if not container mode?
Best Regards,
Martin Johansen
On Tue, Oct 25, 2022 at 10:56 AM Marc wrote:
> Wtf, unbelievable that it is still like this. You can't fix it, I had to
> fork and pat
Hi,
I have a problem with a full cluster and getting it back to a healthy state.
Fortunately it's a small test cluster with no valuable data in it.
It is used exclusively for RGW/S3, running 17.2.3.
I intentionally filled it up via rclone/S3 until it got into HEALTH_ERR, to see
what would happen
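A hedged sketch of the usual knobs for backing out of a full cluster; the values are
examples only, and the defaults (0.95/0.90/0.85) should be restored afterwards:

ceph osd set-full-ratio 0.97           # temporarily raise the hard full limit
ceph osd set-backfillfull-ratio 0.95   # let backfill proceed again
ceph osd set-nearfull-ratio 0.90
# then delete data or add capacity, and lower the ratios back to their defaults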
Question 1) makes me wonder too.
This results in errors:
2022-10-25T11:20:00.000109+0200 mon.ceph00 [INF] overall HEALTH_OK
2022-10-25T11:21:05.422793+0200 mon.ceph00 [WRN] Health check failed:
failed to probe daemons or devices (CEPHADM_REFRESH_FAILED)
2022-10-25T11:22:06.037456+0200 mon.ceph00
Hello,
In our cephadm-installed cluster (Pacific) we are running two instances of
Prometheus.
By altering the prometheus.yml.j2 template (ceph config-key set
mgr/cephadm/services/prometheus/prometheus.yml ...) we set the Prometheus
instances to remote-write to a corporate setup for long-term retentio
The context provided when parsing the template:
https://github.com/ceph/ceph/blob/v16.2.10/src/pybind/mgr/cephadm/services/monitoring.py#L319-L331
doesn't seem to provide any per-host uniqueness.
On Tue, Oct 25, 2022 at 12:35 PM Lasse Aagren wrote:
> Hello,
>
> In our cephadm installed cluster
Currently the generated template is the same for all the hosts and there's
no way to have a dedicated template for a specific host AFAIK.
On Tue, Oct 25, 2022 at 12:45 PM Lasse Aagren wrote:
> The context provided, when parsing the template:
>
>
> https://github.com/ceph/ceph/blob/v16.2.10/src/p
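For reference, the override mechanism referred to above, sketched from the cephadm
docs (the local filename is an assumption):

ceph config-key set mgr/cephadm/services/prometheus/prometheus.yml -i prometheus.yml.j2
ceph orch redeploy prometheus          # regenerate the config and restart the daemons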
Thank you!
Robert Sander wrote on Fri, Oct 21, 2022:
> This is a bug in certain versions of ceph-volume:
>
> https://tracker.ceph.com/issues/56031
>
> It should be fixed in the latest releases.
For completeness' sake: the cluster is on 16.2.10.
The issue is resolved and marked as backported. 16.
Hello!
I am planning a ceph cluster and evaluating cephadm as an orchestration tool.
My cluster is going to be relatively small at the start, so I am planning
to run monitor daemons on the same nodes as OSDs. But I want to provide
some QoS on memory and cpu resources, so I am wondering if it is pos
Host networking is used by default as the network layer (no IP forwarding
requirement), so if your OS uses jumbo frames, your containers do too.
As for the resources, I'll let someone more knowledgeable answer that, but you can
certainly run mons and OSDs on the same box assuming you have enough CPU
and memory. I ha
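On the memory side, the OSDs can at least be told how much to aim for; a minimal
sketch with an example value of ~4 GiB per OSD:

ceph config set osd osd_memory_target 4294967296
ceph config get osd.0 osd_memory_target    # verify what a given OSD picked up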
Hello,
we run a ceph cluster with the following error, which came up suddenly without
any maintenance/changes:
HEALTH_WARN Reduced data availability: 1 pg stale; Degraded data redundancy: 1
pg undersized
The PG in question is PG 25
Output of ceph pg dump_stuck stale:
PG_STAT STATE
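A few commands that usually narrow down a single stale/undersized PG (the full PG id
is abbreviated above, so <pgid> is a placeholder):

ceph pg map <pgid>       # up/acting OSD sets for the PG
ceph pg <pgid> query     # detailed peering state, if the primary is reachable
ceph osd tree            # check whether any of those OSDs is down or out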
On Tue, Oct 25, 2022 at 3:48 AM Frank Schilder wrote:
>
> Hi Patrick,
>
> thanks for your answer. This is exactly the behaviour we need.
>
> For future reference some more background:
>
> We need to prepare quite a large installation for planned power outages. Even
> though they are called planne
If you're using a fairly recent cephadm version, there is the ability to
provide miscellaneous container arguments in the service spec
https://docs.ceph.com/en/quincy/cephadm/services/#extra-container-arguments.
This means you can have cephadm deploy each container in that service with,
for example
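A minimal sketch of such a spec, assuming a mon service and podman/docker-style
--cpus/--memory flags (names and values are assumptions, not taken from the thread):

cat > mon-spec.yaml <<EOF
service_type: mon
placement:
  count: 3
extra_container_args:
  - "--cpus=2"
  - "--memory=4g"
EOF
ceph orch apply -i mon-spec.yaml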
Hi Patrick.
> To be clear in case there is any confusion: once you do `fs fail`, the
> MDS are removed from the cluster and they will respawn. They are not
> given any time to flush remaining I/O.
This is fine, there is not enough time to flush anything. As long as they leave
the meta-data- and
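For later readers, the rough shape of the stop/start pair being discussed, with a
placeholder filesystem name:

ceph fs fail <fs_name>                 # stop the MDS immediately; the fs becomes not joinable
# ... planned power outage / maintenance ...
ceph fs set <fs_name> joinable true    # allow the MDS to rejoin and replay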
Hi All,
Looking to get some advice on an issue my clusters have been suffering from.
I realize there is a lot of text below. Thanks in advance for your consideration.
The cluster has a health warning of "32 large omap objects". It's persisted for
several months.
It appears functional and there a
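A possible starting point for tracking the warning down, assuming (as is common with
RGW) that bucket index objects are the culprit:

ceph health detail                                   # names the pool(s) holding large omap objects
grep -i 'large omap object' /var/log/ceph/ceph.log   # cluster log on a mon host; path may vary
radosgw-admin bucket limit check                     # per-bucket index shard fill levels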
Hi Giuseppe,
On Tue, 2022-10-25 at 07:54 +, Lo Re Giuseppe wrote:
> “””
>
> In the mgr logs I see:
> “””
>
> debug 2022-10-20T23:09:03.859+ 7fba5f300700 0 [pg_autoscaler ERROR root]
> pool 2 has overlapping roots: {-60, -1}
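The overlapping-roots message usually means different pools use CRUSH rules that start
from different roots (for example the default root and a device-class shadow root); a
sketch for checking which rule each pool uses:

ceph osd pool ls detail               # shows the crush_rule id per pool
ceph osd crush rule dump              # shows which root/class each rule takes
ceph osd pool get <pool> crush_rule   # single pool, placeholder name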
This is unrelated; I asked the same question a few days ago:
Hi,
I see the same on different Nautilus clusters; I was pointed to this
tracker issue: https://tracker.ceph.com/issues/39264
In one cluster, disabling the prometheus module seemed to have stopped
the failing MGRs. But they happen so rarely that it might be something
different and we just di
Dear list,
recently we experienced a short outage of our ceph storage. It had a
surprising cause and probably indicates a subtle misconfiguration on
our part; I'm hoping for a useful suggestion ;-)
We are running a 3PB cluster with 21 osd nodes (spread across 3
datacenters), 3 mon/mgrs and
Reminder that the survey will close this Friday, October 28th at 10:00pm
UTC. Thanks to everyone who has submitted responses!
On Thu, Oct 20, 2022 at 10:25 AM Laura Flores wrote:
> Dear Ceph Users,
>
> Ceph developers and doc writers are looking for responses from people in
> the user/dev commun
Thank you,
Indeed, the last messages I have in the logs are Prometheus accesses.
But I use Prometheus (for the Ceph Dashboard and other Grafana Dashboards). I
can't really deactivate it for days...
One thing I can try instead of restarting the MGR is to disable/enable the
Prometheus module, and
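For reference, the module toggle itself (this restarts only the module, not the
whole mgr):

ceph mgr module disable prometheus
ceph mgr module enable prometheus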
Hi Lo,
On Tue, Oct 25, 2022 at 18:01 Lo Re Giuseppe wrote:
>
> I have found the logs showing the progress module failure:
>
> debug 2022-10-25T05:06:08.877+ 7f40868e7700 0 [rbd_support INFO root]
> execute_trash_remove: task={"sequence": 150, "id":
> "fcc864a0-9bde-4512-9f84-be10976613db", "message": "Re
Hi,
I am wondering whether there are good or bad experiences using Ceph with
applications that also do their own replication, like MongoDB or Kafka on
Kubernetes. Is it a bad idea to define separate pools and storage classes for
such applications and set a replication factor of 1?
Regards.
--
Oğuz Yarımtepe
http
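For anyone who wants to experiment with that, a replica-1 pool can be created, although
the monitors guard against it by default; a hedged sketch with a placeholder pool name:

ceph config set global mon_allow_pool_size_one true
ceph osd pool create <pool> 64 64 replicated
ceph osd pool set <pool> size 1 --yes-i-really-mean-it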