Oh sorry, forget my last email, thanks Laura for pointing out the obvious
that this is for reef, not squid!
On Wed, Mar 26, 2025 at 2:46 PM Travis Nielsen wrote:
> Yuri, as of when did 18.2.5 include the latest squid branch? If [1] is
> included in 18.2.5, then we really need [2] merged before release, as it
> would be blocking Rook.
If you don't specify "count_per_host", the orchestrator won't deploy
multiple daemons on one host. There's no way (that I'm aware of) to
specify a primary daemon. Since standby daemons need to be able to
take over the workload, they should all be equally equipped.
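For illustration, a minimal MDS spec using count_per_host might look like the sketch below (service_id, file name and host names are hypothetical), applied from a spec file:

# without count_per_host the orchestrator places at most one daemon of
# this service per host; count_per_host: 2 would allow two
cat > mds-fs1.yaml <<EOF
service_type: mds
service_id: fs1
placement:
  hosts:
  - host1
  - host2
  - host3
  count_per_host: 1
EOF
ceph orch apply -i mds-fs1.yaml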
Quoting Kasper Rasmussen:
service_type: prometheus
service_name: prometheus
placement:
  hosts:
  - dell02.mousetech.com
networks:
- 10.0.1.0/24
Can't list daemon logs, run restart etc., because "Error EINVAL: No
daemons exist under service name "prometheus". View currently running
services using "ceph orch ls""
And y
Then maybe the deployment did fail and we’re back at looking into the
cephadm.log.
Quoting Tim Holloway:
it returns nothing. I'd already done the same via "systemctl | grep
prometheus". There simply isn't a systemd service, even though there
should be.
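For reference, cephadm-managed daemons run under systemd units named after the cluster fsid, so a check along these lines should reveal the unit if it exists (the unit name below is a hypothetical example):

systemctl list-units 'ceph-*@prometheus*'
# a deployed daemon would show up as something like
# ceph-<fsid>@prometheus.dell02.service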
On 3/26/25 11:31, Eugen Block wrote:
If you need a proxy to pull the images, I suggest setting it in
containers.conf:
cat /etc/containers/containers.conf
[engine]
env = ["http_proxy=<proxy>:<port>", "https_proxy=<proxy>:<port>",
"no_proxy=<hosts>"]
But again, you should be able to see a failed pull attempt in the
cephadm.log on dell02, or even in 'ceph health detail'.
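A quick way to check both, assuming the default log location:

grep -i pull /var/log/ceph/cephadm.log
ceph health detail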
That would be the correct log file, but I don't see an attempt to
deploy a prometheus instance there. You can use any pastebin you like,
e. g. https://pastebin.com/ to upload your logs. Mask any sensitive
data before you do that.
Quoting Tim Holloway:
Well, here's an excerpt from the /var/log/ceph/cephadm.log.
I don't think there is a failure to deploy. For one thing, as mentioned, I
did at one point have 3 Prometheus-related containers running on the
machine. I also checked for port issues and there are none; nothing
listens on 9095.
One thing that does concern me is that the docs say changes in settin
There's a service called "prometheus", which can have multiple
daemons, just like any other service (mon, mgr, etc.). To get the daemon
logs you need to provide the daemon name (prometheus.ceph02.andsoon),
not just the service name (prometheus).
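As a sketch, the daemon name can be looked up via the orchestrator and then used with cephadm on the host where the daemon is scheduled (the name shown is just an example):

ceph orch ps --daemon-type prometheus
# then, on the host running the daemon:
cephadm logs --name prometheus.ceph02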
Can you run the cephadm command I provided? It
Since the containers are all podman, I found a "systemctl edit podman"
command that's recommended to set proxy for that.
However, once I did, 2 OSDs went down and cannot be restarted.
In any event, before I did that, ceph health detail was returning
"HEALTH OK".
Now I'm getting this:
HEALTH
Also, here are the currently-installed container images:
[root@dell02 ~]# podman image ls
REPOSITORY          TAG   IMAGE ID      CREATED        SIZE
quay.io/ceph/ceph         2bc0b0f4375d  8 months ago   1.25 GB
quay.io/ceph/ceph         3c4eff6082ae  10
Let's say I have 2 CephFS file systems, and three hosts I want to use as MDS hosts.
I use ceph orch apply mds to spin up the MDS daemons.
Is there a way to ensure that I don't get two active MDS daemons running on the
same host?
I mean, when using the ceph orch apply mds command I can specify --placement,
but it on
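For context, the command being referred to might look roughly like this (file system and host names are hypothetical):

ceph orch apply mds fs1 --placement="3 hostA hostB hostC"
ceph orch apply mds fs2 --placement="3 hostA hostB hostC"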
The cephadm.log should show some details about why it fails to deploy the
daemon. If there's not much, look into the daemon logs as well
(cephadm logs --name prometheus.ceph02.mousetech.com). Could it be
that there's a non-cephadm prometheus already listening on port 9095?
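One way to check for a port conflict, assuming iproute2 is installed on dell02:

ss -tlnp | grep 9095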
Quoting Tim Holloway:
Can you share 'ceph orch ls prometheus --export'? And if it has been
deployed successfully but is currently not running, the logs should
show why that is the case.
To restart prometheus, you can just run this to restart the entire
prometheus service (which would include all instances if you
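The command that got cut off here is presumably an orchestrator restart along these lines (the daemon name in the second form is a hypothetical example):

ceph orch restart prometheus
# or, for a single daemon:
ceph orch daemon restart prometheus.dell02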
Well, here's an excerpt from the /var/log/ceph/cephadm.log. I don't know
if that's the mechanism or file you mean, though.
2025-03-26 13:11:09,382 7fb2abc38740 DEBUG
cephadm ['--no-container-init', '--timeout', '
I added a run and rerun for the fs suite on a fix
https://github.com/ceph/ceph/pull/62492
Venky, pls review and if approved I will merge it to reef and
cherry-pick to the release branch.
On Wed, Mar 26, 2025 at 8:04 AM Adam King wrote:
>
> orch approved. The suite is obviously quite red, but the
Right, systemctl edit works as well. But I'm confused about the down
OSDs. Did you set the proxy on all hosts? Because the down OSDs are on
ceph06 while prometheus is supposed to run on dell02. Are you sure
those are related?
I would recommend removing the prometheus service entirely and s
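A sketch of that clean-up, assuming the service spec is kept in a file (file name hypothetical) and reapplied afterwards:

ceph orch rm prometheus
# wait until the daemon disappears from 'ceph orch ps', then:
ceph orch apply -i prometheus-spec.yaml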
It's strange, but for a while I'd been trying to get prometheus working
on ceph08, so I don't know.
All I do know is immediately after editing the proxy settings I got
indications that those 2 OSDs had gone down.
What's REALLY strange is that their logs seem to hint that somehow they
shifted
No change.
On 3/26/25 13:01, Tim Holloway wrote:
It's strange, but for a while I'd been trying to get prometheus
working on ceph08, so I don't know.
All I do know is immediately after editing the proxy settings I got
indications that those 2 OSDs had gone down.
What's REALLY strange is that
On Wed, Mar 26, 2025 at 8:37 PM Yuri Weinstein wrote:
>
> I added a run and rerun for the fs suite on a fix
> https://github.com/ceph/ceph/pull/62492
>
> Venky, pls review and if approved I will merge it to reef and
> cherry-pick to the release branch.
Noted. I will let you know when it's ready t
Hi Yuri,
On Wed, Mar 26, 2025 at 8:59 PM Venky Shankar wrote:
>
> On Wed, Mar 26, 2025 at 8:37 PM Yuri Weinstein wrote:
> >
> > I added a run and rerun for the fs suite on a fix
> > https://github.com/ceph/ceph/pull/62492
> >
> > Venky, pls review and if approved I will merge it to reef and
> >
Ok, I'll try one last time and ask for cephadm.log output. ;-) And the
active MGR's log might help here as well.
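For the mgr side, one possible way to find the active mgr and pull its log (the daemon name is a hypothetical example):

ceph mgr stat        # shows the active mgr, e.g. "dell02.abcdef"
cephadm logs --name mgr.dell02.abcdef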
Quoting Tim Holloway:
No change.
On 3/26/25 13:01, Tim Holloway wrote:
It's strange, but for a while I'd been trying to get prometheus
working on ceph08, so I don't know.
A
Ack, Travis
I was about to reply the same.
Venky, Guillaume, the PRs below were cherry-picked.
I will rerun the fs and ceph-volume tests when the build is done
https://github.com/ceph/ceph/pull/62492/commits
https://github.com/ceph/ceph/pull/62178/commits
On Wed, Mar 26, 2025 at 2:20 PM Travis Nielsen wrote:
Sorry, duplicated a URL. The mgr log is
https://www.mousetech.com/share/ceph-mgr.log
Hi,
We have a production cluster consisting of 3 mon+mgr nodes, 18 OSD servers
and ~500 OSDs, configured with ~50 pools, half EC (9+6) and half replica 3.
It also has 2 CephFS filesystems with 1 MDS each.
2 days ago, in a period spanning 16 hours, 13 OSDs crashed with an OOM.
The OSDs were first restarted
Hi again,
Looking for more info on the degraded filesystem, I managed to connect
to the dashboard, where I see an error not reported explicitly by
'ceph health':
One or more metadata daemons (MDS ranks) are failed or in a damaged
state. At best the filesystem is partially available, at w
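In a situation like this the MDS/rank state can usually be inspected with the standard commands, for example:

ceph fs status
ceph health detail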
OSD mystery is solved.
Both OSDs were LVM-based volumes imported as vdisks for Ceph VMs. Apparently
something scrambled either the VM manager or the host disk subsystem, as
the VM disks were getting I/O errors and even disappearing from the VM.
I rebooted the physical machine and that cleared it. All
And sorry for all these mails, I forgot to mention that we are running
18.2.2.
Michel
On 26/03/2025 at 21:51, Michel Jouvin wrote:
Hi again,
Looking for more info on the degraded filesystem, I managed to connect
to the dashboard, where I see an error not reported explicitly by
'ceph he
OK. I couldn't find a quick way to shovel a largish file from an
internal server into pastebin, but my own servers can suffice.
The URLs are:
https://www.mousetech.com/share/cephadm.log
https://www.mousetech.com/share/cephadm.log
And I don't see a deployment either.
On 3/26/25 14:26, Eugen Block wrote:
On Mon, Mar 24, 2025 at 10:40 PM Yuri Weinstein wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/70563#note-1
> Release Notes - TBD
> LRC upgrade - TBD
>
> Seeking approvals/reviews for:
>
> smoke - Laura approved?
>
> rados - Radek, Laura approved? Travi
Hi Yuri,
ceph-volume is missing this backport [1].
Also for this release you will need to run the teuthology orch/cephadm test
suite for validating ceph-volume rather than the usual "ceph-volume functional
test suite" [2]
[1] https://github.com/ceph/ceph/pull/62178
[2] https://jenkins.ceph.com/
Yuri, as of when did 18.2.5 include the latest squid branch? If [1] is
included in 18.2.5, then we really need [2] merged before release, as it
would be blocking Rook.
[1] https://github.com/ceph/ceph/pull/62095 (merged to squid on March 19)
[2] https://tracker.ceph.com/issues/70667
Thanks!
Travis
Rados approved:
https://tracker.ceph.com/projects/rados/wiki/REEF#v1825-httpstrackercephcomissues70563note-1
On Wed, Mar 26, 2025 at 12:22 PM Venky Shankar wrote:
> Hi Yuri,
>
> On Wed, Mar 26, 2025 at 8:59 PM Venky Shankar wrote:
> >
> > On Wed, Mar 26, 2025 at 8:37 PM Yuri Weinstein
> wrote:
I finally got brave and migrated from Pacific to Reef, did some banging
and hammering, and for the first time in a long time got a complete
"HEALTH OK" status.
However, the dashboard is still not happy. It cannot contact the
Prometheus API on port 9095.
I have redeployed Prometheus multiple times
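If the daemon itself is running, it may just be the dashboard pointing at the wrong endpoint; the configured API host can be checked and, if needed, adjusted (the URL below is a hypothetical example):

ceph dashboard get-prometheus-api-host
ceph dashboard set-prometheus-api-host http://dell02.mousetech.com:9095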
I tried something else, but the result is not really satisfying. I
edited the keepalived.conf files, which had no peers at all or only one
peer, so that they are now all identical. Restarting the daemons helped,
so that only one virtual IP is assigned; now the daemons do communicate
and I see messages
Thanks, I removed the ingress service and redeployed it, with the same
result. The interesting part here is that the configs are identical to
the previous deployment, so the same peers (or no peers) as before.
Quoting Robert Sander:
On 3/25/25 at 18:55, Eugen Block wrote:
O
On 3/25/25 at 18:55, Eugen Block wrote:
Okay, so I don't see anything in the keepalived log about the daemons
communicating with each other. The config files are almost identical: no
difference in priority, but they differ in unicast_peer. ceph03 has no
entry at all for unicast_peer, ceph02 has only ceph03 in there
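For illustration, a keepalived.conf with a populated unicast_peer block would look roughly like the snippet below (interface, IPs and instance name are hypothetical); each host would list the other two peers:

vrrp_instance VI_0 {
    ...
    interface eth0
    unicast_src_ip 10.0.1.2
    unicast_peer {
        10.0.1.3
        10.0.1.4
    }
    virtual_ipaddress {
        10.0.1.100/24
    }
}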