Ceph Pacific
Thanks to some misplaced thumbs-on-keyboard, I inadvertently managed to
alias a non-ceph system's ip as a ceph host and ceph adopted it
somehow.
I fixed the fat-fingered IP, and have gone through the usual motions to
delete a host, but some parts of the ceph ecosystem haven't caught
mgr (ceph mgr fail) although that
> might
> change the url you need for the dashboard by changing where the
> active mgr
> is.
>
> On Fri, Jun 21, 2024 at 10:14 AM Tim Holloway
> wrote:
>
> > Ceph Pacific
> >
> > Thanks to some misplaced thumbs-on-keyb
It's getting worse.
As many may be aware, the venerable CentOS 7 OS is hitting end-of-life in a
matter of days.
The easiest way to upgrade my servers has been to simply create an alternate
disk with the new OS, turn my provisioning system loose on it, yank the old
OS system disk and jack in the ne
up
> first.
> I had something similar on a customer cluster recently where we
> hunted
> failing OSDs but it turned out they were removed quite a while ago,
> just not properly cleaned up yet on the filesystem.
>
> Thanks,
> Eugen
>
Quoting Tim Holloway:
Ivan,
This may be a little off-topic, but if you're still running AlmaLinux
8.9, it's worth noting that CentOS 8 actually end-of-lifed about 2
years ago, thanks to CentOS Stream.
Up until this last week, however, I had several AlmaLinux 8 machines
running myself, but apparently somewhere around M
Just my €0.02. There is, in fact, a cephadm package for the Raspberry Pi
OS. If I read the synopsis correctly, it's for ceph 16.2.11, which I
think is the same release of Ceph Pacific that I'm presently running my
own farm on. It appears to derive off Debian Bookworm.
Since cephadm is mainly a prog
Incidentally, I just noticed that my phantom host isn't completely
gone. It's not in the host list, either command-line or dashboard, but
it does list (with no assets) as a host under "ceph osd tree".
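For what it's worth, a stray host bucket like that is easy to spot mechanically in the JSON form of the tree. A rough sketch, where the tree shape and the host names are invented and the output of `ceph osd tree -f json` is assumed to look roughly like this:

```python
import json

# Hypothetical fragment shaped like `ceph osd tree -f json` output;
# hosts and OSDs here are invented for illustration.
tree = json.loads("""
{"nodes": [
  {"id": -1, "name": "default", "type": "root", "children": [-2, -3]},
  {"id": -2, "name": "ceph01",  "type": "host", "children": [0]},
  {"id": -3, "name": "ghost01", "type": "host", "children": []},
  {"id": 0,  "name": "osd.0",   "type": "osd",  "status": "up"}
]}
""")

def phantom_hosts(tree: dict) -> list[str]:
    """Host buckets that hold no OSDs at all -- candidates for removal."""
    return [n["name"] for n in tree["nodes"]
            if n["type"] == "host" and not n.get("children")]

print(phantom_hosts(tree))
```

An empty bucket found this way can usually be dropped with `ceph osd crush rm <name>`, but as always, check twice before removing anything.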
---
More seriously, I've been having problems with OSDs that report as
being both up and down at
it would help to see your osd tree and
> which
> OSDs you’re talking about.
>
> Zitat von Tim Holloway :
>
> > Incidentally, I just noticed that my phantom host isn't completely
> > gone. It's not in the host list, either command-line or dashboard,
> >
he device:
cephadm [--image your-custom-image] adopt --style legacy --name osd.4
locally on that host. The --image parameter is optional. Did you
follow the docs [1] when you moved to cephadm? Anyway, since it
somehow seems to work already, it's probably not that relevant
anymore, I just
t; Check which OSDs are active and remove the remainders of the
> orphaned
> directories, that should be fine. But be careful and check properly
> before actually removing anything and only remove one by one while
> watching the cluster status.
>
> Quoting Tim Hollowa
hat up and keep it consistent. You should keep the cephadm
> and
> optionally the ceph-common package, but the rest isn't required to run
> a
> cephadm cluster.
>
>
> Quoting Tim Holloway:
>
> > The problem with merely disabling or masking the non-cephadm OS
OK. I deleted the questionable stuff with this command:
dnf erase ceph-mgr-modules-core-16.2.15-1.el9s.noarch ceph-mgr-diskprediction-local-16.2.15-1.el9s.noarch ceph-mgr-16.2.15-1.el9s.x86_64 ceph-mds-16.2.15-1.el9s.x86_64 ceph-mon-16.2.15-1.el9s.x86_64
That left these two:
centos-release-
I've been setting up a cookbook OSD creation process and as I walked
through the various stages, I noted that the /etc/redhat-release file
said "CentOS Stream 8". I panicked, because IBM has pulled the Ceph
archives for CentOS 8, so I nuked the machine, then rebuilt it with more
attention to detail.
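A sanity check like that one can be folded into a provisioning pass; a minimal sketch, assuming an os-release style key=value file (the sample contents below are invented, not taken from the actual machine):

```python
# Minimal provisioning sanity-check sketch; sample text is an assumption.
SAMPLE = 'NAME="CentOS Stream"\nVERSION_ID="8"\nID="centos"\n'

def parse_os_release(text: str) -> dict[str, str]:
    """Parse the key=value lines of an /etc/os-release style file."""
    fields = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value.strip().strip('"')
    return fields

info = parse_os_release(SAMPLE)
# Flag distros we no longer want to build on before going any further.
if info.get("ID") == "centos" and info.get("VERSION_ID") == "8":
    print("warning: CentOS 8 detected -- rebuild before continuing")
```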
robably spin up a copy of
CentOS 7, because repos are still online even if no sensible person
would want it for a production server. But not so the CentOS 8
enterprise extensions. They're gone.
On Fri, 2024-07-19 at 09:00 +0200, Robert Sander wrote:
> On 7/18/24 21:50, Tim Holloway wrote
oduct, and that's about the
only other factor I can think of.
On Fri, 2024-07-19 at 14:50 +0200, Stefan Kooman wrote:
> On 19-07-2024 14:04, Tim Holloway wrote:
> > Ah. Makes sense. Might be nice if the container build appended
> > something like "cephadm container"
While running Ceph in a VM may not be considered the most efficient
mode of operation, it's still fairly popular. I've put together a small
project that can spin up an OSD VM in just a few minutes.
Whereas Ceph given free rein will gladly acquire resources with all the
enthusiasm of a sprig of min
I was in the middle of tuning my OSDs when lightning blew me off the
Internet. Had to wait 5 days for my ISP to send a tech and replace a
fried cable. In the meantime, among other things, I had some serious
time drift between servers thanks to the OS upgrades replacing NTP with
chrony and me not
308 active+clean+remapped
io:
client: 170 B/s rd, 0 op/s rd, 0 op/s wr
On Sat, 2024-07-27 at 11:31 -0400, Tim Holloway wrote:
> I was in the middle of tuning my OSDs when lightning blew me off the
> Internet. Had to wait 5 days for my ISP to send a tech and replace a
> fried cabl
t of entry for the Reef docs. Frustrating.
On Sat, 2024-07-27 at 11:44 -0400, Tim Holloway wrote:
> Update on "ceph -s". A machine was in the process of crashing when I
> took the original snapshot. Here it is after the reboot:
>
> [root@dell02 ~]# ceph -s
> cluster:
You might want to try my "bringing up an OSD really, really fast"
package (https://gogs.mousetech.com/mtsinc7/instant_osd).
It's actually for spinning up a VM with an OSD in it, although you can
skip the VM setup script if you're on a bare OS and just run the
Ansible part.
Apologies for anyone
Been there/did that. Cried a lot. Fixed now.
Personally, I recommend the containerized/cephadm-managed approach. In a
lot of ways, it's simpler and it supports more than one fsid on a
single host. The downside is that the systemd names are really gnarly
(the full fsid is part of the unit name) and th
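As a rough illustration of that naming scheme, managed daemons run under templated units that look something like `ceph-<fsid>@<daemon>.service`; the fsid below is a made-up example, and the helpers are just a sketch:

```python
# Sketch of cephadm-style systemd unit naming; the fsid is invented.
FSID = "16a56cdf-9bb4-11ef-aaaa-123456789abc"

def unit_name(fsid: str, daemon: str) -> str:
    """Build the templated unit name a cephadm-managed daemon runs under."""
    return f"ceph-{fsid}@{daemon}.service"

def daemon_of(unit: str) -> str:
    """Recover the daemon name (e.g. 'osd.4') from a templated unit name."""
    return unit.split("@", 1)[1].removesuffix(".service")

name = unit_name(FSID, "osd.4")
print(name)             # usable with `systemctl status` or `journalctl -u`
print(daemon_of(name))
```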
I find the SpongeBob ideas amusing, and I agree that in an isolated
world, "Squid" would be the logical next release name.
BUT
It's going to wreak havoc on search engines that can't tell when
someone's looking up Ceph versus the long-established Squid Proxy.
If we're going to look to the cartoon w
If it makes you feel better, that sounds exactly like what happened to
me and I have no idea how. Other than I'd started with Octopus and it
was a transitional release, there are conflicting instructions AND a
reference in the Octopus docs to procedures using a tool that was no
longer distributed w
It depends on your available resources, but I really do recommend
destroying and re-creating that OSD. If you have to spin up a VM and
set up a temporary OSD just to keep the overall system happy, even that
is a small price to pay.
As I said, you can't unlink/disable the container systemd, because
I'd put in an RFE to detect/prevent creation of mutually-exclusive OSD
definitions on a single OSD storage unit myself, since that's the real
problem. As Eugen has noted, you can up-convert a traditional OSD to
cephadm management... unless there's already a managed instance
existing. I can attest t
Although I'm seeing this in Pacific, it appears to be a perennial issue
with no well-documented solution. The dashboard home screen is flooded
with popups saying "404 - Not Found
Could not reach Prometheus's API on
http://ceph1234.mydomain.com:9095/api/v1
"
If I was a slack-jawed PHB casually wan
docs.ceph.com/en/quincy/mgr/dashboard/#haproxy-example-configuration
>
>
>
>
> In my opinion, a VIP for the dashboard (etc.) could and maybe should
> be and out of the box config.
>
>
>
>
>
> > On Aug 19, 2024, at 8:23 AM, Tim Holloway
> > wrot
I believe that the original Ansible installation process is deprecated.
It was pretty messy, anyway, since it had to do a lot of grunt work.
Likewise the ceph-install program, which is in the Octopus docs, but
wasn't actually available in the release of Octopus I installed on my
servers.
The Ansib
"Ceph is ready" covers a lot of territory. It's more like, "How can I
delay until Ceph is available for the particular service I need?"
I've been taking a systemd-based approach. Since I don't actually care
about Ceph in the abstract, but I'm actually looking for the Ceph or
Ceph NFS shares, I create
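One way to express that with systemd itself is a mount unit that won't be attempted before the network is up. This is only a sketch; the server, export path, and mount point are made up (and note systemd wants the unit filename to match the `Where=` path):

```ini
# /etc/systemd/system/mnt-cephdata.mount -- hypothetical example
[Unit]
Description=Ceph NFS share (example)
After=network-online.target
Wants=network-online.target

[Mount]
# Made-up Ganesha endpoint and export; substitute your own.
What=nfs.example.com:/cephfs
Where=/mnt/cephdata
Type=nfs
Options=_netdev,soft,timeo=30

[Install]
WantedBy=multi-user.target
```

An automount unit alongside it can defer the actual mount until first access, which sidesteps most boot-ordering races.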
Those are reasonable objections, although some are now dated. In the
context of Ceph some of those issues are also further addressed by Ceph
itself. So let me present my take.
1. Networking. You can set up some gnarly virtual networks in both
container and cloud systems, it's true. Docker has
On Fri, Aug 30, 2024 at 20:43 Milan Kupcevic wrote
> :
> >
> > On 8/30/24 12:38, Tim Holloway wrote:
> > > I believe that the original Ansible installation process is
> > > deprecated.
> >
> > This would be a bad news as I repeatedly hear from admin
Sorry if that sounds trollish. It wasn't intended to be. Look at it this
way.
There are two approaches to running an IT installation. One is the
free-wheeling independent approach. The other is the stuffy corporate
approach.
Free-wheeling shops run things like Ubuntu. Or even BSD (but that's
While I generally don't recommend getting down and dirty with the
containers in Ceph, if you're going to build your own, well, that's
different.
When I have a container and the expected port isn't listening, the
first thing I do is see if it's really listening and internal-only or
truly not listen
der bumping its logging levels - the exact opposite of what I'm
looking to do!
Tim
On Tue, 2024-09-03 at 15:34 +0100, Matthew Vernon wrote:
> Hi,
>
> On 03/09/2024 14:27, Tim Holloway wrote:
>
> FWIW, I'm using podman not docker.
>
> > The netstat command is no
n, on the other hand, sounds
familiar, but I'm too senile to recall from where.
Tim
On Tue, 2024-09-03 at 11:48 -0400, Tim Holloway wrote:
> Yeah. Although taming the Prometheus logs is on my list, I'm still
> fuzzy on its details.
>
> For your purposes, Docker and P
I've been monitoring my Ceph LAN segment for the last several hours and
absolutely no traffic has shown up on any server for port 8765.
Furthermore I did a quick review of Prometheus itself and it's only
claiming those 9000-series ports I mentioned previously.
So I conclude that this isn't litera
One of my major regrets is that there isn't a "Ceph Lite" for setups
where you want a cluster with "only" a few terabytes and a half-dozen
servers. Ceph excels at really, really big storage and the tuning
parameters reflect that.
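The scale mismatch is easy to quantify: the PGs-per-OSD figure that health warnings complain about is roughly total PGs times replica count divided by OSD count. The numbers below are invented for illustration:

```python
def pgs_per_osd(total_pgs: int, replicas: int, osds: int) -> float:
    """Approximate average placement groups hosted per OSD."""
    return total_pgs * replicas / osds

# A small cluster with big-cluster pool defaults quickly blows past the
# usual mon_max_pg_per_osd limit of 250.
print(pgs_per_osd(total_pgs=1296, replicas=3, osds=6))  # -> 648.0
```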
I, too, ran into the issue where I couldn't allocate a disk partition
Now you've got me worried. As I said, there is absolutely no traffic
using port 8765 on my LAN.
Am I missing a service? Since my distro is based on stock Prometheus,
I'd have to assume that the port 8765 server would be part of the Ceph
generic container image and isn't being switched on for some
; [1] https://github.com/ceph/ceph/pull/57535
> [2] https://github.com/ceph/ceph/pull/58402
> [3] https://github.com/ceph/ceph/pull/58460
> [4] https://github.com/ceph/ceph/pull/46400
>
>
>
>
> On Thu, Sep 5, 2024 at 7:00 PM Tim Holloway
> wrote:
> > Now you'
, which treats '/foo' as a
> request for bucket foo - that's why you see NoSuchBucket errors when
> it's misconfigured
>
> also note that, because of how these apis are nested,
> rgw_admin_entry='default' would prevent users from creating and
> operati
se a
> command like:
>
> $ ceph config set client.rgw rgw_admin_entry admin
>
> then restart radosgws because they only read that value on startup
>
> On Tue, Oct 17, 2023 at 9:54 AM Tim Holloway
> wrote:
> >
> > Thanks, Casey!
> >
> > I'm not
I started with Octopus. It had one very serious flaw that I only fixed
by having Ceph self-upgrade to Pacific. Octopus required perfect health
to alter daemons and often the health problems were themselves issues
with daemons. Pacific can overlook most of those problems, so it's a
lot easier to rep
Ceph version is Pacific (16.2.14), upgraded from a sloppy Octopus.
I ran afoul of all the best bugs in Octopus, and in the process
switched on a lot of stuff better left alone, including some detailed
debug logging. Now I can't turn it off.
I am confidently informed by the documentation that the
> osd.1 lives it wouldn't work. "ceph tell" should work anywhere there
> is a client.admin key.
>
>
> Respectfully,
>
> Wes Dillingham
> w...@wesdillingham.com
> LinkedIn
>
>
> On Tue, Dec 19, 2023 at 4:02 PM Tim Holloway
> wrote:
> > Ce
anywhere there is a
> client.admin key.
>
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, Dec 19, 2023 at 4:02 PM Tim Holloway
> wrote:
>
> > Ceph version is Pacific (16.
I can't speak for details of ceph-ansible. I don't use it because from
what I can see, ceph-ansible requires a lot more symmetry in the server
farm than I have.
It is, however, my understanding that cephadm is the preferred
installation and management option these days and it certainly helped
me t
as
> you may have noticed. For example, you could reduce the log level of
> debug_rocksdb (default 4/5). If you want to reduce the
> mgr_tick_period
> (the repeating health messages every two seconds) you can do that
> like
> this:
>
> quincy-1:~ # ceph config set m
I just jacked in a completely new, clean server and I've been trying to
get a Ceph (Pacific) monitor running on it.
The "ceph orch daemon add" appears to install all/most of what's
necessary, but when the monitor starts, it shuts down immediately, and
in the manner of Ceph containers immediately e
Back when I was battling Octopus, I had problems getting ganesha's NFS
to work reliably. I resolved this by doing a direct (ceph) mount on my
desktop machine instead of an NFS mount.
I've since been plagued by ceph "laggy OSD" complaints that appear to
be due to a non-responsive client and I'm sus
My €0.02 for what it's worth(less).
I've been doing RBD-based VMs under libvirt with no problem. In that
particular case, the ceph RBD base images are being overlaid cloud-
style with an instance-specific qcow2 image and the RBD is just part
of my storage pools.
For a physical machine, I'd prob
e output of:
> ceph orch ls mon
>
> If the orchestrator expects only one mon and you deploy another
> manually via daemon add it can be removed. Try using a mon.yaml file
> instead which contains the designated mon hosts and then run
> ceph orch apply -i mon.yaml
>
>
hat’s why your manually
> added
> daemons are rejected. Try my suggestion with a mon.yaml.
>
> Quoting Tim Holloway:
>
> > ceph orch ls
> > NAME  PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
> > alertm
elly wrote:
> On Tue, Feb 6, 2024 at 12:09 PM Tim Holloway
> wrote:
> >
> > Back when I was battling Octopus, I had problems getting ganesha's
> > NFS
> > to work reliably. I resolved this by doing a direct (ceph) mount on
> > my
> > desktop machine ins
Just FYI, I've seen this on CentOS systems as well, and I'm not even
sure that it was just for Ceph. Maybe some stuff like Ansible.
I THINK you can safely ignore that message or alternatively that it's
such an easy fix that senility has already driven it from my mind.
Tim
On Tue, 2024-02-06
ppening
slowly and the suspension wasn't waiting properly for it to complete.
On Tue, 2024-02-06 at 13:00 -0500, Patrick Donnelly wrote:
> On Tue, Feb 6, 2024 at 12:09 PM Tim Holloway
> wrote:
> >
> > Back when I was battling Octopus, I had problems getting ganesha's
uorum.
> That's
> why you had 2/1 running, this is reproducible in my test cluster.
> Adding more mons also failed because of the count:1 spec. You could
> have just overwritten it in the cli as well without a yaml spec file
> (omit the count spec):
>
> ceph orch apply
First, an abject apology for the horrors I'm about to unveil. I made a
cold migration from GlusterFS to Ceph a few months back, so it was a
learn-/screwup/-as-you-go affair.
For reasons of presumed compatibility with some of my older servers, I
started with Ceph Octopus. Unfortunately, Octopus see
rgws.
>
> You can confirm that both of these settings are set properly by
> sending GET request to ${rgw-ip}:${port}/${rgw_admin_entry}
> “default" in your case -> it should return 405 Method Not Allowed
>
> Btw there is actually no bucket that you would be able to s
This is a common error on my system (Pacific).
It appears that there is internal confusion as to where the crash
support stuff lives - whether it's new-style (administered and under
/var/lib/ceph/fsid) or legacy style (/var/lib/ceph). One way to fake it
out was to manually create a minimal c
Take care when reading the output of "ceph osd metadata". When you are
running the OSD as an administered service, it's running in a container,
and a container is a miniature VM. So, for example, it may report your
OS as "CentOS Stream 8" even if your actual machine is running Ubuntu.
The big
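A sketch of that distinction, using a made-up record shaped like one entry of `ceph osd metadata` output (field values are invented):

```python
import json

# Invented sample in the shape of a `ceph osd metadata <id>` record.
meta = json.loads("""
{"id": 0,
 "container_image": "quay.io/ceph/ceph:v16",
 "distro_description": "CentOS Stream 8",
 "hostname": "ceph01"}
""")

def reported_os(meta: dict) -> str:
    """The OS string Ceph reports -- the container's, if one is in use."""
    where = "container" if meta.get("container_image") else "host"
    return f'{meta["distro_description"]} ({where})'

print(reported_os(meta))
```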
> I think the problem might be that the NVMe LV that was the WAL/DB for
> the
> failed OSD did not get cleaned up, but on my systems 4 OSDs use the
> same
> NVMe drive for WAL/DB, so I'm not sure how to proceed.
>
> Any suggestions would be welcome.
>
> Thanks.
>
> -D
id I make all my OSDs managed, or just
> all of
> the ones on ceph01, or just the one that got created when I applied
> the
> spec?
>
> When I add my next host, should I change the placement to that host
> name or
> to '*'?
>
> More generally, is there a hig
That can be a bit sticky.
First, check to see if you have a /var/log/messages file. The dmesg log
isn't always as complete.
Also, of course, make sure you have enough spare RAM and disk space to
run the OSD. When running a Managed OSD, a LOT of space is used under
the root directory several layer
Speaking abstractly, I can see 3 possible approaches.
1. You can create a separate container and invoke it from the mgr
container as a micro-service. As to how, I don't know. This is likely
the cleanest approach.
2. You can create a Dockerfile based on the stock mgr but with your
extensions added
en/latest/cephadm/services/osd/#deploy-osds
[1]
https://docs.ceph.com/en/latest/cephadm/services/osd/#advanced-osd-service-specifications
Quoting Tim Holloway:
As I understand it, the manual OSD setup is only for legacy
(non-container) OSDs. Directory locations are wrong for managed
(contain
As I understand it, the manual OSD setup is only for legacy
(non-container) OSDs. Directory locations are wrong for managed
(containerized) OSDs, for one.
Actually, the whole manual setup docs ought to be moved out of the
mainline documentation. In their present arrangement, they make legacy
inghamton.edu
On Thu, Oct 31, 2024 at 3:52 PM Tim Holloway wrote:
I migrated from gluster when I found out it's going unsupported shortly.
I'm really not big enough for Ceph proper, but there were only so many
supported distributed filesystems with triple redundancy.
Where I got int
On 10/30/24 14:58, Tim Holloway wrote:
Speaking abstractly, I can see 3 possible approaches.
...
2. You can create a Dockerfile based on the stock mgr but with your
extensions added. The main problem with this is that from what I can
see, the cephadm tool has the names and repositories of
Yes, but it's irritating. Ideally, I'd like my OSD IDs and hostnames to
track so that if a server goes wrong I can find it and fix it ASAP. But
it doesn't take much maintenance to break that scheme and the only thing
more painful than renaming a Ceph host is re-numbering an OSD.
On 10/28/24 06
On 10/28/24 04:54, Burkhard Linke wrote:
Hi,
On 10/26/24 18:45, Tim Holloway wrote:
On the whole, I prefer to use NFS for my clients to use Ceph
filesystem. It has the advantage that NFS client/mount is practically
guaranteed to be pre-installed on all my client systems.
On the other hand, the
I have seen instances where the crash daemon is running under a
container (using /var/lib/ceph/{fsid}/crash), but the daemon is trying to
use the legacy location (/var/lib/ceph/crash). This can result in file
access violations or "file not found" issues which should show up in the
system logs (jo
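That check is easy to script; a tiny sketch with the root directory parameterized so it can be pointed anywhere (paths and fsid below are illustrative, the real root is normally /var/lib/ceph):

```python
import tempfile
from pathlib import Path

def crash_dir_styles(root: Path, fsid: str) -> list[str]:
    """Report which crash-directory layouts exist under `root`.
    Both at once suggests the container/legacy mismatch described above."""
    styles = []
    if (root / fsid / "crash").is_dir():
        styles.append("managed")
    if (root / "crash").is_dir():
        styles.append("legacy")
    return styles

# Demo against a throwaway directory rather than a live /var/lib/ceph.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "demo-fsid" / "crash").mkdir(parents=True)
    print(crash_dir_styles(root, "demo-fsid"))
```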
I've worked with systems much smaller than that where I would have LOVED
to get everything up in only an hour. Kids these days.
1. Have you tried using a spec file? Might help, might not.
2. You could always do the old "&" Unix shell operator for asynchronous
commands. I think you could get An
On the whole, I prefer to use NFS for my clients to use Ceph
filesystem. It has the advantage that NFS client/mount is practically
guaranteed to be pre-installed on all my client systems.
On the other hand, there are downsides. NFS (Ceph/NFS-Ganesha) has been
known to be cranky on my network when
Things are a little more complex. Managed (container) resources are
handled by systemd, which typically auto-restart failed services. Ceph
diverts the container logs (what you'd get from "docker logs
container-id") into the systemd journal log. So doing a journalctl check
is advised. Although c
full control over it. That's why we also have
osd_crush_initial_weight = 0, to check the OSD creation before letting
Ceph remap any PGs.
It definitely couldn't hurt to clarify the docs, you can always report
on tracker.ceph.com if you have any improvement ideas.
Quoting Tim Holloway:
I ha
Well, using Ceph as its own backup system has its merits, and I've
little doubt something could be cooked up, but another alternative
would be to use a true backup system.
In my particular case, I use the Bacula backup system product. It's not
the most polished thing around, but it is a full-featu
obably allowing service actions against the
"osd" service (even though it's just a placeholder in reality) but none of
that exists currently.
On Wed, Nov 6, 2024 at 11:50 AM Tim Holloway wrote:
On 11/6/24 11:04, Frédéric Nass wrote:
...
You could enumerate all hosts one by one or us
You can get this sort of behaviour because different Ceph subsystems get
their information from different places instead of having an
authoritative source of information.
Specifically, Ceph may look directly at:
A) Its configuration database
B) Systemd units running on the OSD host
C) Contai
;s actually
**no** stray daemons, that is **no** ghost process on any hosts trying to start.
Here we're talking about unexpected behavior, most likely a bug.
Regards,
Frédéric.
----- On 7 Nov 24, at 21:08, Tim Holloway t...@mousetech.com wrote:
You can get this sort of behaviour because
There is a certain virtue in using a firewall appliance for front-line
protection. I think fail2ban could add IPs to its block list.
An advantage of this is that you don't have to remember what all the
internal servers are to firewall them individually.
Certainly one could update firewall-cmd via
On 11/6/24 11:04, Frédéric Nass wrote:
...
You could enumerate all hosts one by one or use a pattern like 'ceph0[1-2]'
You may also use regex patterns depending on the version of Ceph that you're
using. Check [1].
Regex patterns should be available in next minor Quincy release 17.2.8.
[1] htt
1. Make sure you have enough RAM on ceph-1 and "df -h /" indicates
that the system disk is less than 70% full (managed services eat a LOT
of disk space!)
2. Check your selinux audit log to make sure nothing's being blocked.
3. Check your /var/lib/ceph and
/var/lib/ceph/16a56cdf-9bb4-11ef-
Check the /var/lib/ceph directory on host ceph-osd3. If there is an
osd.3 directory there, and a /var/lib/ceph/{fsid}/osd.3 directory then
you are a member of the schizophrenic OSD club. Congratulations, your
membership badge and certificate of Membership will be arriving shortly.
I think you
on each node, it starts a single OSD and waits for it to
> become
> ready before moving on to the next.
>
> Regards,
> Yufan
>
> Tim Holloway wrote on Saturday, November 9, 2024 at 1:58 AM:
> >
> > I've worked with systems much smaller than that where I would have
> > LOVE
H. I have somewhat similar issues, and I'm not entirely happy with
what I've got, but let me fill you in.
Ceph supports NFS by launching instances of Ganesha-nfs. If you're using
managed services, this will be run out of the master Ceph container
image and the name of this container is rat
I think I can count 5 sources that Ceph can query to
report/display/control its resources.
1. The /etc/ceph/ceph.conf file. Mostly supplanted by the Ceph
configuration database.
2. The ceph configuration database. A nameless key/value store internal
to a ceph filesystem. It's distributed (no fixed
As to the comings and goings of Octopus from download.ceph.com I cannot
speak. I had enough grief when IBM Red Hat pulled Ceph from its CentOS
archives.
But my experience with Octopus was such that unless you have a really
compelling reason to use it, I'd upgrade to Pacific or higher. Octopus
had
then looking
> into/editing/removing ceph-config keys like 'mgr/cephadm/inventory'
> and 'mgr/cephadm/host.ceph07.internal.mousetech.com' that 'ceph
> config-key dump' output shows might help.
>
> Regards,
> Frédéric.
>
> - On 25 Feb 25,
Ack. Another fine mess.
I was trying to clean things up and the process of tossing around OSDs
kept getting me reports of slow responses and hanging PG operations.
This is Ceph Pacific, by the way.
I found a deprecated server that claimed to have an OSD even though it
didn't show in either "cep
le or set of OSDs that it seemed to hang on, I just picked a server
with the most OSDs reported and rebooted that one. I suspect, however,
that any server would have done.
Thanks,
Tim
On Thu, 2025-02-27 at 08:28 +0100, Frédéric Nass wrote:
>
>
> - On 26 Feb 25, at 16:40,
I'm coming in late so I don't know the whole story here, but that
name's indicative of a Managed (containerized) resource.
You can't manually construct, delete or change the systemd services for
such items. I learned that the hard way. The service
declaration/control files are dynamically created
reboot and in the process flush out any other
issues that might have arisen.
On Thu, 2025-02-27 at 15:47 -0500, Anthony D'Atri wrote:
>
>
> > On Feb 27, 2025, at 8:14 AM, Tim Holloway
> > wrote:
> >
> > System is now stable. The rebalancing was doing what it shou
ince I not only maintain Ceph, but
every other service on the farm, including appservers, LDAP, NFS, DNS,
and much more, I haven't had the luxury to dig into Ceph as deeply as
I'd like, so the fact that it works so well under such shoddy
administration is also a point in its favor.
Based on my experience, that error comes from 1 of 3 possible causes:
1. The machine in question doesn't have proper security keys
2. The machine in question is short on resources - especially RAM
3. The machine in question has its brains scrambled. Cosmic rays
flipping critical RAM bits, bugs
t 1 more
[ERR] MGR_MODULE_ERROR: 2 mgr modules have failed
Module 'cephadm' has failed: 'ceph06.internal.mousetech.com'
Module 'prometheus' has failed: gaierror(-2, 'Name or service not
known')
[WRN] TOO_MANY_PGS: too many PGs per OSD (648 > m
service_type: prometheus
service_name: prometheus
placement:
hosts:
- dell02.mousetech.com
networks:
- 10.0.1.0/24
Can't list daemon logs, run restart, etc., because "Error EINVAL: No
daemons exist under service name "prometheus". View currently running
services using "ceph orch ls""
And y
ng.
On 3/26/25 07:26, Eugen Block wrote:
The cephadm.log should show some details why it fails to deploy the
daemon. If there's not much, look into the daemon logs as well
(cephadm logs --name prometheus.ceph02.mousetech.com). Could it be
that there's a non-cephadm prometheus al
ot;,
"no_proxy="]
But again, you should be able to see a failed to pull in the
cephadm.log on dell02. Or even in 'ceph health detail', usually it
warns you if the orchestrator failed to place a daemon.
Quoting Tim Holloway:
One thing I did run into when upgrading
8.3.5 dad864ee21e9 2 years ago 571 MB
quay.io/prometheus/node-exporter v1.3.1 1dbe0e931976 3 years ago 22.3 MB
quay.io/prometheus/alertmanager v0.23.0 ba2b418f427c 3 years ago 58.9 MB
On 3/26/25 11:36, Tim Holloway wrote:
it returns nothing. I'd al
'--timeout', '895', 'gather-facts']
2025-03-26 13:22:21,860 7f29c27cb740 DEBUG
cephadm ['--no-container-init', '--timeout', '895', 'gather-facts']
202