[ceph-users] Phantom host

2024-06-21 Thread Tim Holloway
Ceph Pacific Thanks to some misplaced thumbs-on-keyboard, I inadvertently managed to alias a non-ceph system's ip as a ceph host and ceph adopted it somehow. I fixed the fat-fingered IP, and have gone through the usual motions to delete a host, but some parts of the ceph ecosystem haven't caught
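
A minimal sketch of the "usual motions" for deleting a host, with the hostname as a placeholder (not a command sequence taken from the thread):

  ceph orch host ls                     # how the orchestrator currently lists the host
  ceph orch host rm <phantom-host>      # remove it from cephadm's inventory
  ceph orch ps <phantom-host>           # verify no daemons are still mapped to it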

[ceph-users] Re: Phantom host

2024-06-22 Thread Tim Holloway
mgr (ceph mgr fail) although that > might > change the url you need for the dashboard by changing where the > active mgr > is. > > On Fri, Jun 21, 2024 at 10:14 AM Tim Holloway > wrote: > > > Ceph Pacific > > > > Thanks to some misplaced thumbs-on-keyb

[ceph-users] Phantom hosts

2024-06-30 Thread Tim Holloway
It's getting worse. As many may be aware, the venerable CentOS 7 OS is hitting end-of-life in a matter of days. The easiest way to upgrade my servers has been to simply create an alternate disk with the new OS, turn my provisioning system loose on it, yank the old OS system disk and jack in the ne

[ceph-users] Re: Phantom hosts

2024-07-09 Thread Tim Holloway
up > first.  > I had something similar on a customer cluster recently where we > hunted  > failing OSDs but it turned out they were removed quite a while ago,  > just not properly cleaned up yet on the filesystem. > > Thanks, > Eugen > > Zitat von Tim Holloway :

[ceph-users] Re: CephFS MDS crashing during replay with standby MDSes crashing afterwards

2024-07-09 Thread Tim Holloway
Ivan, This may be a little off-topic, but if you're still running AlmaLinux 8,9, it's worth noting that CentOS 8 actually end-of-lifed about 2 years ago, thanks to CentOS Stream. Up until this last week, however, I had several AlmaLinux 8 machines running myself, but apparently somewhere around M

[ceph-users] Re: cephadm for Ubuntu 24.04

2024-07-11 Thread Tim Holloway
Just my €.02. There is, in fact, a cephadm package for the Raspberry Pi OS. If I read the synopsis correctly, it's for ceph 16.2.11, which I think is the same release of Ceph Pacific that I'm presently running my own farm on. It appears to derive off Debian Bookworm. Since cephadm is mainly a prog

[ceph-users] Schrödinger's OSD

2024-07-12 Thread Tim Holloway
Incidentally, I just noticed that my phantom host isn't completely gone. It's not in the host list, either command-line or dashboard, but it does list (with no assets) as a host under "ceph osd tree". --- More seriously, I've been having problems with OSDs that report as being both up and down at
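
The leftover entry in "ceph osd tree" is typically an empty CRUSH host bucket; a sketch of checking for and clearing it, with the hostname as a placeholder (the bucket must be empty before removal):

  ceph osd tree                      # the phantom shows up as a host bucket with nothing under it
  ceph osd crush rm <phantom-host>   # remove the empty host bucket from the CRUSH map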

[ceph-users] Re: Schrödinger's OSD

2024-07-12 Thread Tim Holloway
it would help to see your osd tree and > which  > OSDs you’re talking about. > > Zitat von Tim Holloway : > > > Incidentally, I just noticed that my phantom host isn't completely > > gone. It's not in the host list, either command-line or dashboard, > >

[ceph-users] Re: Schrödinger's OSD

2024-07-13 Thread Tim Holloway
he device: cephadm [--image your-custom-image] adopt --style legacy --name osd.4 locally on that host. The --image parameter is optional. Did you follow the docs [1] when you moved to cephadm? Anyway, since it somehow seems to work already, it's probably not that relevant anymore, I just
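
Spelled out, the adopt command quoted above looks roughly like this when run locally on the host that owns the device (the --image value is optional and the tag shown here is only an assumption):

  cephadm --image quay.io/ceph/ceph:v16.2.15 adopt --style legacy --name osd.4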

[ceph-users] Re: Schrödinger's OSD

2024-07-15 Thread Tim Holloway
Check which OSDs are active and remove the remainders of the > orphaned  > directories, that should be fine. But be careful and check properly  > before actually removing anything and only remove one by one while  > watching the cluster status. > > Zitat von Tim Hollowa

[ceph-users] Re: Schrödinger's OSD

2024-07-16 Thread Tim Holloway
hat up and keep it consistent. You should keep the cephadm > and  > optional the ceph-common package, but the rest isn't required to run > a  > cephadm cluster. > > > Zitat von Tim Holloway : > > > The problem with merely disabling or masking the non-cephadm OS

[ceph-users] Re: Schrödinger's OSD

2024-07-16 Thread Tim Holloway
OK. I deleted the questionable stuff with this command: dnf erase ceph-mgr-modules-core-16.2.15-1.el9s.noarch ceph-mgr-diskprediction-local-16.2.15-1.el9s.noarch ceph-mgr-16.2.15-1.el9s.x86_64 ceph-mds-16.2.15-1.el9s.x86_64 ceph-mon-16.2.15-1.el9s.x86_64 That left these two: centos-release-

[ceph-users] Cephadm has a small wart

2024-07-18 Thread Tim Holloway
I've been setting up a cookbook OSD creation process and as I walked through the various stages, I noted that the /etc/redhat-release file said "CentOS Stream 8". I panicked, because IBM has pulled the Ceph archives for CentOS 8 and nuked the machine, then rebuilt it with more attention to detail.

[ceph-users] Re: Cephadm has a small wart

2024-07-19 Thread Tim Holloway
robably spin up a copy of CentOS 2, because repos are still online even if no sensible person would want it for a production server. But not so the CentOS 8 enterprise extensions. They're gone. On Fri, 2024-07-19 at 09:00 +0200, Robert Sander wrote: > On 7/18/24 21:50, Tim Holloway wrote

[ceph-users] Re: Cephadm has a small wart

2024-07-19 Thread Tim Holloway
oduct, and that's about the only other factor I can think of. On Fri, 2024-07-19 at 14:50 +0200, Stefan Kooman wrote: > On 19-07-2024 14:04, Tim Holloway wrote: > > Ah. Makes sense. Might be nice if the container build appended > > something like "cephadm container"

[ceph-users] PSA: Bringing up an OSD really, really fast

2024-07-20 Thread Tim Holloway
While running Ceph in a VM may not be considered the most efficient mode of operation, it's still fairly popular. I've put together a small project that can spin up an OSD VM in just a few minutes. Whereas Ceph given free rein will gladly acquire resources with all the enthusiasm as a sprig of min

[ceph-users] Stuck in remapped state?

2024-07-27 Thread Tim Holloway
I was in the middle of tuning my OSDs when lightning blew me off the Internet. Had to wait 5 days for my ISP to send a tech and replace a fried cable. In the meantime, among other things, I had some serious time drift between servers thanks to the OS upgrades replacing NTP with chrony and me not

[ceph-users] Re: Stuck in remapped state?

2024-07-27 Thread Tim Holloway
308 active+clean+remapped io: client: 170 B/s rd, 0 op/s rd, 0 op/s wr On Sat, 2024-07-27 at 11:31 -0400, Tim Holloway wrote: > I was in the  middle of tuning my OSDs when lightning blew me off the > Internet. Had to wait 5 days for my ISP to send a tech and replace a > fried cabl

[ceph-users] Re: Stuck in remapped state?

2024-07-27 Thread Tim Holloway
t of entry for the Reef docs. Frustrating. On Sat, 2024-07-27 at 11:44 -0400, Tim Holloway wrote: > Update on "ceph -s". A machine was in the process of crashing when I > took the original snapshot. Here it is after the reboot: > > [root@dell02 ~]# ceph -s >   cluster:

[ceph-users] Re: Cephadm Offline Bootstrapping Issue

2024-08-02 Thread Tim Holloway
You might want to try my "bringing up an OSD really, really fast" package (https://gogs.mousetech.com/mtsinc7/instant_osd). It's actually for spinning up a VM with an OSD in it, although you can skip the VM setup script if you're on a bare OS and just run the Ansible part. Apologies for anyone

[ceph-users] Re: Accidentally created systemd units for OSDs

2024-08-16 Thread Tim Holloway
Been there/did that. Cried a lot. Fixed now. Personally, I recommend the containerise/cephadm-managed approach. In a lot of ways, it's simpler and it supports more than one fsid on a single host. The downside is that the systemd names are really gnarly (the full fsid is part of the unit name) and th

[ceph-users] Re: squid release codename

2024-08-16 Thread Tim Holloway
I find the SpongeBob ideas amusing, and I agree that in an isolated world, "Squid" would be the logical next release name. BUT It's going to wreak havoc on search engines that can't tell when someone's looking up Ceph versus the long-established Squid Proxy. If we're going to look to the cartoon w

[ceph-users] Re: Accidentally created systemd units for OSDs

2024-08-16 Thread Tim Holloway
If it makes you feel better, that sounds exactly like what happened to me and I have no idea how. Other than I'd started with Octopus and it was a transitional release, there are conflicting instructions AND a reference in the Octopus docs to procedures using a tool that was no longer distributed w

[ceph-users] Re: Accidentally created systemd units for OSDs

2024-08-16 Thread Tim Holloway
It depends on your available resources, but I really do recommend destroying and re-creating that OSD. If you have to spin up a VM and set up a temporary OSD just to keep the overall system happy, even that is a small price to pay. As I said, you can't unlink/disable the container systemd, because

[ceph-users] Re: Accidentally created systemd units for OSDs

2024-08-17 Thread Tim Holloway
I'd put in an RFO to detect/prevent creation of mutually-exclusive OSD definitions on a single OSD storage unit myself, since that's the real problem. As Eugen has noted, you can up-convert a traditional OSD to cephadm management... unless there's already a managed instance existing. I can attest t

[ceph-users] Prometheus and "404" error on console

2024-08-19 Thread Tim Holloway
Although I'm seeing this in Pacific, it appears to be a perennial issue with no well-documented solution. The dashboard home screen is flooded with popups saying "404 - Not Found Could not reach Prometheus's API on http://ceph1234.mydomain.com:9095/api/v1 " If I was a slack-jawed PHB casually wan

[ceph-users] Re: Prometheus and "404" error on console

2024-08-19 Thread Tim Holloway
docs.ceph.com/en/quincy/mgr/dashboard/#haproxy-example-configuration > > > >   > In my opinion, a VIP for the dashboard (etc.) could and maybe should > be and out of the box config. > > > > > > > On Aug 19, 2024, at 8:23 AM, Tim Holloway > > wrot

[ceph-users] Re: ceph-ansible installation error

2024-08-30 Thread Tim Holloway
I believe that the original Ansible installation process is deprecated. It was pretty messy, anyway, since it had to do a lot of grunt work. Likewise the ceph-install program, which is in the Octopus docs, but wasn't actually available in the release of Octopus I installed on my servers. The Ansib

[ceph-users] Re: How to know is ceph is ready?

2024-08-30 Thread Tim Holloway
"Ceph is ready" covers a lot of territory. It's more like, "How can I delay util Ceph is available for the particular service I need? I've been taking a systemd-bsaed approach. Since I don't actually care bout Ceph in the abstract, but I'm actually looking for the Ceph or Ceph NFS shares, I create

[ceph-users] Re: ceph-ansible installation error

2024-09-01 Thread Tim Holloway
Those are reasonable objections, although some are now dated. In the context of Ceph some of those issues are also further addressed by Ceph itself. So let me present my take. 1. Networking. You can set up some gnarly virtual networks in both container and cloud systems, it's true. Docker has

[ceph-users] Re: ceph-ansible installation error

2024-08-31 Thread Tim Holloway
n fre 30 aug. 2024 kl 20:43 skrev Milan Kupcevic > : > > > > On 8/30/24 12:38, Tim Holloway wrote: > > > I believe that the original Ansible installation process is > > > deprecated. > > > > This would be a bad news as I repeatedly hear from admin

[ceph-users] Re: ceph-ansible installation error

2024-09-02 Thread Tim Holloway
Sorry if that sounds trollish. It wasn't intended to be. Look at it this way. There are two approaches to running an IT installation. One is the free-wheeling independent approach. The other is the stuffy corporate approach. Free-wheeling shops run things like Ubuntu. Or even BSD (but that's

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Tim Holloway
While I generally don't recommend getting down and dirty with the containers in Ceph, if you're going to build your own, well, that's different. When I have a container and the expected port isn't listening, the first thing I do is see if it's really listening and internal-only or truly not listen
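
A sketch of that first check, assuming podman and that the image actually ships ss (as the rest of the thread notes, it may not):

  ss -tlnp | grep 8765                       # is anything on the host bound to the port?
  podman ps --format '{{.Names}}'            # find the suspect container's name
  podman exec <container-name> ss -tlnp      # look inside it, if ss is available in the image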

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Tim Holloway
der bumping its logging levels - the exact opposite of what I'm looking to do! Tim On Tue, 2024-09-03 at 15:34 +0100, Matthew Vernon wrote: > Hi, > > On 03/09/2024 14:27, Tim Holloway wrote: > > FWIW, I'm using podman not docker. > > > The netstat command is no

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Tim Holloway
n, on the other hand, sounds familiar, but I'm too senile to recall from where. Tim On Tue, 2024-09-03 at 11:48 -0400, Tim Holloway wrote: > Yeah. Although taming the Prometheus logs is on my list, I'm still > fuzzy on its details. > > For your purposes, Docker and P

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-04 Thread Tim Holloway
I've been monitoring my Ceph LAN segment for the last several hours and absolutely no traffic has shown up on any server for port 8765. Furthermore I did a quick review of Prometheus itself and it's only claiming those 9000-series ports I mentioned previously. So I conclude that this isn't litera

[ceph-users] Re: Issue Replacing OSD with cephadm: Partition Path Not Accepted

2024-09-04 Thread Tim Holloway
One of my major regrets is that there isn't a "Ceph Lite" for setups where you want a cluster with "only" a few terabytes and a half-dozen servers. Ceph excels at really, really big storage and the tuning parameters reflect that. I, too, ran into the issue where I couldn't allocate a disk partition

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-05 Thread Tim Holloway
Now you've got me worried. As I said, there is absolutely no traffic using port 8765 on my LAN. Am I missing a service? Since my distro is based on stock Prometheus, I'd have to assume that the port 8765 server would be part of the Ceph generic container image and isn't being switched on for some

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-06 Thread Tim Holloway
; [1] https://github.com/ceph/ceph/pull/57535 > [2] https://github.com/ceph/ceph/pull/58402 > [3] https://github.com/ceph/ceph/pull/58460 > [4] https://github.com/ceph/ceph/pull/46400 > > > > > On Thu, Sep 5, 2024 at 7:00 PM Tim Holloway > wrote: > > Now you'

[ceph-users] Re: Dashboard and Object Gateway

2023-10-17 Thread Tim Holloway
, which treats '/foo' as a > request for bucket foo - that's why you see NoSuchBucket errors when > it's misconfigured > > also note that, because of how these apis are nested, > rgw_admin_entry='default' would prevent users from creating and > operati

[ceph-users] Re: Dashboard and Object Gateway

2023-10-17 Thread Tim Holloway
se a > command like: > > $ ceph config set client.rgw rgw_admin_entry admin > > then restart radosgws because they only read that value on startup > > On Tue, Oct 17, 2023 at 9:54 AM Tim Holloway > wrote: > > > > Thanks, Casey! > > > > I'm not
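
Putting the quoted advice together as a sketch (the rgw service name is a placeholder; the restart matters because radosgw only reads rgw_admin_entry at startup):

  ceph config set client.rgw rgw_admin_entry admin
  ceph orch restart rgw.<service-name>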

[ceph-users] Re: Nautilus - Octopus upgrade - more questions

2023-10-18 Thread Tim Holloway
I started with Octopus. It had one very serious flaw that I only fixed by having Ceph self-upgrade to Pacific. Octopus required perfect health to alter daemons and often the health problems were themselves issues with daemons. Pacific can overlook most of those problems, so it's a lot easier to rep

[ceph-users] Logging control

2023-12-19 Thread Tim Holloway
Ceph version is Pacific (16.2.14), upgraded from a sloppy Octopus. I ran afoul of all the best bugs in Octopus, and in the process switched on a lot of stuff better left alone, including some detailed debug logging. Now I can't turn it off. I am confidently informed by the documentation that the

[ceph-users] Re: Logging control

2023-12-19 Thread Tim Holloway
> osd.1 lives it wouldnt work. "ceph tell" should work anywhere there > is a client.admin key. > > > Respectfully, > > Wes Dillingham > w...@wesdillingham.com > LinkedIn > > > On Tue, Dec 19, 2023 at 4:02 PM Tim Holloway > wrote: > > Ce

[ceph-users] Re: Logging control

2023-12-19 Thread Tim Holloway
anywhere there is a > client.admin key. > > > Respectfully, > > *Wes Dillingham* > w...@wesdillingham.com > LinkedIn <http://www.linkedin.com/in/wesleydillingham> > > > On Tue, Dec 19, 2023 at 4:02 PM Tim Holloway > wrote: > > > Ceph version is Pacific (16.

[ceph-users] Re: Support of SNMP on CEPH ansible

2023-12-20 Thread Tim Holloway
I can't speak for details of ceph-ansible. I don't use it because from what I can see, ceph-ansible requires a lot more symmetry in the server farm than I have. It is, however, my understanding that cephadm is the preferred installation and management option these days and it certainly helped me t

[ceph-users] Re: Logging control

2023-12-20 Thread Tim Holloway
as  > you may have noticed. For example, you could reduce the log level of  > debug_rocksdb (default 4/5). If you want to reduce the > mgr_tick_period  > (the repeating health messages every two seconds) you can do that > like  > this: > > quincy-1:~ # ceph config set m
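
A hedged sketch along the lines of the truncated quote above (the values are illustrative, not recommendations):

  ceph config set global debug_rocksdb 1/5     # persistent, via the config database
  ceph tell osd.1 config set debug_osd 0/5     # runtime change; works wherever a client.admin key exists
  ceph config set mgr mgr_tick_period 10       # stretch the two-second mgr tick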

[ceph-users] Adding a new monitor fails

2024-02-06 Thread Tim Holloway
I just jacked in a completely new, clean server and I've been trying to get a Ceph (Pacific) monitor running on it. The "ceph orch daemon add" appears to install all/most of what's necessary, but when the monitor starts, it shuts down immediately, and in the manner of Ceph containers immediately e

[ceph-users] Direct ceph mount on desktops

2024-02-06 Thread Tim Holloway
Back when I was battling Octopus, I had problems getting ganesha's NFS to work reliably. I resolved this by doing a direct (ceph) mount on my desktop machine instead of an NFS mount. I've since been plagued by ceph "laggy OSD" complaints that appear to be due to a non-responsive client and I'm sus

[ceph-users] Re: Ceph as rootfs?

2024-02-06 Thread Tim Holloway
My €0.02 for what it's worth(less). I've been doing RBD-based VMs under libvirt with no problem. In that particular case, the ceph RBD base images are being overlaid cloud-style with an instance-specific qcow2 image and the RBD is just part of my storage pools. For a physical machine, I'd prob

[ceph-users] Re: Adding a new monitor fails

2024-02-06 Thread Tim Holloway
e output of: > ceph orch ls mon > > If the orchestrator expects only one mon and you deploy another  > manually via daemon add it can be removed. Try using a mon.yaml file  > instead which contains the designated mon hosts and then run > ceph orch apply -i mon.yaml > >
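
A minimal mon.yaml of the kind suggested here, with placeholder hostnames, applied with the lower-case -i flag:

  # mon.yaml
  service_type: mon
  placement:
    hosts:
      - ceph01
      - ceph02
      - ceph03

  ceph orch apply -i mon.yaml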

[ceph-users] Re: Adding a new monitor fails

2024-02-06 Thread Tim Holloway
hat’s why your manually > added  > daemons are rejected. Try my suggestion with a mon.yaml. > > Zitat von Tim Holloway : > > > ceph orch ls > > NAME   PORTS    RUNNING  REFRESHED  > > AGE > > PLACEMENT > > alertm

[ceph-users] Re: Direct ceph mount on desktops

2024-02-06 Thread Tim Holloway
elly wrote: > On Tue, Feb 6, 2024 at 12:09 PM Tim Holloway > wrote: > > > > Back when I was battline Octopus, I had problems getting ganesha's > > NFS > > to work reliably. I resolved this by doing a direct (ceph) mount on > > my > > desktop machine ins

[ceph-users] Re: Problems adding a new host via orchestration.

2024-02-06 Thread Tim Holloway
Just FYI, I've seen this on CentOS systems as well, and I'm not even sure that it was just for Ceph. Maybe some stuff like Ansible. I THINK you can safely ignore that message or alternatively that it's such an easy fix that senility has already driven it from my mind. Tim On Tue, 2024-02-06

[ceph-users] Re: Direct ceph mount on desktops

2024-02-07 Thread Tim Holloway
ppening slowly and the suspension wasn't waiting properly for it to complete. On Tue, 2024-02-06 at 13:00 -0500, Patrick Donnelly wrote: > On Tue, Feb 6, 2024 at 12:09 PM Tim Holloway > wrote: > > > > Back when I was battline Octopus, I had problems getting ganesha's &g

[ceph-users] Re: Adding a new monitor fails

2024-02-08 Thread Tim Holloway
uorum. > That's  > why you had 2/1 running, this is reproducible in my test cluster. > Adding more mons also failed because of the count:1 spec. You could  > have just overwritten it in the cli as well without a yaml spec file  > (omit the count spec): > > ceph orch apply

[ceph-users] Dashboard and Object Gateway

2023-10-16 Thread Tim Holloway
First, an abject apology for the horrors I'm about to unveil. I made a cold migration from GlusterFS to Ceph a few months back, so it was a learn-/screwup/-as-you-go affair. For reasons of presumed compatibility with some of my older servers, I started with Ceph Octopus. Unfortunately, Octopus see

[ceph-users] Re: Dashboard and Object Gateway

2023-10-17 Thread Tim Holloway
rgws. > > You can confirm that both of these settings are set properly by > sending GET request to ${rgw-ip}:${port}/${rgw_admin_entry}  > “default" in your case -> it should return 405 Method Not Supported > > Btw there is actually no bucket that you would be able to s

[ceph-users] Re: Ceph Crash Module "RADOS permission denied"

2024-10-29 Thread Tim Holloway
This is a common error on my system (Pacific). It appears that there is internal confusion as to where the crash support stuff lives - whether it's new-style (administered and under /var/lib/ceph/fsid) or legacy style (/var/lib/ceph). One way to fake it out was to manually create a minimal c

[ceph-users] Re: Destroyed OSD clinging to wrong disk

2024-10-29 Thread Tim Holloway
Take care when reading the output of "ceph osd metadata". When you are running the OSD as an administered service, it's running in a container, and a container is a miniature VM. So, for example, it may report your OS as "CentOS Stream 8" even if your actual machine is running Ubuntu. The big

[ceph-users] Re: Destroyed OSD clinging to wrong disk

2024-10-30 Thread Tim Holloway
> I think the problem might be that the NVMe LV that was the WAL/DB for > the > failed OSD did not get cleaned up, but on my systems 4 OSDs use the > same > NVMe drive for WAL/DB, so I'm not sure how to proceed. > > Any suggestions would be welcome. > > Thanks. > > -D

[ceph-users] Re: [External Email] Re: Recreate Destroyed OSD

2024-11-01 Thread Tim Holloway
id I make all my OSDs managed, or just > all of > the ones on ceph01, or just the one that got created when I applied > the > spec? > > When I add my next host, should I change the placement to that host > name or > to '*'? > > More generally, is there a hig

[ceph-users] Re: OSD refuse to start

2024-11-05 Thread Tim Holloway
That can be a bit sticky. First, check to see if you have a /var/log/messages file. The dmesg log isn't always as complete. Also, of course, make sure you have enough spare RAM and disk space to run the OSD. When running a Managed OSD, a LOT of space is used under the root directory several layer

[ceph-users] Re: Deploy custom mgr module

2024-10-30 Thread Tim Holloway
Speaking abstractly, I can see 3 possible approaches. 1. You can create a separate container and invoke it from the mgr container as a micro-service. As to how, I don't know. This is likely the cleanest approach. 2. You can create a Dockerfile based on the stock mgr but with your extensions added
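
For option 2, a bare-bones sketch of such a Containerfile; the base image tag, module name, and module path are assumptions rather than anything verified against a particular release:

  FROM quay.io/ceph/ceph:v16.2.15
  COPY my_module/ /usr/share/ceph/mgr/my_module/

The resulting image would then have to be fed to cephadm (for example via the --image option mentioned elsewhere in this list) so the mgr gets redeployed from it.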

[ceph-users] Re: Recreate Destroyed OSD

2024-10-31 Thread Tim Holloway
en/latest/cephadm/services/osd/#deploy-osds [1] https://docs.ceph.com/en/latest/cephadm/services/osd/#advanced-osd-service-specifications Zitat von Tim Holloway : As I understand it, the manual OSD setup is only for legacy (non-container) OSDs. Directory locations are wrong for managed (contain

[ceph-users] Re: Recreate Destroyed OSD

2024-10-31 Thread Tim Holloway
As I understand it, the manual OSD setup is only for legacy (non-container) OSDs. Directory locations are wrong for managed (containerized) OSDs, for one. Actually, the whole manual setup docs ought to be moved out of the mainline documentation. In their present arrangement, they make legacy

[ceph-users] Re: Recreate Destroyed OSD

2024-11-01 Thread Tim Holloway
inghamton.edu On Thu, Oct 31, 2024 at 3:52 PM Tim Holloway wrote: I migrated from gluster when I found out it's going unsupported shortly. I'm really not big enough for Ceph proper, but there were only so many supported distributed filesystems with triple redundancy. Where I got int

[ceph-users] Re: Deploy custom mgr module

2024-10-30 Thread Tim Holloway
On 10/30/24 14:58, Tim Holloway wrote: Speaking abstractly, I can see 3 possible approaches. ... 2. You can create a Dockerfile based on the stock mgr but with your extensions added. The main problem with this is that from what I can see, the cephadm tool has the names and repositories of

[ceph-users] Re: Influencing the osd.id when creating or replacing an osd

2024-10-28 Thread Tim Holloway
Yes, but it's irritating. Ideally, I'd like my OSD IDs and hostnames to track so that if a server goes pong I can find it and fix it ASAP. But it doesn't take much maintenance to break that scheme and the only thing more painful than renaming a Ceph host is re-numbering an OSD. On 10/28/24 06

[ceph-users] Re: Ceph native clients

2024-10-28 Thread Tim Holloway
On 10/28/24 04:54, Burkhard Linke wrote: Hi, On 10/26/24 18:45, Tim Holloway wrote: On the whole, I prefer to use NFS for my clients to use Ceph filesystem. It has the advantage that NFS client/mount is practically guaranteed to be pre-installed on all my client systems. On the other hand, the

[ceph-users] Re: Ceph pacific error when add new host

2024-11-11 Thread Tim Holloway
I have seen instances where the crash daemon is running under a container (using /var/lib/ceph/{fsid}/crash), but the daemon is trying to use the legacy location (/var/lib/ceph/crash). This can result in file access violations or "file not found" issues which should show up in the system logs (jo
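
A quick, hedged way to see which of the two layouts a host actually has and whether anything is complaining about it (the fsid path is a placeholder):

  ls -d /var/lib/ceph/crash /var/lib/ceph/<fsid>/crash 2>/dev/null
  journalctl --since "1 hour ago" | grep -i crash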

[ceph-users] Re: How to speed up OSD deployment process

2024-11-08 Thread Tim Holloway
I've worked with systems much smaller than that where I would have LOVED to get everything up in only an hour. Kids these days. 1. Have you tried using a spec file? Might help, might not. 2. You could always do the old "&" Unix shell operator for asynchronous commands. I think you could get An

[ceph-users] Ceph native clients

2024-10-26 Thread Tim Holloway
On the whole, I prefer to use NFS for my clients to use Ceph filesystem. It has the advantage that NFS client/mount is practically guaranteed to be pre-installed on all my client systems. On the other hand, there are downsides. NFS (Ceph/NFS-Ganesha) has been known to be cranky on my network when

[ceph-users] Re: Strange container restarts?

2024-11-13 Thread Tim Holloway
Things are a little more complex. Managed (container) resources are handled by systemd, which typically auto-restart failed services. Ceph diverts the container logs (what you'd get from "docker logs container-id") into the systemd journal log. So doing a journalctl check is advised. Although c
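
The journalctl check being advised, sketched with a placeholder fsid and a hypothetical osd.4; cephadm-managed units follow the ceph-<fsid>@<daemon> naming, which is why the names look gnarly:

  systemctl list-units 'ceph-*'                                     # find the exact unit name
  journalctl -u 'ceph-<fsid>@osd.4.service' --since "2 hours ago"   # the container's redirected logs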

[ceph-users] Re: Recreate Destroyed OSD

2024-10-31 Thread Tim Holloway
full control over it. That's why we also have osd_crush_initial_weight = 0, to check the OSD creation before letting Ceph remap any PGs. It definitely couldn't hurt to clarify the docs, you can always report on tracker.ceph.com if you have any improvement ideas. Zitat von Tim Holloway : I ha

[ceph-users] Re: Backup strategies for rgw s3

2024-09-25 Thread Tim Holloway
Well, using Ceph as its own backup system has its merits, and I've little doubt something could be cooked up, but another alternative would be to use a true backup system. In my particular case, I use the Bacula backup system product. It's not the most polished thing around, but it is a full-featu

[ceph-users] Re: [External Email] Re: Recreate Destroyed OSD

2024-11-06 Thread Tim Holloway
obably allowing service actions against the "osd" service (even though it's just a placeholder in reality) but none of that exists currently. On Wed, Nov 6, 2024 at 11:50 AM Tim Holloway wrote: On 11/6/24 11:04, Frédéric Nass wrote: ... You could enumerate all hosts one by one or us

[ceph-users] Re: osd removal leaves 'stray daemon'

2024-11-07 Thread Tim Holloway
You can get this sort of behaviour because different Ceph subsystems get their information from different places instead of having an authoritative source of information. Specifically, Ceph may look directly at: A) Its configuration database B) Systemd units running on the OSD host C) Contai
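
A hedged way to compare a few of those sources side by side when they disagree:

  ceph orch ps          # what the orchestrator believes is deployed
  cephadm ls            # what is actually present on the host you run this on
  ceph osd tree         # what the OSD/CRUSH map says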

[ceph-users] Re: osd removal leaves 'stray daemon'

2024-11-07 Thread Tim Holloway
;s actually **no** stray daemons, that is **no** ghost process on any hosts trying to start. Here we're talking about unexpected behavior, most likely a bug. Regards, Frédéric. ----- Le 7 Nov 24, à 21:08, Tim Holloway t...@mousetech.com a écrit : You can get this sort of behaviour because

[ceph-users] Re: centos9 or el9/rocky9

2024-10-25 Thread Tim Holloway
There is a certain virtue in using a firewall appliance for front-line protection. I think fail2ban could add IPs to its block list. An advantage of this is that you don't have to remember what all the internal servers are to firewall them individually. Certainly one could update firewall-cmd via

[ceph-users] Re: [External Email] Re: Recreate Destroyed OSD

2024-11-06 Thread Tim Holloway
On 11/6/24 11:04, Frédéric Nass wrote: ... You could enumerate all hosts one by one or use a pattern like 'ceph0[1-2]' You may also use regex patterns depending on the version of Ceph that you're using. Check [1]. Regex patterns should be available in next minor Quincy release 17.2.8. [1] htt

[ceph-users] Re: Unable to add OSD

2024-11-06 Thread Tim Holloway
1. Make sure you have enough RAM on ceph-1 and that "df -h /" indicates that the system disk is less than 70% full (managed services eat a LOT of disk space!) 2. Check your selinux audit log to make sure nothing's being blocked. 3. Check your /var/lib/ceph and /var/lib/ceph/16a56cdf-9bb4-11ef-
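
The same checks sketched as commands (the fsid directory is a placeholder, and ausearch assumes auditd is running):

  free -h                              # spare RAM
  df -h /                              # managed services eat a lot of root-disk space
  ausearch -m avc -ts recent           # recent SELinux denials
  ls /var/lib/ceph /var/lib/ceph/<fsid>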

[ceph-users] Re: 1 stray daemon(s) not managed by cephadm

2024-11-08 Thread Tim Holloway
Check the /var/lib/ceph directory on host ceph-osd3. If there is an osd.3 directory there, and a /var/lib/ceph/{fsid}/osd.3 directory then you are a member of the schizophrenic OSD club. Congratulations, your membership badge and certificate of Membership will be arriving shortly. I think you

[ceph-users] Re: How to speed up OSD deployment process

2024-11-08 Thread Tim Holloway
on each node, it starts a single OSD and waits for it to > become > ready before moving on to the next. > > Regards, > Yufan > > Tim Holloway 於 2024年11月9日 週六 上午1:58寫道: > > > > I've worked with systems much smaller than that where I would have > > LOVE

[ceph-users] Re: NFS and Service Dependencies

2024-11-09 Thread Tim Holloway
H. I have somewhat similar issues, and I'm not entirely happy with what I've got, but let me fill you in. Ceph supports NFS by launching instances of Ganesha-nfs. If you're using managed services, this will be run out of the master Ceph container image and the name of this container is rat

[ceph-users] Re: Stray monitor

2024-11-17 Thread Tim Holloway
I think I can count 5 sources that Ceph can query to report/display/control its resources. 1. The /etc/ceph/ceph.conf file. Mostly supplanted by the Ceph configuration database. 2. The ceph configuration database. A nameless key/value store internal to a ceph filesystem. It's distributed (no fixed

[ceph-users] Re: Ceph Octopus packages missing at download.ceph.com

2024-11-17 Thread Tim Holloway
As to the comings and goings of Octopus from download.ceph.com I cannot speak. I had enough grief when IBM Red Hat pulled Ceph from its CentOS archives. But my experience with Octopus was such that unless you have a really compelling reason to use it, I'd upgrade to Pacific or higher. Octopus had

[ceph-users] Re: Schrödinger's Server

2025-02-26 Thread Tim Holloway
then looking > into/editing/removing ceph-config keys like 'mgr/cephadm/inventory' > and 'mgr/cephadm/host.ceph07.internal.mousetech.com' that 'ceph > config-key dump' output shows might help. > > Regards, > Frédéric. > > - Le 25 Fév 25,

[ceph-users] Schrödinger's Server

2025-02-25 Thread Tim Holloway
Ack. Another fine mess. I was trying to clean things up and the process of tossing around OSDs kept getting me reports of slow responses and hanging PG operations. This is Ceph Pacific, by the way. I found a deprecated server that claimed to have an OSD even though it didn't show in either "cep

[ceph-users] Re: Schrödinger's Server

2025-02-27 Thread Tim Holloway
le or set of OSDs that it seemed to hang on, I just picked a server with the most OSDs reported and rebooted that one. I suspect, however, that any server would have done. Thanks, Tim On Thu, 2025-02-27 at 08:28 +0100, Frédéric Nass wrote: > > > - Le 26 Fév 25, à 16:40,

[ceph-users] Re: external multipath disk not mounted after power off/on the server

2025-02-27 Thread Tim Holloway
I'm coming in late so I don't know the whole story here, but that name's indicative of a Managed (containerized) resource. You can't manually construct, delete or change the systemd services for such items. I learned that the hard way. The service declaration/control files are dynamically created

[ceph-users] Re: Schrödinger's Server

2025-02-28 Thread Tim Holloway
reboot and in the process flush out any other issues that might have arisen. On Thu, 2025-02-27 at 15:47 -0500, Anthony D'Atri wrote: > > > > On Feb 27, 2025, at 8:14 AM, Tim Holloway > > wrote: > > > > System is now stable. The rebalancing was doing what it shou

[ceph-users] Re: Schrödinger's Server

2025-03-01 Thread Tim Holloway
ince I not only maintain Ceph, but every other service on the farm, including appservers, LDAP, NFS, DNS, and much more, I haven't had the luxury to dig into Ceph as deeply as I'd like, so the fact that it works so well under such shoddy administration is also a point in its favor.

[ceph-users] Re: One host down osd status error

2025-03-20 Thread Tim Holloway
Based on my experience, that error comes from 1 of 3 possible causes: 1. The machine in question doesn't have proper security keys 2. The machine in question is short on resources - especially RAM 3. The machine in question has its brains scrambled. Cosmic rays flipping critical RAM bits, bugs

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-27 Thread Tim Holloway
t 1 more
[ERR] MGR_MODULE_ERROR: 2 mgr modules have failed
    Module 'cephadm' has failed: 'ceph06.internal.mousetech.com'
    Module 'prometheus' has failed: gaierror(-2, 'Name or service not known')
[WRN] TOO_MANY_PGS: too many PGs per OSD (648 > m

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-26 Thread Tim Holloway
service_type: prometheus
service_name: prometheus
placement:
  hosts:
  - dell02.mousetech.com
networks:
- 10.0.1.0/24

Can't list daemon logs, run restart, etc., because "Error EINVAL: No daemons exist under service name "prometheus". View currently running services using "ceph orch ls"" And y

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-26 Thread Tim Holloway
ng. On 3/26/25 07:26, Eugen Block wrote: The cephadm.log should show some details why it fails to deploy the daemon. If there's not much, look into the daemon logs as well (cephadm logs --name prometheus.ceph02.mousetech.com). Could it be that there's a non-cephadm prometheus al

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-26 Thread Tim Holloway
ot;, "no_proxy="] But again, you should be able to see a failed to pull in the cephadm.log on dell02. Or even in 'ceph health detail', usually it warns you if the orchestrator failed to place a daemon. Zitat von Tim Holloway : One thing I did run into when upgrading

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-26 Thread Tim Holloway
8.3.5   dad864ee21e9  2 years ago    571 MB
quay.io/prometheus/node-exporter  v1.3.1  1dbe0e931976  3 years ago    22.3 MB
quay.io/prometheus/alertmanager   v0.23.0 ba2b418f427c  3 years ago    58.9 MB
On 3/26/25 11:36, Tim Holloway wrote: it returns nothing. I'd al

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-26 Thread Tim Holloway
'--timeout', '895', 'gather-facts'] 2025-03-26 13:22:21,860 7f29c27cb740 DEBUG cephadm ['--no-container-init', '--timeout', '895', 'gather-facts'] 202
