[ceph-users] Re: Prometheus anomaly in Reef

2025-03-27 Thread Tim Holloway
Thanks for your patience. Host ceph06 isn't referenced in the config database. I think I've finally purged it. I also reset the dashboard API host address from ceph08 to dell02. But since prometheus isn't running on dell02 either, there's no gain there. I did clear some of that lint out via

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-28 Thread Tim Holloway
you try to deploy prometheus again. So my recommendation is to get into HEALTH_OK first. And btw, "TOO_MANY_PGS: too many PGs per OSD (648 > max 560)" is serious, you can end up with inactive PGs during recovery, so I'd also consider checking the pools and their PGs. Quoting Tim
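For the PG check suggested here, a minimal sketch (the pool name and pg_num value are placeholders, not values from the thread):

    ceph osd df                          # PGS column: how many PGs each OSD carries
    ceph osd pool ls detail              # pg_num / pgp_num and replication size per pool
    ceph osd pool set <pool> pg_num 128  # shrink an oversized pool; 128 is only illustrative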

[ceph-users] Re: NIH Datasets

2025-04-08 Thread Tim Holloway
I don't think Linus is only concerned with public data, no. The United States Government has had in place for many years effective means of preserving their data. Some of those systems may be old and creaky, granted, and not always the most efficient, but they suffice. The problem is that th

[ceph-users] Re: nodes with high density of OSDs

2025-04-10 Thread Tim Holloway
That's quite a large number of storage units per machine. My suspicion is that since you have apparently an unusually high number of LVs coming online at boot, the time it takes to linearly activate them is long enough to overlap with the point in time that ceph starts bringing up its storage-
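One rough way to check whether LV activation really is what Ceph races against at boot (a sketch, assuming systemd hosts):

    systemd-analyze blame | head -n 20   # lvm2 activation units near the top point at slow LV bring-up
    ceph-volume lvm list                 # which LVs each OSD needs before it can start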

[ceph-users] Re: Prometheus anomaly in Reef

2025-04-01 Thread Tim Holloway
parameter, I started to wonder. Since it's the only "extra" parameter in your spec file, I must assume that it's the root cause of the failures. That would probably be easy to test... if you still have the patience. ;-) Quoting Tim Holloway: Let me close this out by

[ceph-users] Re: space size issue

2025-03-28 Thread Tim Holloway
We're glad to have been of help. There is no One Size Fits All solution. For you, it seems that speed is more important than high availability. For me, it's HA+redundancy. Ceph has 3 ways to deliver data to remote clients: 1. As a direct ceph mount on the client. From experience, this is a pa

[ceph-users] Re: space size issue

2025-03-28 Thread Tim Holloway
I'm going to chime in and vote with the majority. You do not sound like an ideal user for ceph. Ceph provides high availability by being able to lose both drives /and servers/ in a transparent manner. With only one server, you have a single point of failure, and that's not considered "high availabi

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-28 Thread Tim Holloway
blob/be5dba538167f282c4ec74ea3cae958c8bd79830/src/pybind/mgr/cephadm/utils.py#L141 or https://github.com/ceph/ceph/blob/be5dba538167f282c4ec74ea3cae958c8bd79830/src/python-common/ceph/deployment/utils.py#L58 where one tries a "_dns_lookup" and the other a "resolve_ip"

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-31 Thread Tim Holloway
version again, considering that's where the problem started. I also didn't attempt to try an "orch apply" that omitted any running service host, for fear it wouldn't remove the omitted host. So the problem is fixed, but it took a lot of banging and hammering to mak

[ceph-users] Re: Updating ceph to Pacific and Quincy

2025-04-01 Thread Tim Holloway
As Eugen has noted, cephadm/containers were already available in Octopus. In fact, thanks to the somewhat scrambled nature of the documents, I had a mix of both containerized and legacy OSDs under Octopus and for the most part had no issues attributable to that. My bigger problem with Octopus
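For what it's worth, converting a leftover legacy OSD to the containerized form is roughly (the daemon name is a placeholder):

    cephadm adopt --style legacy --name osd.3   # re-homes the legacy OSD under cephadm/containers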

[ceph-users] Re: Major version upgrades with CephADM

2025-04-01 Thread Tim Holloway
Well, I just upgraded Pacific to Reef 18.2.4 and the only problems I ran into had been problems previously seen in Pacific. Your Mileage May Vary, as the only part of Ceph I put a strain on is deploying stuff and adding/removing OSDs, but for the rest, I'd look to recent Reef complaints on the lis
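For reference, an orchestrated major-version upgrade looks roughly like this (a sketch; the image tag matches the version mentioned above):

    ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.4
    ceph orch upgrade status     # watch progress; ceph -s also shows the upgrade progress bar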

[ceph-users] Re: NIH Datasets

2025-04-07 Thread Tim Holloway
Yeah, Ceph in its current form doesn't seem like a good fit. I think that what we need to support the world's knowledge in the face of enstupidification is some sort of distributed holographic datastore. So, like Ceph's PG replication, a torrent-like ability to pull from multiple unreliable so

[ceph-users] Re: nodes with high density of OSDs

2025-04-11 Thread Tim Holloway
Hi Alex, I think one of the scariest things about your setup is that there are only 4 nodes (I'm assuming that means Ceph hosts carrying OSDs). I've been bouncing around different configurations lately between some of my deployment issues and cranky old hardware and I presently am down to 4 h

[ceph-users] Re: nodes with high density of OSDs

2025-04-11 Thread Tim Holloway
I just checked an OSD and the "block" entry is indeed linked to storage using a /dev/mapper uuid LV, not a /dev/device. When ceph builds an LV-based OSD, it creates a VG whose name is "ceph-<uuid>", where <uuid> is a UUID, and an LV named "osd-block-<uuid>", where <uuid> is also a UUID. So althoug
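A quick way to see those names and the metadata tags Ceph stores on them (a sketch):

    lvs -o lv_name,vg_name,lv_tags | grep ceph   # osd-block-<uuid> LVs inside ceph-<uuid> VGs, plus their ceph.* tags
    ceph-volume lvm list                         # maps OSD IDs to those VG/LV UUIDs and underlying devices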

[ceph-users] Re: NIH Datasets

2025-04-07 Thread Tim Holloway
topics and the like. I considered OIDs as an alternative, but LDAP names are more human-friendly and easier to add sub-domains to without petitioning a master registrar. Also there's a better option for adding attributes to the entry description. On 4/7/25 09:39, Tim Holloway wrote: Ye

[ceph-users] Re: nodes with high density of OSDs

2025-04-10 Thread Tim Holloway
Peter, I don't think udev factors in based on the original question. Firstly, because I'm not sure udev deals with permanently-attached devices (it's more for hot-swap items). Secondly, because the original complaint mentioned LVM specifically. I agree that the hosts seem overloaded, by the

[ceph-users] Re: NIH Datasets

2025-04-10 Thread Tim Holloway
Sounds like a discussion for a discord server. Or BlueSky or something that's very definitely NOT what used to be known as twitter. My viewpoint is a little different. I really didn't consider HIPAA stuff, although since technically that is info that shouldn't be accessible to anyone but autho

[ceph-users] Re: nodes with high density of OSDs

2025-04-12 Thread Tim Holloway
hester Institute of Technology ________ From: Tim Holloway Sent: Saturday, April 12, 2025 1:13:05 PM To: ceph-users@ceph.io Subject: [ceph-users] Re: nodes with high density of OSDs When I first migrated to Ceph, my servers were all running CentOS 7, which I (wrongly) thought could not

[ceph-users] Re: nodes with high density of OSDs

2025-04-12 Thread Tim Holloway
cated on its host, but it seems like it should be possible to carry a copy on the actual device.    Tim On 4/11/25 16:23, Anthony D'Atri wrote: Filestore, pre-ceph-volume may have been entirely different. IIRC LVM is used these days to exploit persistent metadata tags. On Apr 11, 2025, a

[ceph-users] Re: nodes with high density of OSDs

2025-04-12 Thread Tim Holloway
LVs, after all.    Tim On 4/12/25 10:25, Gregory Orange wrote: On 12/4/25 20:56, Tim Holloway wrote: Which brings up something I've wondered about for some time. Shouldn't it be possible for OSDs to be portable? That is, if a box goes bad, in theory I should be able to remove the drive a
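Since the metadata travels with the LV tags, re-activating a moved OSD on another host would look roughly like this (a sketch, not a tested procedure; the host name is a placeholder):

    ceph-volume lvm activate --all        # non-containerized: scan LV tags and start the OSDs found
    ceph cephadm osd activate <new-host>  # cephadm-managed clusters (Pacific and later)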

[ceph-users] Re: nodes with high density of OSDs

2025-04-12 Thread Tim Holloway
4/11/25 16:23, Anthony D'Atri wrote: Filestore, pre-ceph-volume may have been entirely different. IIRC LVM is used these days to exploit persistent metadata tags. On Apr 11, 2025, at 4:03 PM, Tim Holloway wrote: I just checked an OSD and the "block" entry is indeed linked to sto

[ceph-users] Re: Dashboard lies to me

2025-04-16 Thread Tim Holloway
d, therefore creating a false sense of confidence. You may start having a look at Prometheus and/or Alertmanager web UIs, or checking their logs. Kind Regards, Ernesto On Tue, Apr 15, 2025 at 7:28 PM Tim Holloway wrote: Although I've had this problem since at least Pacific, I
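Pulling the container logs for those daemons is roughly (daemon and host names are placeholders):

    cephadm logs --name alertmanager.<host>          # journald output of the managed container
    cephadm logs --name prometheus.<host> -- -n 100  # extra args after -- are passed to journalctl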

[ceph-users] Re: Request for Recommendations: Tiering/Archiving from NetApp to Ceph (with stub file support)

2025-05-05 Thread Tim Holloway
There are likely simpler answers if you want to tier entire buckets, but it sounds like you are hosting a filesystem(s) on NetApp and want to tier them. It would be nice to have NetApp running Ceph as a block store, but I don't think crush is sophisticated enough to migrate components of a file

[ceph-users] Dashboard lies to me

2025-04-15 Thread Tim Holloway
Although I've had this problem since at least Pacific, I'm still seeing it on Reef. After much pain and suffering (covered elsewhere), I got my Prometheus services deployed as intended, Ceph health OK, green across the board. However, over the weekend, the dreaded "CephMgrPrometheusModuleInactive
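When that alert fires, the usual quick checks are something like this (the mgr host is a placeholder; 9283 is the mgr module's default port):

    ceph mgr module ls | grep prometheus             # confirm the module is enabled
    ceph mgr services                                # where the mgr thinks the exporter is listening
    curl -s http://<active-mgr>:9283/metrics | head  # does the endpoint actually answer?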

[ceph-users] Re: NIH Datasets

2025-04-10 Thread Tim Holloway
Hi Alex, "Cost concerns" is the fig leaf that is being used in many cases, but often a closer look indicates political motivations. The current US administration is actively engaged in the destruction of anything that would conflict with their view of the world. That includes health practic

[ceph-users] Re: ceph deployment best practice

2025-04-15 Thread Tim Holloway
I haven't had the need for capacity or speed that many ceph users do, but I AM insistent on reliability, and ceph has never failed me on that point even when I've made a wreck of my hardware and/or configuration. I don't think that it was explicitly stated, but I'm pretty sure that Ceph doesn't (a

[ceph-users] Re: Dashboard lies to me

2025-04-21 Thread Tim Holloway
nesto Puerta wrote: You could check Alertmanager container logs <https://docs.ceph.com/en/quincy/cephadm/operations/#example-of-logging-to-journald> . Kind Regards, Ernesto On Wed, Apr 16, 2025 at 4:54 PM Tim Holloway wrote: I'm thinking more some sort of latency error. I h

[ceph-users] Re: Dashboard lies to me

2025-04-21 Thread Tim Holloway
. Anyway, thanks all for the help!    Tim On 4/21/25 09:46, Tim Holloway wrote: Thanks, but all I'm getting is the following every 10 minutes from the prometheus nodes: Apr 21 09:29:32 dell02.mousetech.com podman[997331]: 2025-04-21 09:29:32.252358201 -040

[ceph-users] Re: external multipath disk not mounted after power off/on the server

2025-02-27 Thread Tim Holloway
I'm coming in late so I don't know the whole story here, but that name's indicative of a Managed (containerized) resource. You can't manually construct, delete or change the systemd services for such items. I learned that the hard way. The service declaration/control files are dynamically created
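The managed units can be inspected, but they should be driven through the orchestrator rather than edited, e.g. (names are placeholders):

    systemctl list-units 'ceph-*'            # managed daemons appear as ceph-<fsid>@<daemon>.service
    ceph orch daemon restart <daemon-name>   # restart via the orchestrator instead of touching unit files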

[ceph-users] Re: Schrödinger's Server

2025-02-28 Thread Tim Holloway
reboot and in the process flush out any other issues that might have arisen. On Thu, 2025-02-27 at 15:47 -0500, Anthony D'Atri wrote: > > > > On Feb 27, 2025, at 8:14 AM, Tim Holloway > > wrote: > > > > System is now stable. The rebalancing was doing what it shou

[ceph-users] Re: Schrödinger's Server

2025-03-01 Thread Tim Holloway
ince I not only maintain Ceph, but every other service on the farm, including appservers, LDAP, NFS, DNS, and much more, I haven't had the luxury to dig into Ceph as deeply as I'd like, so the fact that it works so well under such shoddy administration is also a point in its favor.

[ceph-users] Re: Schrödinger's Server

2025-02-26 Thread Tim Holloway
then looking > into/editing/removing ceph-config keys like 'mgr/cephadm/inventory' > and 'mgr/cephadm/host.ceph07.internal.mousetech.com' that 'ceph > config-key dump' output shows might help. > > Regards, > Frédéric. > > - On 25 Feb 25,

[ceph-users] Schrödinger's Server

2025-02-25 Thread Tim Holloway
Ack. Another fine mess. I was trying to clean things up and the process of tossing around OSDs kept getting me reports of slow responses and hanging PG operations. This is Ceph Pacific, by the way. I found a deprecated server that claimed to have an OSD even though it didn't show in either "cep

[ceph-users] Re: Schrödinger's Server

2025-02-27 Thread Tim Holloway
le or set of OSDs that it seemed to hang on, I just picked a server with the most OSDs reported and rebooted that one. I suspect, however, that any server would have done. Thanks, Tim On Thu, 2025-02-27 at 08:28 +0100, Frédéric Nass wrote: > > > - On 26 Feb 25, at 16:40,

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-28 Thread Tim Holloway
Only the stuff that defines the rgw daemon on dell02. On 3/28/25 19:23, Eugen Block wrote: Do you find anything related to dell02 in config dump? ceph config dump | grep -C2 dell02 Quoting Tim Holloway: I'm guessing that the configuration issues come from the dashboard wantin

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-28 Thread Tim Holloway
other services. Quoting Tim Holloway: Thanks for the info on removing stubborn dead OSDs. The actual syntax required was: cephadm rm-daemon --name osd.2 --fsid <fsid> --force On the "too many pgs", that's because I'm down 2 OSDs. I've got new drives, but they were wait

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-28 Thread Tim Holloway
rvice not known'); too many PGs per OSD (648 > max 560) [ERR] MGR_MODULE_ERROR: Module 'prometheus' has failed: gaierror(-2, 'Name or service not known')     Module 'prometheus' has failed: gaierror(-2, 'Name or service not known') [WRN] TOO_MANY_PGS:

[ceph-users] Re: space size issue

2025-03-28 Thread Tim Holloway
server VMs, though! On 3/28/25 16:53, Anthony D'Atri wrote: On Mar 28, 2025, at 4:38 PM, Tim Holloway wrote: We're glad to have been of help. There is no One Size Fits All solution. For you, it seems that speed is more important than high availability. For me, it's HA+redundan

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-28 Thread Tim Holloway
Almost forgot to say. I switched out disks and got rid of the OSD errors. I actually found a third independent location, so it should be a lot more failure resistant now. So now it's only the prometheus stuff that's still complaining. Everything else is happy.

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-29 Thread Tim Holloway
mgr mgr/prometheus/server_port 9095 Those should be: ceph config get mgr mgr/prometheus/server_addr 0.0.0.0 ceph config get mgr mgr/prometheus/server_port 9283 I assume that's why the module is still failing. Can you give that a try and report back? Quoting Tim Holloway: OK. I didn'
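The corresponding fix would be roughly (a sketch; 9283 is the mgr module's default port):

    ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
    ceph config set mgr mgr/prometheus/server_port 9283
    ceph mgr module disable prometheus && ceph mgr module enable prometheus   # restart the module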

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-29 Thread Tim Holloway
theus node. On 3/29/25 05:13, Eugen Block wrote: How about this: ceph config-key dump | grep -v history Can you spot any key regarding dell02 that doesn't belong there? Quoting Tim Holloway: Only the stuff that defines the rgw daemon on dell02. On 3/28/25 19:23, Eugen Block wrote: D

[ceph-users] Re: Prometheus anomaly in Reef

2025-03-27 Thread Tim Holloway
t 1 more [ERR] MGR_MODULE_ERROR: 2 mgr modules have failed     Module 'cephadm' has failed: 'ceph06.internal.mousetech.com'     Module 'prometheus' has failed: gaierror(-2, 'Name or service not known') [WRN] TOO_MANY_PGS: too many PGs per OSD (648 > m

[ceph-users] Re: Prometheus anomaly in Reef

2025-04-04 Thread Tim Holloway
y other service (mon, mgr, etc.). To get the daemon logs you need to provide the daemon name (prometheus.ceph02, and so on), not just the service name (prometheus). Can you run the cephadm command I provided? It should show something like I pasted in the previous message. Quoting Tim Hollo

[ceph-users] Re: rgw + LDAP

2025-06-10 Thread Tim Holloway
I think that that information is located in the ceph configuration database. Which is edited via the "ceph config set" command and which should be readable via the "ceph config get" command and probably via the config browser in the Ceph dashboard. As I mentioned earlier, /etc/ceph doesn't car
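For example, the RGW LDAP settings live in that database and can be inspected or changed like this (the section name and URI are placeholders for illustration):

    ceph config dump | grep rgw_ldap                          # what is currently set
    ceph config get client.rgw rgw_ldap_uri
    ceph config set client.rgw rgw_ldap_uri ldaps://ldap.example.com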

[ceph-users] Re: Configure RGW without ceph.conf

2025-06-10 Thread Tim Holloway
I use Puppet for my complex servers, but my Ceph machines are lightweight, and Puppet, for all its virtues, does require a Puppet agent to be installed on each target and have a corresponding node manifest. For the Ceph machines I just use Ansible. Since all my /etc/ceph files are identical on
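Pushing identical /etc/ceph files that way can be as simple as an ad-hoc copy (the inventory group and file names are placeholders):

    ansible ceph_hosts -m copy -a "src=/etc/ceph/ceph.conf dest=/etc/ceph/ceph.conf owner=root mode=0644"
    ansible ceph_hosts -m copy -a "src=/etc/ceph/ceph.client.admin.keyring dest=/etc/ceph/ owner=root mode=0600"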

[ceph-users] Re: Configure RGW without ceph.conf

2025-06-10 Thread Tim Holloway
h wrote: On 10/06/2025 at 09:56:34-0400, Tim Holloway wrote: Hi, I use Puppet for my complex servers, but my Ceph machines are lightweight, and Puppet, for all its virtues, does require a Puppet agent to be installed on each target and have a corresponding node manifest. For the Ceph machines

[ceph-users] Re: rgw + LDAP

2025-06-10 Thread Tim Holloway
I think you're a bit confused. Then again, when it comes to LDAP, I'm usually more than a bit confused myself. Generally, there are 2 ways to authenticate to LDAP: 1. Connect via a binddn and do an LDAP lookup 2. Connect via a user search to test for found/not-found Option 1 requires a "unive
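Sketched with ldapsearch (the host and DNs are placeholders):

    # Option 1: bind with a service account (binddn), then look the user up
    ldapsearch -x -H ldaps://ldap.example.com -D "cn=svc,ou=services,dc=example,dc=com" -W \
        -b "ou=users,dc=example,dc=com" "(uid=someuser)"
    # Option 2: bind as the user; a successful bind is the authentication test
    ldapsearch -x -H ldaps://ldap.example.com -D "uid=someuser,ou=users,dc=example,dc=com" -W -b "" -s base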
