[ceph-users] Newbie Requesting Help - Please, This Is Driving Me Mad/Crazy!

2021-02-24 Thread matthew
Hi Everyone, Let me apologise upfront: if this isn't the correct list to post to; if this has been answered already (& I've missed it in my searching); if this has ended up double posted; if I've in any way given (or am about to give) offence to anyone. I really need some help. I'm try

[ceph-users] Re: Ceph Cluster Config File Locations?

2024-03-06 Thread matthew
Thanks Eugen, you pointed me in the right direction :-) Yes, the config files I mentioned were the ones in `/var/lib/ceph/{FSID}/mgr.{MGR}/config` - I wasn't aware there were others (well, I suspected there were, hence my Q). The `global public-network` was (re-)set to the old subnet, while the
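
For anyone following along, the centralized config (as opposed to the per-daemon files under /var/lib/ceph/{FSID}/) can be inspected and corrected with the ceph config commands; a minimal sketch, with the new subnet as a made-up placeholder:

    # show everything currently stored in the mon config database
    ceph config dump
    # check what the cluster thinks the public network is
    ceph config get mon public_network
    # point it at the new subnet (hypothetical CIDR)
    ceph config set global public_network 192.168.10.0/24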

[ceph-users] Mounting A RBD Image via Kernal Modules

2024-03-25 Thread matthew
Hi All, I'm looking for a bit of advice on the subject of this post. I've been "staring at the trees so long I can't see the forest any more". :-) Rocky Linux Client latest version. Ceph Reef latest version. I have read *all* the doco on the Ceph website. I have created a pool (my_pool) and an
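
For reference, the usual kernel-client sequence looks roughly like the sketch below; the pool/image names and mount point are placeholders, and the client needs a ceph.conf plus a keyring with access to the pool:

    rbd create my_pool/my_image --size 100G     # create the image
    rbd map my_pool/my_image                    # loads the rbd kernel module, returns e.g. /dev/rbd0
    mkfs.xfs /dev/rbd/my_pool/my_image          # one-time: put a filesystem on it
    mount /dev/rbd/my_pool/my_image /mnt/rbd
    # and to undo:
    umount /mnt/rbd && rbd unmap my_pool/my_image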

[ceph-users] Linux Laptop Losing CephFS mounts on Sleep/Hibernate

2024-03-25 Thread matthew
Hi All, So I've got a Ceph Reef Cluster (latest version) with a CephFS system set up with a number of directories on it. On a Laptop (running Rocky Linux (latest version)) I've used fstab to mount a number of those directories - all good, everything works, happy happy joy joy! :-) However, wh
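
One common workaround (a sketch, not a verified fix for the sleep/resume problem) is to let systemd mount the share on demand rather than hard-mounting it at boot, so a stale session after resume just triggers a fresh mount; hostnames, user and paths below are made up:

    # /etc/fstab - kernel CephFS mount, automounted on first access
    mon1,mon2,mon3:/staff  /mnt/staff  ceph  name=laptop,secretfile=/etc/ceph/laptop.secret,noauto,x-systemd.automount,x-systemd.mount-timeout=30,_netdev  0 0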

[ceph-users] Mysterious HDD-Space Eating Issue

2023-01-16 Thread matthew
Hi Guys, I've got a funny one I'm hoping someone can point me in the right direction with: We've got three identical(?) Ceph nodes running 4 OSDs, Mon, Man, and iSCSI G/W each (we're only a small shop) on Rocky Linux 8 / Ceph Quincy. Everything is running fine, no bottle-necks (as far as we ca

[ceph-users] Ceph Dashboard TLS

2024-09-22 Thread matthew
Hi All, I'm running an (experimental) 3-Node Ceph Reef (v18.2.4) Cluster. Each of the 3 nodes runs (amongst other services) the Ceph Dashboard - for fail-over purposes. I can connect to the Ceph Dashboard when not using TLS (ie ceph config set mgr mgr/dashboard/ssl false). I've got a private
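
For reference, loading a certificate into the dashboard usually looks like this (a sketch; file names are placeholders, and the module needs a restart to pick the change up):

    ceph dashboard set-ssl-certificate -i dashboard.crt
    ceph dashboard set-ssl-certificate-key -i dashboard.key
    ceph config set mgr mgr/dashboard/ssl true
    # restart the dashboard so it reloads the TLS config
    ceph mgr module disable dashboard
    ceph mgr module enable dashboard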

[ceph-users] Re: Ceph Dashboard TLS

2024-09-26 Thread matthew
Yeap, that was my issue (forgot to open up port 8443 in the firewall). Thanks for the help. PS: Oh, and you *can* use ECC TLS certs - if anyone wanted to know.
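
For anyone hitting the same thing on Rocky/firewalld, opening the dashboard port looks something like this (sketch):

    firewall-cmd --permanent --add-port=8443/tcp
    firewall-cmd --reload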

[ceph-users] ceph rgw zone create fails EINVAL

2024-06-19 Thread Matthew Vernon
up the spec file, but it looks like the one in the docs[0]. Can anyone point me in the right direction, please? [if the underlying command emits anything useful, I can't find it in the logs] Thanks, Matthew [0] https://docs.ceph.com/en/reef/mgr/rgw/#realm-credentials-token

[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-24 Thread Matthew Vernon
rking out what the problem is quite challenging... Thanks, Matthew

[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-24 Thread Matthew Vernon
On 24/06/2024 20:49, Matthew Vernon wrote: On 19/06/2024 19:45, Adam King wrote: I think this is at least partially a code bug in the rgw module. Where ...the code path seems to have a bunch of places it might raise an exception; are those likely to result in some entry in a log-file?

[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-25 Thread Matthew Vernon
On 24/06/2024 21:18, Matthew Vernon wrote: 2024-06-24T17:33:26.880065+00:00 moss-be2001 ceph-mgr[129346]: [rgw ERROR root] Non-zero return from ['radosgw-admin', '-k', '/var/lib/ceph/mgr/ceph-moss-be2001.qvwcaq/keyring', '-n', 'mgr.moss-be20

[ceph-users] multipart file in broken state

2024-07-03 Thread Matthew Darwin
When trying to clean up multi-part files, I get the following error: $ rclone backend cleanup s3:bucket-name 2024/07/04 02:42:19 ERROR : S3 bucket bucket-name: failed to remove pending multipart upload for bucket "bucket-name" key "0a424a15dee6fecb241130e9e4e49d99ed120f05/outputs/012149-0
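
When rclone's cleanup fails, the stuck uploads can usually be listed and aborted directly against RGW with the AWS CLI; a sketch, with the endpoint and bucket name as placeholders:

    aws --endpoint-url https://rgw.example.com s3api list-multipart-uploads --bucket bucket-name
    # abort one upload, using the Key and UploadId from the listing
    aws --endpoint-url https://rgw.example.com s3api abort-multipart-upload \
        --bucket bucket-name --key "path/to/object" --upload-id "EXAMPLE-UPLOAD-ID"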

[ceph-users] Re: cephadm basic questions: image config, OS reimages

2024-08-27 Thread Matthew Vernon
warning message in that case... Thanks, Matthew

[ceph-users] Discovery (port 8765) service not starting

2024-09-02 Thread Matthew Vernon
get the service discovery endpoint working? Thanks, Matthew [0] https://docs.ceph.com/en/reef/cephadm/services/monitoring/#deploying-monitoring-without-cephadm

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Matthew Vernon
yment where it's only listening on v4. Thanks, Matthew

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Matthew Vernon
Hi, On 03/09/2024 11:46, Eugen Block wrote: Do you see the port definition in the unit.meta file? Oddly: "ports": [ 9283, 8765, 8765, 8765, 8765 ], which doesn't look right... Regards, Matthew

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Matthew Vernon
I can tell from the docs it should just get started when you enable the prometheus endpoint (which does seem to be working)... Regards, Matthew

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Matthew Vernon
rometheus config file (under "./etc) and see if there are irregularities there. It's not, it's the mgr container (I've enabled the prometheus mgr module, which makes an endpoint available from whence metrics can be scraped, rather than the prometheus container which r

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-04 Thread Matthew Vernon
g. an external Prometheus scraper at the service discovery endpoint of any mgr and it would then tell Prometheus where to scrape metrics from (i.e. the active mgr)? Thanks, Matthew

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-05 Thread Matthew Vernon
). Right; it wasn't running because I have an IPv6 deployment (that bug's fixed in 18.2.4 - https://tracker.ceph.com/issues/63448). Regards, Matthew

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-05 Thread Matthew Vernon
On 05/09/2024 15:03, Matthew Vernon wrote: Hi, On 05/09/2024 12:49, Redouane Kachach wrote: The port 8765 is the "service discovery" (an internal server that runs in the mgr... you can change the port by changing the variable service_discovery_port of cephadm). Normally it is ope

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-06 Thread Matthew Vernon
ly redeploy from time to time) that external monitoring could be pointed at? Regards, Matthew

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-06 Thread Matthew Vernon
On 06/09/2024 10:27, Matthew Vernon wrote: On 06/09/2024 08:08, Redouane Kachach wrote: That makes sense. The ipv6 BUG can lead to the issue you described. In the current implementation whenever a mgr failover takes place, prometheus configuration (when using the monitoring stack deployed by

[ceph-users] Re: RGW sync gets stuck every day

2024-09-11 Thread Matthew Darwin
I'm on quincy. I had lots of problems with RGW getting stuck. Once I dedicated 1 single RGW on each side to do replication, my problems went away. Having a cluster of RGW behind a load balancer seemed to be confusing things. I still have multiple RGW for user-facing load, but a single RGW
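
One way to express that split (a sketch; the config section names depend on how the RGW services were deployed) is to leave the sync thread enabled only on the dedicated replication instances:

    # user-facing RGWs: serve clients, don't run multisite sync
    ceph config set client.rgw.public rgw_run_sync_thread false
    # the dedicated replication RGW keeps the default rgw_run_sync_thread = true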

[ceph-users] Hardware needs for MDS for HPC/OpenStack workloads?

2020-10-22 Thread Matthew Vernon
likely to be useful in production. I've also seen it suggested that an SSD-only pool is sensible for the CephFS metadata pool; how big is that likely to get? I'd be grateful for any pointers :) Regards, Matthew

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-26 Thread Matthew Vernon
itself out thereafter. Regards, Matthew

[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11 [EXT]

2020-12-16 Thread Matthew Vernon
would continue to be consistent with time sync issues. Regards, Matthew

[ceph-users] Re: Sequence replacing a failed OSD disk? [EXT]

2021-01-04 Thread Matthew Vernon
oy the old OSD now - as you say, if the system reboots or somesuch you don't want the OSD to try and restart on a fail{ed,ing} disk. Regards, Matthew

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-04 Thread Matthew Vernon
me of the RGW code (I think there's a librgw) to re-use a bunch of its code for your use case; this feels more natural to me than using RBD for this. Regards, Matthew [pleased software heritage are still looking at Ceph :) ]

[ceph-users] Re: share haproxy config for radosgw [EXT]

2021-02-09 Thread Matthew Vernon
The aim is to use all available CPU on the RGWs at peak load, but to also try and prevent one user overwhelming the service for everyone else - hence the dropping of idle connections and soft (and then hard) limits on per-IP connections. Regards, Matthew

[ceph-users] Re: Backups of monitor [EXT]

2021-02-15 Thread Matthew Vernon
had any issues as a result... [it's slightly fiddly to add more, since we give them a bunch of extra storage than our other nodes since the Mon store can get pretty big in a large cluster if you have to do a big rebalance] Regards, Matthew

[ceph-users] Re: share haproxy config for radosgw [EXT]

2021-02-15 Thread Matthew Vernon
On 14/02/2021 21:31, Graham Allan wrote: On Tue, Feb 9, 2021 at 11:00 AM Matthew Vernon wrote: On 07/02/2021 22:19, Marc wrote: > > I was wondering if someone could post a config for haproxy. Is there something specific to configur

[ceph-users] Consequences of setting bluestore_fsck_quick_fix_on_mount to false?

2021-02-15 Thread Matthew Vernon
a of how this is likely to work at scale... Thanks, Matthew

[ceph-users] Re: Consequences of setting bluestore_fsck_quick_fix_on_mount to false?

2021-02-16 Thread Matthew Vernon
's as we updated the osds. Thanks; I'll see what it looks like on the test cluster. Regards, Matthew

[ceph-users] "optimal" tunables on release upgrade

2021-02-26 Thread Matthew Vernon
fresh invocation of ceph osd crush tunables... [I assume the same answer applies to "default"?] Regards, Matthew [0] I foolishly thought a cluster initially installed as Jewel would have jewel tunables

[ceph-users] Octopus auto-scale causing HEALTH_WARN re object numbers

2021-03-02 Thread Matthew Vernon
3) is more than 23.4063 times cluster average (13379) ...which seems like the wrong thing for the auto-scaler to be doing. Is this a known problem? Regards, Matthew More details: ceph df: root@sto-t1-1:~# ceph df --- RAW STORAGE --- CLASS SIZE AVAIL USED RAW USED %RAW USED hdd
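
If the warning is really just an artefact of one big RGW data pool, the usual knobs look like this (a sketch; the pool name is taken from the follow-up message):

    ceph osd pool autoscale-status
    # either stop the autoscaler second-guessing this pool...
    ceph osd pool set default.rgw.buckets.data pg_autoscale_mode off
    # ...or tell it how much of the cluster the pool is expected to use
    ceph osd pool set default.rgw.buckets.data target_size_ratio 0.8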

[ceph-users] Re: Octopus auto-scale causing HEALTH_WARN re object numbers [EXT]

2021-03-03 Thread Matthew Vernon
On 02/03/2021 16:38, Matthew Vernon wrote: root@sto-t1-1:~# ceph health detail HEALTH_WARN 1 pools have many more objects per pg than average; 9 pgs not deep-scrubbed in time [WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than average pool default.rgw.buckets.data

[ceph-users] Re: Questions RE: Ceph/CentOS/IBM [EXT]

2021-03-03 Thread Matthew Vernon
nning ;-) Regards, Matthew

[ceph-users] Re: lvm fix for reseated reseated device [EXT]

2021-03-15 Thread Matthew Vernon
1 > /sys/block/sdNEW/device/delete rescan-scsi-bus.sh -a -r ? I've been trying this when replacing drives (ceph-ansible gets confused if the drives on a host change too much), so I don't know if udev will DTRT. Regards, Matthew

[ceph-users] Re: lvm fix for reseated reseated device [EXT]

2021-03-15 Thread Matthew Vernon
On 15/03/2021 11:29, Matthew Vernon wrote: On 15/03/2021 11:09, Dan van der Ster wrote: Occasionally we see a bus glitch which causes a device to disappear then reappear with a new /dev/sd name. This crashes the osd (giving IO errors) but after a reboot the OSD will be perfectly fine. We

[ceph-users] Re: millions slow ops on a cluster without load

2021-03-15 Thread Matthew H
Might be an MTU problem, have you checked your network and MTU settings? From: Szabo, Istvan (Agoda) Sent: Monday, March 15, 2021 12:08 PM To: Ceph Users Subject: [ceph-users] millions slow ops on a cluster without load We have a cluster with a huge amount of
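
A quick way to check for an MTU mismatch between hosts (a sketch; interface name, peer address and a 9000-byte MTU are assumptions):

    ip link show eth0 | grep mtu
    # 8972 = 9000-byte MTU minus 28 bytes of IP+ICMP headers; -M do forbids fragmentation
    ping -M do -s 8972 -c 3 10.0.0.2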

[ceph-users] ceph-ansible in Pacific and beyond?

2021-03-17 Thread Matthew Vernon
alarmed if it was going to go away... Regards, Matthew

[ceph-users] Telemetry ident use?

2021-03-17 Thread Matthew Vernon
Hi, What use is made of the ident data in the telemetry module? It's disabled by default, and the docs don't seem to say what it's used for... Thanks, Matthew

[ceph-users] Re: ceph-ansible in Pacific and beyond?

2021-03-17 Thread Matthew H
d and significantly faster for deployments than ceph-ansible is. From: Matthew Vernon Sent: Wednesday, March 17, 2021 12:50 PM To: ceph-users Subject: [ceph-users] ceph-ansible in Pacific and beyond? Hi, I caught up with Sage's talk on what to expect

[ceph-users] Re: ceph-ansible in Pacific and beyond?

2021-03-17 Thread Matthew H
hat has not been the case? From: Teoman Onay Sent: Wednesday, March 17, 2021 1:38 PM To: Matthew H Cc: Matthew Vernon ; ceph-users Subject: Re: [ceph-users] Re: ceph-ansible in Pacific and beyond? A containerized environment just makes troubleshooting more diffi

[ceph-users] Re: Email alerts from Ceph [EXT]

2021-03-18 Thread Matthew Vernon
cript that runs daily to report on failed OSDs. Our existing metrics infrastructure is collectd/graphite/grafana so we have dashboards and so on, but as far as I'm aware the Octopus dashboard only supports prometheus, so we're a bit stuck there :-( Regards, Matthew

[ceph-users] Re: ceph-ansible in Pacific and beyond? [EXT]

2021-03-18 Thread Matthew Vernon
ning ceph-ansible if there are interests, but people must be aware that: This is good to know, thank you :) I hadn't realised my question would spawn such a monster thread! Regards, Matthew

[ceph-users] Re: Nautilus, Ceph-Ansible, existing OSDs, and ceph.conf updates [EXT]

2021-04-12 Thread Matthew Vernon
evice names aren't exactly what it's expecting. Regards, Matthew

[ceph-users] Re: time duration of radosgw-admin [EXT]

2021-06-02 Thread Matthew Vernon
Hi, On 01/06/2021 21:29, Rok Jaklič wrote: is it normal that radosgw-admin user info --uid=user ... takes around 3s or more? Seems to take about 1s on our production cluster (Octopus), which isn't exactly speedy, but good enough... Regards, Matthew

[ceph-users] Why you might want packages not containers for Ceph deployments

2021-06-02 Thread Matthew Vernon
setups where it's an obvious win), but if I were advising someone who wanted to set up and use a 'boring' Ceph cluster for the medium term, I'd still advise on using packages. I don't think this makes me a luddite :) Regards, and apologies for the wall of text, Matthe

[ceph-users] Re: ceph buckets [EXT]

2021-06-08 Thread Matthew Vernon
an do multi-tenancy if you want, however: https://docs.ceph.com/en/latest/radosgw/multitenancy/ Regards, Matthew

[ceph-users] Re: How can I check my rgw quota ? [EXT]

2021-06-23 Thread Matthew Vernon
can't via S3; we collect these data and publish them out-of-band (via a CSV file and some trend graphs). The Ceph dashboard can also show you this, I think, if you don't mind all your users being able to see each others' quotas. Regards, Matthew

[ceph-users] RGW Swift & multi-site

2021-08-16 Thread Matthew Vernon
Hi, Are there any issues to be aware of when using RGW's newer multi-site features with the Swift front-end? I've, perhaps unfairly, gathered the impression that the Swift support in RGW gets less love than S3... Thanks, Matthew ps: new email address, as I've

[ceph-users] Re: Howto upgrade AND change distro

2021-08-27 Thread Matthew Vernon
ll reduce the amount of rebalancing you have to do when it rejoins the cluster post upgrade. Regards, Matthew [one good thing about Ubuntu's cloud archive is that e.g. you can get the same version that's default in 20.04 available as packages for 18.04 via UCA meaning you can upgrad

[ceph-users] cephadm Pacific bootstrap hangs waiting for mon

2021-08-30 Thread Matthew Pounsett
I'm just getting started with Pacific, and I've run into this problem trying to get bootstrapped. cephadm is waiting for the mon to start, and waiting, and waiting ... checking docker ps it looks like it's running, but I guess it's never finishing its startup tasks? I waited about 30 minutes t

[ceph-users] Re: cephadm Pacific bootstrap hangs waiting for mon

2021-08-31 Thread Matthew Pounsett
On Tue, 31 Aug 2021 at 03:24, Arnaud MARTEL wrote: > > Hi Matthew, > > I dont' know if it will be helpful but I had the same problem using debian 10 > and the solution was to install docker from docker.io and not from the debian > package (too old). > Ah, that makes

[ceph-users] New Pacific deployment, "failed to find osd.# in keyring" errors

2021-09-02 Thread Matthew Pounsett
I'm trying to bring up a new cluster, just installed, and I'm getting errors while trying to deploy OSDs. Of the 85 candidates found, I've got 63 in and 0 up. All of the hosts were successfully added to the cluster using 'ceph orch host add ...', but I'm seeing things in the logs like the large

[ceph-users] Re: cephadm Pacific bootstrap hangs waiting for mon

2021-09-02 Thread Matthew Pounsett
On Thu, 2 Sept 2021 at 04:47, Sebastian Wagner wrote: > > by chance do you still have the logs of the mon that never went up? > > https://docs.ceph.com/en/latest/cephadm/troubleshooting/#checking-cephadm-logs > Not an

[ceph-users] Re: [Ceph Upgrade] - Rollback Support during Upgrade failure

2021-09-03 Thread Matthew Vernon
ur cluster; I'd expect a cluster mid-upgrade to still be operational, so you should still be able to access your OSDs. Regards, Matthew ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: [Ceph Upgrade] - Rollback Support during Upgrade failure

2021-09-08 Thread Matthew Vernon
are tools to extract data from OSDs (e.g. https://docs.ceph.com/en/latest/man/8/ceph-objectstore-tool/ ), you won't get complete objects this way. Instead, the advice would be to try and get enough mons back up to get your cluster at least to a read-only state and then attempt reco

[ceph-users] Re: OSD Service Advanced Specification db_slots

2021-09-10 Thread Matthew Vernon
ition per OSD. [not attempted this with cephadm, this was ceph-ansible] Regards, Matthew

[ceph-users] Stretch cluster experiences in production?

2021-10-15 Thread Matthew Vernon
y do the right thing location-wise? i.e. DC A RGWs will talk to DC A OSDs wherever possible? Thanks, Matthew [0] https://docs.ceph.com/en/latest/rados/operations/stretch-mode/

[ceph-users] 17.2.7 quincy

2023-10-29 Thread Matthew Darwin
Hi all, I see 17.2.7 quincy is published as debian-bullseye packages. So I tried it on a test cluster. I must say I was not expecting the big dashboard change in a patch release. Also the "cluster utilization" numbers are all blank now (any way to fix it?), so the dashboard is much less

[ceph-users] Re: 17.2.7 quincy

2023-10-30 Thread Matthew Darwin
ff by default. "ceph dashboard feature disable dashboard" works to put the old dashboard back. Thanks. On 2023-10-30 00:09, Nizamudeen A wrote: Hi Matthew, Is prometheus configured in the cluster? And is the PROMETHEUS_API_URL set? You can set it manually by ceph dashboard set
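
For completeness, the dashboard's Prometheus link is configured along these lines (a sketch; host and port are placeholders):

    ceph dashboard set-prometheus-api-host 'http://prometheus.example.com:9090'
    # confirm what the dashboard will use
    ceph dashboard get-prometheus-api-host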

[ceph-users] Re: 17.2.7 quincy dashboard issues

2023-10-30 Thread Matthew Darwin
t's why the utilization charts are empty because it relies on the prometheus info. And I raised a PR to disable the new dashboard in quincy. https://github.com/ceph/ceph/pull/54250 Regards, Nizam On Mon, Oct 30, 2023 at 6:09 PM Matthew Darwin wrote: Hello, We're not using

[ceph-users] Re: 17.2.7 quincy dashboard issues

2023-11-02 Thread Matthew Darwin
ome filtering done with cluster id or something to properly identify it. FYI @Pedro Gonzalez Gomez @Ankush Behl @Aashish Sharma Regards, Nizam On Mon, Oct 30, 2023 at 11:05 PM Matthew Darwin w

[ceph-users] Many pgs inactive after node failure

2023-11-04 Thread Matthew Booth
so I will most likely rebuild it. I'm running rook, and I will most likely delete the old node and create a new one with the same name. AFAIK, the OSDs are fine. When rook rediscovers the OSDs, will it add them back with data intact? If not, is there any way I can make it so it will? Thanks! --

[ceph-users] Re: Many pgs inactive after node failure

2023-11-06 Thread Matthew Booth
king I had enough space. Thanks! Matt > > Regards, > Eugen > > [1] https://docs.ceph.com/en/reef/cephadm/services/osd/#activate-existing-osds > > Quoting Matthew Booth: > > > I have a 3 node ceph cluster in my home lab. One of the pools spans 3 > > hdds,

[ceph-users] OSD fails to start after 17.2.6 to 17.2.7 update

2023-11-07 Thread Matthew Booth
self.list(args) File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root return func(*a, **kw) File "/usr/lib/python3.6/site-packages/ceph_volume/devices/raw/list.py", line 122, in list report = self.generate(args.device) File

[ceph-users] Re: OSD fails to start after 17.2.6 to 17.2.7 update

2023-11-07 Thread Matthew Booth
that regression. Fixes: https://tracker.ceph.com/issues/62001 Signed-off-by: Guillaume Abrioux (cherry picked from commit b3fd5b513176fb9ba1e6e0595ded4b41d401c68e) It feels like a regression to me. Matt On Tue, 7 Nov 2023 at 16:13, Matthew Booth wrote: > Firstly I'm rolli

[ceph-users] Re: OSD fails to start after 17.2.6 to 17.2.7 update

2023-11-07 Thread Matthew Booth
On Tue, 7 Nov 2023 at 16:26, Matthew Booth wrote: > FYI I left rook as is and reverted to ceph 17.2.6 and the issue is > resolved. > > The code change was added by > commit 2e52c029bc2b052bb96f4731c6bb00e30ed209be: > ceph-volume: fix broken workaround for atari partitions

[ceph-users] Re: OSD fails to start after 17.2.6 to 17.2.7 update

2023-11-07 Thread Matthew Booth
I just discovered that rook is tracking this here: https://github.com/rook/rook/issues/13136 On Tue, 7 Nov 2023 at 18:09, Matthew Booth wrote: > On Tue, 7 Nov 2023 at 16:26, Matthew Booth wrote: > >> FYI I left rook as is and reverted to ceph 17.2.6 and the issue is >> resolv

[ceph-users] Re: Debian 12 support

2023-11-12 Thread Matthew Darwin
We are still waiting on debian 12 support. Currently our ceph is stuck on debian 11 due to lack of debian 12 releases. On 2023-11-01 03:23, nessero karuzo wrote: Hi to all ceph community. I have a question about Debian 12 support for ceph 17. I didn’t find repo for that release at https://down

[ceph-users] Re: v17.2.7 Quincy released

2023-11-12 Thread Matthew Darwin
It would be nice if the dashboard changes, which are very big, had been covered in the release notes, especially since they are not really backwards compatible. (See my previous messages on this topic) On 2023-10-30 10:50, Yuri Weinstein wrote: We're happy to announce the 7th backport rel

[ceph-users] Re: Debian 12 support

2023-11-13 Thread Matthew Vernon
ctation is that the next point release of Reef (due soon!) will have Debian packages built as part of it. Regards, Matthew

[ceph-users] Re: v18.2.1 Reef released

2023-12-19 Thread Matthew Vernon
18.2.1 (whereas the reporter is still on 18.2.0)? i.e. one has to upgrade to 18.2.1 before this bug will be fixed and so the upgrade _to_ 18.2.1 is still affected. Regards, Matthew

[ceph-users] Understanding subvolumes

2024-01-31 Thread Matthew Melendy
3c3c6f96ffcf [root@ceph1 ~]# ceph fs subvolume ls cephfs csvg [ { "name": "staff" } ] -- Sincerely, Matthew Melendy IT Services Specialist CS System Services Group FEC 3550, University of New Mexico

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-02 Thread Matthew Darwin
Chris, Thanks for all the investigations you are doing here. We're on quincy/debian11. Is there any working path at this point to reef/debian12? Ideally I want to go in two steps. Upgrade ceph first or upgrade debian first, then do the upgrade to the other one. Most of our infra is already

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-21 Thread Matthew Vernon
the dashboard); there is a MR to fix just the dashboard issue which got merged into main. I've opened a MR to backport that change to Reef: https://github.com/ceph/ceph/pull/55689 I don't know what the devs' plans are for dealing with the broader pyO3 issue, but I'll ask on

[ceph-users] Ceph-storage slack access

2024-03-06 Thread Matthew Vernon
Hi, How does one get an invite to the ceph-storage slack, please? Thanks, Matthew

[ceph-users] Re: Ceph-storage slack access

2024-03-06 Thread Matthew Vernon
e to https://docs.ceph.com/en/latest/start/get-involved/ which lacks the registration link. Regards, Matthew

[ceph-users] Re: Reconstructing an OSD server when the boot OS is corrupted

2024-05-02 Thread Matthew Vernon
On 24/04/2024 13:43, Bailey Allison wrote: A simple ceph-volume lvm activate should get all of the OSDs back up and running once you install the proper packages/restore the ceph config file/etc., What's the equivalent procedure in a cephadm-managed cluster? Thanks, Ma
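
In a cephadm-managed cluster the rough equivalent (a sketch, assuming the reinstalled host has been re-added to the orchestrator; hostname and IP are placeholders) is:

    # re-add the reimaged host if needed
    ceph orch host add osd-host1 192.0.2.10
    # ask cephadm to adopt/activate the existing OSD LVs on that host
    ceph cephadm osd activate osd-host1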

[ceph-users] How to define a read-only sub-user?

2024-05-08 Thread Matthew Darwin
Hi, I'm new to bucket policies. I'm trying to create a sub-user that has only read-only access to all the buckets of the main user. I created the policy below; I can't create or delete files, but I can still create buckets using "rclone mkdir". Any idea what I'm doing wrong? I'm using ceph
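
For comparison, a minimal read-only bucket policy looks something like the sketch below (bucket and user names are placeholders, and I'm not certain of the exact principal syntax RGW expects for subusers). Note that a bucket policy can't stop an authenticated user creating *new* buckets - that's a per-user limit (e.g. radosgw-admin user modify --uid=readonly --max-buckets=0):

    # readonly-policy.json (names are placeholders):
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ["arn:aws:iam:::user/readonly"]},
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::bucket-name", "arn:aws:s3:::bucket-name/*"]
      }]
    }
    # attach it to the bucket with the main user's credentials:
    s3cmd setpolicy readonly-policy.json s3://bucket-name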

[ceph-users] Re: Ceph reef and (slow) backfilling - how to speed it up

2024-05-10 Thread Matthew Darwin
We have had pgs get stuck in quincy (17.2.7). After changing to wpq, no such problems were observed. We're using a replicated (x3) pool. On 2024-05-02 10:02, Wesley Dillingham wrote: In our case it was with a EC pool as well. I believe the PG state was degraded+recovering / recovery_wait and
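
For reference, the scheduler switch mentioned here is (a sketch):

    ceph config set osd osd_op_queue wpq
    # osd_op_queue is only read at start-up, so restart the OSDs
    # (one host / failure domain at a time) for it to take effect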

[ceph-users] cephadm basic questions: image config, OS reimages

2024-05-16 Thread Matthew Vernon
Ds and away you went; how does one do this in a cephadm cluster? [I presume involves telling cephadm to download a new image for podman to use and suchlike] Would the process be smoother if we arranged to leave /var/lib/ceph intact between reimages

[ceph-users] cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Matthew Vernon
thing to want to do with cephadm? I'm running ceph version 18.2.2 (531c0d11a1c5d39fbfe6aa8a521f023abf3bf3e2) reef (stable) Thanks, Matthew

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Matthew Vernon
Hi, On 20/05/2024 17:29, Anthony D'Atri wrote: On May 20, 2024, at 12:21 PM, Matthew Vernon wrote: This has left me with a single sad pg: [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive pg 1.0 is stuck inactive for 33m, current state unknown, last acting [] .mgr

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Matthew Vernon
work, and vgdisplay on the vg that pvs tells me the nvme device is in shows 24 LVs... Thanks, Matthew

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-21 Thread Matthew Vernon
" and similar for the others, but is there a way to have what I want done by cephadm bootstrap? Thanks, Matthew

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-22 Thread Matthew Vernon
hope it's at least useful as a starter-for-ten: https://github.com/ceph/ceph/pull/57633 Thanks, Matthew

[ceph-users] ceph orch osd rm --zap --replace leaves cluster in odd state

2024-05-28 Thread Matthew Vernon
emoved. What did I do wrong? I don't much care about the OSD id (but obviously it's neater to not just incrementally increase OSD numbers every time a disk died), but I thought that telling ceph orch not to make new OSDs then using ceph orch osd rm to zap the disk and NVME lv
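
For reference, the sequence I'd expect here is roughly (a sketch; the OSD id and device are placeholders):

    ceph orch osd rm 35 --replace --zap   # drain, mark destroyed, zap the underlying devices
    ceph orch osd rm status               # watch the draining progress
    # once the physical disk is swapped, either let a managed drivegroup spec
    # recreate the OSD, or add it back by hand:
    ceph orch daemon add osd my-host:/dev/sdq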

[ceph-users] Re: ceph orch osd rm --zap --replace leaves cluster in odd state

2024-05-28 Thread Matthew Vernon
roy osd.35 ; echo $? OSD(s) 35 are safe to destroy without reducing data durability. 0 I should have said - this is a reef 18.2.2 cluster, cephadm deployed. Regards, Matthew

[ceph-users] rgw mgr module not shipped? (in reef at least)

2024-05-31 Thread Matthew Vernon
of modules already, and the rgw one is effectively one small python file, I think... I'm using 18.2.2. Thanks, Matthew

[ceph-users] Setting hostnames for zonegroups via cephadm / rgw mgr module?

2024-06-04 Thread Matthew Vernon
of the zonegroup (and thus control what hostname(s) the rgws are expecting to serve)? Have I missed something, or do I need to set up the realm/zonegroup/zone, extract the zonegroup json and edit hostnames by hand? Thanks, Matthew
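
Absent direct support in the rgw mgr module / spec, the by-hand route looks like this (a sketch; zonegroup name and hostnames are placeholders):

    radosgw-admin zonegroup get --rgw-zonegroup=default > zg.json
    # edit the "hostnames" array in zg.json, e.g. ["objects.example.com"]
    radosgw-admin zonegroup set --rgw-zonegroup=default < zg.json
    radosgw-admin period update --commit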

[ceph-users] strange OSD status when rebooting one server

2022-10-14 Thread Matthew Darwin
Hi, I am hoping someone can help explain this strange message. I took 1 physical server offline which contains 11 OSDs. "ceph -s" reports 11 osd down. Great. But on the next line it says "4 hosts" are impacted. It should only be 1 single host? When I look at the manager dashboard all the O

[ceph-users] Re: strange OSD status when rebooting one server

2022-10-14 Thread Matthew Darwin
hint... Hth On 14 October 2022 18:45:40 CEST, Matthew Darwin wrote: Hi, I am hoping someone can help explain this strange message. I took 1 physical server offline which contains 11 OSDs. "ceph -s" reports 11 osd down. Great. But on the next line it says "

[ceph-users] Re: strange OSD status when rebooting one server

2022-10-14 Thread Matthew Darwin
9, rum S14 ____ From: Matthew Darwin Sent: 14 October 2022 18:57:37 To: c...@elchaka.de; ceph-users@ceph.io Subject: [ceph-users] Re: strange OSD status when rebooting one server https://gist.githubusercontent.com/matthewdarwin/aec3c2b16ba5e74beb4af1d49e

[ceph-users] Re: S3 Deletes in Multisite Sometimes Not Syncing

2022-12-23 Thread Matthew Darwin
Hi Alex, We also have a multi-site setup (17.2.5). I just deleted a bunch of files from one side and some files got deleted on the other side but not others. I waited 10 hours to see if the files would delete. I didn't do an exhaustive test like yours, but it seems like a similar issue. In our case, l
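
When chasing this sort of thing, the usual first stops are the sync status commands (a sketch; zone and bucket names are placeholders):

    radosgw-admin sync status
    radosgw-admin data sync status --source-zone=us-east
    radosgw-admin bucket sync status --bucket=my-bucket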

[ceph-users] Laggy PGs on a fairly high performance cluster

2023-01-12 Thread Matthew Stroud
We have a 14 osd node all ssd cluster and for some reason we are continually getting laggy PGs and those seem to correlate to slow requests on Quincy (doesn't seem to happen on our Pacific clusters). These laggy pgs seem to shift between osds. The network seems solid, as in I'm not seeing errors
