[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-06 Thread Matthew Vernon
On 06/09/2024 10:27, Matthew Vernon wrote: On 06/09/2024 08:08, Redouane Kachach wrote: That makes sense. The ipv6 BUG can lead to the issue you described. In the current implementation whenever a mgr failover takes place, prometheus configuration (when using the monitoring stack deployed by

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-06 Thread Matthew Vernon
Hi, On 06/09/2024 08:08, Redouane Kachach wrote: That makes sense. The ipv6 BUG can lead to the issue you described. In the current implementation whenever a mgr failover takes place, prometheus configuration (when using the monitoring stack deployed by Ceph) is updated automatically to point

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-05 Thread Matthew Vernon
On 05/09/2024 15:03, Matthew Vernon wrote: Hi, On 05/09/2024 12:49, Redouane Kachach wrote: The port 8765 is the "service discovery" (an internal server that runs in the mgr... you can change the port by changing the variable service_discovery_port of cephadm). Normally it is ope

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-05 Thread Matthew Vernon
Hi, On 05/09/2024 12:49, Redouane Kachach wrote: The port 8765 is the "service discovery" (an internal server that runs in the mgr... you can change the port by changing the variable service_discovery_port of cephadm). Normally it is opened in the active mgr and the service is used by prometheu
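A minimal sketch of inspecting or changing that port via the option named in the reply (the exact config path, mgr/cephadm/service_discovery_port, is an assumption):

ceph config get mgr mgr/cephadm/service_discovery_port   # show the current port
ceph config set mgr mgr/cephadm/service_discovery_port 8765
ceph mgr fail                                            # fail over so the newly active mgr picks up the setting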

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-04 Thread Matthew Vernon
Hi, I tracked it down to 2 issues: * our ipv6-only deployment (a bug fixed in 18.2.4, though that has buggy .debs) * Discovery service is only run on the active mgr The latter point is surely a bug? Isn't the point of running a service discovery endpoint that one could point e.g. an externa

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Matthew Vernon
Hi, On 03/09/2024 14:27, Tim Holloway wrote: FWIW, I'm using podman not docker. The netstat command is not available in the stock Ceph containers, but the "ss" command is, so use that to see if there is in fact a process listening on that port. I have done this, and there's nothing listening

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Matthew Vernon
On 03/09/2024 13:33, Eugen Block wrote: Oh that's interesting :-D I have no explanation for that, except maybe some flaw in your custom images? Or in the service specs? Not sure, to be honest... So obviously it _could_ be something in our images, but we're using Ceph's published .debs (18.2.2

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Matthew Vernon
Hi, On 03/09/2024 11:46, Eugen Block wrote: Do you see the port definition in the unit.meta file? Oddly: "ports": [ 9283, 8765, 8765, 8765, 8765 ], which doesn't look right... Regards, Matthew
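For anyone wanting to reproduce that check, a sketch assuming the usual cephadm layout for unit.meta (fsid and daemon name will differ):

FSID=$(ceph fsid)
jq '.ports' /var/lib/ceph/${FSID}/mgr.*/unit.meta   # ports cephadm thinks the mgr daemon uses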

[ceph-users] Re: Discovery (port 8765) service not starting

2024-09-03 Thread Matthew Vernon
Hi, On 02/09/2024 21:24, Eugen Block wrote: Without having looked too closely, do you run ceph with IPv6? There’s a tracker issue: https://tracker.ceph.com/issues/66426 It will be backported to Reef. I do run IPv6, but the problem is that nothing is listening on port 8765 at all, not that

[ceph-users] Discovery (port 8765) service not starting

2024-09-02 Thread Matthew Vernon
Hi, I'm running reef, with locally-built containers based on upstream .debs. I've now enabled prometheus metrics thus: ceph mgr module enable prometheus And that seems to have worked (the active mgr is listening on port 9283); but per the docs[0] there should also be a service discovery endpo
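A sketch of the steps and checks being described (the hostname is a placeholder; the discovery endpoint is only expected on the active mgr):

ceph mgr module enable prometheus
ss -tlnp | grep -E ':9283|:8765'                           # run on the active mgr host
curl -sv -o /dev/null http://active-mgr.example.org:8765/  # does anything answer at all?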

[ceph-users] Re: cephadm basic questions: image config, OS reimages

2024-08-27 Thread Matthew Vernon
Hi, On 16/05/2024 17:03, Adam King wrote: At least for the current up-to-date reef branch (not sure what reef version you're on) when --image is not provided to the shell, it should try to infer the image in this order 1. from the CEPHADM_IMAGE env. variable 2. if you pass --name with a dae

[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-25 Thread Matthew Vernon
On 24/06/2024 21:18, Matthew Vernon wrote: 2024-06-24T17:33:26.880065+00:00 moss-be2001 ceph-mgr[129346]: [rgw ERROR root] Non-zero return from ['radosgw-admin', '-k', '/var/lib/ceph/mgr/ceph-moss-be2001.qvwcaq/keyring', '-n', 'mgr.moss-be20

[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-24 Thread Matthew Vernon
On 24/06/2024 20:49, Matthew Vernon wrote: On 19/06/2024 19:45, Adam King wrote: I think this is at least partially a code bug in the rgw module. Where ...the code path seems to have a bunch of places it might raise an exception; are those likely to result in some entry in a log-file? I've

[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-24 Thread Matthew Vernon
On 19/06/2024 19:45, Adam King wrote: I think this is at least partially a code bug in the rgw module. Where ...the code path seems to have a bunch of places it might raise an exception; are those likely to result in some entry in a log-file? I've not found anything, which is making working o

[ceph-users] ceph rgw zone create fails EINVAL

2024-06-19 Thread Matthew Vernon
Hi, I'm running cephadm/reef 18.2.2. I'm trying to set up multisite. I created realm/zonegroup/master zone OK (I think!), edited the zonegroup json to include hostnames. I have this spec file for the secondary zone: rgw_zone: codfw rgw_realm_token: "SECRET" placement: label: "rgw" [I get

[ceph-users] Setting hostnames for zonegroups via cephadm / rgw mgr module?

2024-06-04 Thread Matthew Vernon
Hi, I'm using reef (18.2.2); the docs talk about setting up a multi-site setup with a spec file e.g. rgw_realm: apus rgw_zonegroup: apus_zg rgw_zone: eqiad placement: label: "rgw" but I don't think it's possible to configure the "hostnames" parameter of the zonegroup (and thus control what
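One workaround is to set the hostnames on the zonegroup directly with radosgw-admin rather than via the spec; a sketch, reusing the zonegroup name from the example above:

radosgw-admin zonegroup get --rgw-zonegroup=apus_zg > zg.json
# edit the "hostnames": [...] list in zg.json
radosgw-admin zonegroup set --rgw-zonegroup=apus_zg < zg.json
radosgw-admin period update --commit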

[ceph-users] rgw mgr module not shipped? (in reef at least)

2024-05-31 Thread Matthew Vernon
Hi, As far as I can tell, the rgw mgr module is not shipped in the published reef Debian packages (nor, I suspect, the ubuntu ones, but I've not actually checked). Is there a reason why it couldn't just be added to ceph-mgr-modules-core ? That contains quite a large number of modules already

[ceph-users] Re: ceph orch osd rm --zap --replace leaves cluster in odd state

2024-05-28 Thread Matthew Vernon
On 28/05/2024 17:07, Wesley Dillingham wrote: What is the state of your PGs? could you post "ceph -s" PGs all good: root@moss-be1001:/# ceph -s cluster: id: d7849d66-183c-11ef-b973-bc97e1bb7c18 health: HEALTH_WARN 1 stray daemon(s) not managed by cephadm services:

[ceph-users] ceph orch osd rm --zap --replace leaves cluster in odd state

2024-05-28 Thread Matthew Vernon
Hi, I want to prepare a failed disk for replacement. I did: ceph orch osd rm 35 --zap --replace and it's now in the state "Done, waiting for purge", with 0 pgs, and REPLACE and ZAP set to true. It's been like this for some hours, and now my cluster is unhappy: [WRN] CEPHADM_STRAY_DAEMON: 1 s
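For reference, a sketch of the usual ways to see where such a removal has got to (OSD id as above):

ceph orch osd rm status          # the removal/replace queue
ceph orch ps --daemon-type osd   # the daemon cephadm now considers stray
ceph health detail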

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-22 Thread Matthew Vernon
Hi, On 22/05/2024 12:44, Eugen Block wrote: you can specify the entire tree in the location statement, if you need to: [snip] Brilliant, that's just the ticket, thank you :) This should be made a bit clearer in the docs [0], I added Zac. I've opened a MR to update the docs, I hope it's a
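For the record, a sketch of the kind of host spec that docs discussion is about, with the full location tree spelled out (hostname and bucket names are placeholders; this takes effect when the host's OSDs are first created):

cat > host.yaml <<'EOF'
service_type: host
hostname: moss-be1001
location:
  root: default
  rack: rack1
EOF
ceph orch apply -i host.yaml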

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-21 Thread Matthew Vernon
Hi, Returning to this, it looks like the issue wasn't to do with how osd_crush_chooseleaf_type was set; I destroyed and re-created my cluster as before, and I have the same problem again: pg 1.0 is stuck inactive for 10m, current state unknown, last acting [] as before, ceph osd tree: root@mos

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Matthew Vernon
Hi, Thanks for your help! On 20/05/2024 18:13, Anthony D'Atri wrote: You do that with the CRUSH rule, not with osd_crush_chooseleaf_type. Set that back to the default value of `1`. This option is marked `dev` for a reason ;) OK [though not obviously at https://docs.ceph.com/en/reef/rados
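A sketch of the CRUSH-rule approach being suggested, i.e. a replicated rule with rack as the failure domain rather than changing osd_crush_chooseleaf_type (the rule name is a placeholder):

ceph osd crush rule create-replicated rack-spread default rack
ceph osd pool set .mgr crush_rule rack-spread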

[ceph-users] Re: cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Matthew Vernon
Hi, On 20/05/2024 17:29, Anthony D'Atri wrote: On May 20, 2024, at 12:21 PM, Matthew Vernon wrote: This has left me with a single sad pg: [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive pg 1.0 is stuck inactive for 33m, current state unknown, last acting [] .mgr

[ceph-users] cephadm bootstraps cluster with bad CRUSH map(?)

2024-05-20 Thread Matthew Vernon
Hi, I'm probably Doing It Wrong here, but. My hosts are in racks, and I wanted ceph to use that information from the get-go, so I tried to achieve this during bootstrap. This has left me with a single sad pg: [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive pg 1.0 is stuck

[ceph-users] cephadm basic questions: image config, OS reimages

2024-05-16 Thread Matthew Vernon
Hi, I've some experience with Ceph, but haven't used cephadm much before, and am trying to configure a pair of reef clusters with cephadm. A couple of newbie questions, if I may: * cephadm shell image I'm in an isolated environment, so pulling from a local repository. I bootstrapped OK with
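A sketch of the two ways of pinning the image that the reply lists (the registry path is a placeholder for a local repository):

export CEPHADM_IMAGE=registry.local/ceph/ceph:v18.2.2
cephadm shell
# or per invocation:
cephadm --image registry.local/ceph/ceph:v18.2.2 shell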

[ceph-users] Re: Reconstructing an OSD server when the boot OS is corrupted

2024-05-02 Thread Matthew Vernon
On 24/04/2024 13:43, Bailey Allison wrote: A simple ceph-volume lvm activate should get all of the OSDs back up and running once you install the proper packages/restore the ceph config file/etc., What's the equivalent procedure in a cephadm-managed cluster? Thanks, Matthew
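In a cephadm-managed cluster the rough equivalent is the orchestrator's activate call; a sketch, assuming the host has been reinstalled and re-added with its keys intact (hostname is a placeholder):

ceph cephadm osd activate moss-be1001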

[ceph-users] Re: Ceph-storage slack access

2024-03-06 Thread Matthew Vernon
Hi, On 06/03/2024 16:49, Gregory Farnum wrote: Has the link on the website broken? https://ceph.com/en/community/connect/ We've had trouble keeping it alive in the past (getting a non-expiring invite), but I thought that was finally sorted out. Ah, yes, that works. Sorry, I'd gone to https://d

[ceph-users] Ceph-storage slack access

2024-03-06 Thread Matthew Vernon
Hi, How does one get an invite to the ceph-storage slack, please? Thanks, Matthew

[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-21 Thread Matthew Vernon
[mgr modules failing because pyO3 can't be imported more than once] On 29/01/2024 12:27, Chris Palmer wrote: I have logged this as https://tracker.ceph.com/issues/64213 I've noted there that it's related to https://tracker.ceph.com/issues/63529 (an earlier report relating to the dashboard);

[ceph-users] Re: v18.2.1 Reef released

2023-12-19 Thread Matthew Vernon
On 19/12/2023 06:37, Eugen Block wrote: Hi, I thought the fix for that would have made it into 18.2.1. It was marked as resolved two months ago (https://tracker.ceph.com/issues/63150, https://github.com/ceph/ceph/pull/53922). Presumably that will only take effect once ceph orch is version 18

[ceph-users] Re: Debian 12 support

2023-11-13 Thread Matthew Vernon
Hi, On 13/11/2023 10:42, Chris Palmer wrote: And another big +1 for debian12 reef from us. We're unable to upgrade to either debian12 or reef. I've been keeping an eye on the debian12 bug, and it looks as though it might be fixed if you start from the latest repo release. My expectation is th

[ceph-users] Re: Debian/bullseye build for reef

2023-09-07 Thread Matthew Vernon
Hi, On 21/08/2023 17:16, Josh Durgin wrote: We weren't targeting bullseye once we discovered the compiler version problem, the focus shifted to bookworm. If anyone would like to help maintaining debian builds, or looking into these issues, it would be welcome: https://bugs.debian.org/cgi-bin

[ceph-users] Re: Debian/bullseye build for reef

2023-09-04 Thread Matthew Vernon
Hi, On 21/08/2023 17:16, Josh Durgin wrote: We weren't targeting bullseye once we discovered the compiler version problem, the focus shifted to bookworm. If anyone would like to help maintaining debian builds, or looking into these issues, it would be welcome: https://bugs.debian.org/cgi-bin

[ceph-users] Building Ceph containers

2023-01-16 Thread Matthew Vernon
Hi, Is it possible/supported to build Ceph containers on Debian? The build instructions[0] talk about building packages (incl. .debs), but not building containers. Cephadm only supports containerised deployments, but our local policy is that we should only deploy containers we've built ourse

[ceph-users] Re: OS suggestion for further ceph installations (centos stream, rocky, ubuntu)?

2022-02-04 Thread Matthew Vernon
On 01/02/2022 12:40, Boris Behrens wrote: Personally I like ubuntu a lot, but most of the ceph developers seem to come from redhat (or at least a RH flavored background) so I could imagine that this might be a slightly more optimal way. If you want to run with Ubuntu, you might find the Ubuntu

[ceph-users] Re: [RGW] bi_list(): (5) Input/output error blocking resharding

2022-01-10 Thread Matthew Vernon
Hi, On 07/01/2022 18:39, Gilles Mocellin wrote: Anyone who had that problem find a workaround ? Are you trying to reshard a bucket in a multisite setup? That isn't expected to work (and, IIRC, the changes to support doing so aren't going to make it into quincy). Regards, Matthew

[ceph-users] Re: switching ceph-ansible from /dev/sd to /dev/disk/by-path

2022-01-07 Thread Matthew Vernon
Hi, On 06/01/2022 17:42, Dave Holland wrote: The right solution appears to be to configure ceph-ansible to use /dev/disk/by-path device names, allowing for the expander IDs being embedded in the device name -- so those would have to be set per-host with host vars. Has anyone done that change fr

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-11-17 Thread Matthew Vernon
On 17/11/2021 15:19, Marc wrote: The CLT is discussing a more feasible alternative to LTS, namely to publish an RC for each point release and involve the user community to help test it. How many users even have the availability of a 'test cluster'? The Sanger has one (3 hosts), which was a re

[ceph-users] Re: Stretch cluster experiences in production?

2021-10-19 Thread Matthew Vernon
Hi, On 18/10/2021 23:34, Gregory Farnum wrote: On Fri, Oct 15, 2021 at 8:22 AM Matthew Vernon wrote: Also, if I'm using RGWs, will they do the right thing location-wise? i.e. DC A RGWs will talk to DC A OSDs wherever possible? Stretch clusters are entirely a feature of the RADOS lay

[ceph-users] Stretch cluster experiences in production?

2021-10-15 Thread Matthew Vernon
Hi, Stretch clusters[0] are new in Pacific; does anyone have experience of using one in production? I ask because I'm thinking about new RGW cluster (split across two main DCs), which I would naturally be doing using RGW multi-site between two clusters. But it strikes me that a stretch clu

[ceph-users] Re: OSD Service Advanced Specification db_slots

2021-09-10 Thread Matthew Vernon
On 10/09/2021 15:20, Edward R Huyer wrote: Question 2: If db_slots still *doesn't* work, is there a coherent way to divide up a solid state DB drive for use by a bunch of OSDs when the OSDs may not all be created in one go? At first I thought it was related to limit, but re-reading the advance

[ceph-users] Re: [Ceph Upgrade] - Rollback Support during Upgrade failure

2021-09-08 Thread Matthew Vernon
Hi, On 06/09/2021 08:37, Lokendra Rathour wrote: Thanks, Matthew, for the update. The upgrade failed for some random weird reasons; checking further, Ceph's status shows that "Ceph health is OK" and at times it gives certain warnings but I think that is ok. OK... but what if we see the Versio

[ceph-users] Re: [Ceph Upgrade] - Rollback Support during Upgrade failure

2021-09-03 Thread Matthew Vernon
On 02/09/2021 09:34, Lokendra Rathour wrote: We have deployed the Ceph Octopus release using Ceph-Ansible. During the upgrade from Octopus to Pacific release we saw the upgrade got failed. I'm afraid you'll need to provide some more details (e.g. ceph -s output) on the state of your cluster;

[ceph-users] Re: Howto upgrade AND change distro

2021-08-27 Thread Matthew Vernon
Hi, On 27/08/2021 16:16, Francois Legrand wrote: We are running a ceph nautilus cluster under centos 7. To upgrade to pacific we need to change to a more recent distro (probably debian or ubuntu because of the recent announcement about centos 8, but the distro doesn't matter very much). How

[ceph-users] RGW Swift & multi-site

2021-08-16 Thread Matthew Vernon
Hi, Are there any issues to be aware of when using RGW's newer multi-site features with the Swift front-end? I've, perhaps unfairly, gathered the impression that the Swift support in RGW gets less love than S3... Thanks, Matthew ps: new email address, as I've moved employer

[ceph-users] Re: How can I check my rgw quota ? [EXT]

2021-06-23 Thread Matthew Vernon
On 22/06/2021 12:58, Massimo Sgaravatto wrote: Sorry for the very naive question: I know how to set/check the rgw quota for a user (using radosgw-admin) But how can a radosgw user check what is the quota assigned to his/her account , using the S3 and/or the swift interface ? I think you ca

[ceph-users] Re: ceph buckets [EXT]

2021-06-08 Thread Matthew Vernon
Hi, On 08/06/2021 11:37, Rok Jaklič wrote: I try to create buckets through rgw in following order: - *bucket1* with *user1* with *access_key1* and *secret_key1* - *bucket1* with *user2* with *access_key2* and *secret_key2* when I try to create a second bucket1 with user2 I get *Error response

[ceph-users] Why you might want packages not containers for Ceph deployments

2021-06-02 Thread Matthew Vernon
Hi, In the discussion after the Ceph Month talks yesterday, there was a bit of chat about cephadm / containers / packages. IIRC, Sage observed that a common reason in the recent user survey for not using cephadm was that it only worked on containerised deployments. I think he then went on to

[ceph-users] Re: time duration of radosgw-admin [EXT]

2021-06-02 Thread Matthew Vernon
Hi, On 01/06/2021 21:29, Rok Jaklič wrote: is it normal that radosgw-admin user info --uid=user ... takes around 3s or more? Seems to take about 1s on our production cluster (Octopus), which isn't exactly speedy, but good enough... Regards, Matthew

[ceph-users] Re: Nautilus, Ceph-Ansible, existing OSDs, and ceph.conf updates [EXT]

2021-04-12 Thread Matthew Vernon
On 10/04/2021 13:03, Dave Hall wrote: Hello, A while back I asked about the troubles I was having with Ceph-Ansible when I kept existing OSDs in my inventory file when managing my Nautilus cluster. At the time it was suggested that once the OSDs have been configured they should be excluded from

[ceph-users] Re: ceph-ansible in Pacific and beyond? [EXT]

2021-03-18 Thread Matthew Vernon
Hi, On 18/03/2021 15:03, Guillaume Abrioux wrote: ceph-ansible@stable-6.0 supports pacific and the current content in the branch 'master' (future stable-7.0) is intended to support Ceph Quincy. I can't speak on behalf of Dimitri but I'm personally willing to keep maintaining ceph-ansible if th

[ceph-users] Re: Email alerts from Ceph [EXT]

2021-03-18 Thread Matthew Vernon
Hi, On 17/03/2021 22:26, Andrew Walker-Brown wrote: How have folks implemented getting email or snmp alerts out of Ceph? Getting things like osd/pool nearly full or osd/daemon failures etc. I'm afraid we used our existing Nagios infrastructure for checking HEALTH status, and have a script that

[ceph-users] Telemetry ident use?

2021-03-17 Thread Matthew Vernon
Hi, What use is made of the ident data in the telemetry module? It's disabled by default, and the docs don't seem to say what it's used for... Thanks, Matthew

[ceph-users] ceph-ansible in Pacific and beyond?

2021-03-17 Thread Matthew Vernon
Hi, I caught up with Sage's talk on what to expect in Pacific ( https://www.youtube.com/watch?v=PVtn53MbxTc ) and there was no mention of ceph-ansible at all. Is it going to continue to be supported? We use it (and uncontainerised packages) for all our clusters, so I'd be a bit alarmed if it

[ceph-users] Re: lvm fix for reseated reseated device [EXT]

2021-03-15 Thread Matthew Vernon
On 15/03/2021 11:29, Matthew Vernon wrote: On 15/03/2021 11:09, Dan van der Ster wrote: Occasionally we see a bus glitch which causes a device to disappear then reappear with a new /dev/sd name. This crashes the osd (giving IO errors) but after a reboot the OSD will be perfectly fine. We're

[ceph-users] Re: lvm fix for reseated reseated device [EXT]

2021-03-15 Thread Matthew Vernon
On 15/03/2021 11:09, Dan van der Ster wrote: Occasionally we see a bus glitch which causes a device to disappear then reappear with a new /dev/sd name. This crashes the osd (giving IO errors) but after a reboot the OSD will be perfectly fine. We're looking for a way to reactivate OSDs like this

[ceph-users] Re: Questions RE: Ceph/CentOS/IBM [EXT]

2021-03-03 Thread Matthew Vernon
Hi, You can get support for running Ceph on a number of distributions - RH support both RHEL and Ubuntu, Canonical support Ubuntu, the smaller consultancies seem happy to support anything plausible (e.g. Debian), this mailing list will opine regardless of what distro you're running ;-) Regar

[ceph-users] Re: Octopus auto-scale causing HEALTH_WARN re object numbers [EXT]

2021-03-03 Thread Matthew Vernon
On 02/03/2021 16:38, Matthew Vernon wrote: root@sto-t1-1:~# ceph health detail HEALTH_WARN 1 pools have many more objects per pg than average; 9 pgs not deep-scrubbed in time [WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than average     pool default.rgw.buckets.data

[ceph-users] Octopus auto-scale causing HEALTH_WARN re object numbers

2021-03-02 Thread Matthew Vernon
Hi, I've upgraded our test cluster to Octopus, and enabled the auto-scaler. It's nearly finished: PG autoscaler decreasing pool 11 PGs from 1024 to 32 (4d) [==..] (remaining: 3h) But I notice it looks to be making pool 11 smaller when HEALTH_WARN thinks it s
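The autoscaler's intentions can be inspected directly; a sketch:

ceph osd pool autoscale-status            # target PG counts per pool
ceph osd pool ls detail | grep 'pool 11'  # current pg_num / pgp_num for the pool above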

[ceph-users] "optimal" tunables on release upgrade

2021-02-26 Thread Matthew Vernon
Hi, Having been slightly caught out by tunables on my Octopus upgrade[0], can I just check that if I do ceph osd crush tunables optimal That will update the tunables on the cluster to the current "optimal" values (and move a lot of data around), but that this doesn't mean they'll change next
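For reference, a sketch of checking the profile before committing to the data movement:

ceph osd crush show-tunables      # current tunables, including the profile
ceph osd crush tunables optimal   # sets the current optimal profile (moves data)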

[ceph-users] Re: Consequences of setting bluestore_fsck_quick_fix_on_mount to false?

2021-02-16 Thread Matthew Vernon
Hi, On 16/02/2021 08:06, Dan van der Ster wrote: Which version are you upgrading from? If recent nautilus, you may have already completed this conversion. Mimic (well, really Luminous with a pit-stop at Mimic). When we did this fsck (not with octopus, but to a nautilus point release that ha

[ceph-users] Consequences of setting bluestore_fsck_quick_fix_on_mount to false?

2021-02-15 Thread Matthew Vernon
Hi, Looking at the Octopus upgrade instructions, I see "the first time each OSD starts, it will do a format conversion to improve the accounting for “omap” data. This may take a few minutes to as much as a few hours (for an HDD with lots of omap data)." and that I can disable this by setting
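The setting can be flipped cluster-wide before restarting any OSDs; a sketch:

ceph config set osd bluestore_fsck_quick_fix_on_mount false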

[ceph-users] Re: share haproxy config for radosgw [EXT]

2021-02-15 Thread Matthew Vernon
On 14/02/2021 21:31, Graham Allan wrote: On Tue, Feb 9, 2021 at 11:00 AM Matthew Vernon wrote: On 07/02/2021 22:19, Marc wrote: I was wondering if someone could post a config for haproxy. Is there something specific to configur

[ceph-users] Re: Backups of monitor [EXT]

2021-02-15 Thread Matthew Vernon
On 12/02/2021 15:47, Freddy Andersen wrote: I would say everyone recommends at least 3 monitors and since they need to be 1,3,5 or 7 I always read that as 5 is the best number (if you have 5 servers in your cluster). We have 3 on all our clusters, and at the risk of tempting fate, haven't had a

[ceph-users] Re: share haproxy config for radosgw [EXT]

2021-02-09 Thread Matthew Vernon
On 07/02/2021 22:19, Marc wrote: I was wondering if someone could post a config for haproxy. Is there something specific to configure? Like binding clients to a specific backend server, client timeouts, security specific to rgw etc. Ours is templated out by ceph-ansible; to try and condense

[ceph-users] Re: Using RBD to pack billions of small files

2021-02-04 Thread Matthew Vernon
Hi, On 04/02/2021 07:41, Loïc Dachary wrote: On 04/02/2021 05:51, Federico Lucifredi wrote: Hi Loïc,    I am intrigued, but am missing something: why not using RGW, and store the source code files as objects? RGW has native compression and can take care of that behind the scenes. Excellent

[ceph-users] Re: Sequence replacing a failed OSD disk? [EXT]

2021-01-04 Thread Matthew Vernon
On 31/12/2020 09:10, Rainer Krienke wrote: Yesterday my ceph nautilus 14.2.15 cluster had a disk with unreadable sectors, after several tries the OSD was marked down and rebalancing started and has also finished successfully. ceph osd stat shows the osd now as "autoout,exists". Usually the step

[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11 [EXT]

2020-12-16 Thread Matthew Vernon
Hi, On 15/12/2020 20:44, Suresh Rama wrote: TL;DR: use a real NTP client, not systemd-timesyncd 1) We audited the network (inspecting TOR, iperf, MTR) and nothing was indicating any issue but OSD logs kept complaining about BADAUTHORIZER ...this is quite possibly due to clock skew on yo
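A sketch of the quick checks behind that TL;DR (assuming chrony as the "real" NTP client):

timedatectl status      # confirm systemd-timesyncd is not the active sync service
chronyc tracking        # offset and stratum as chrony sees them
ceph time-sync-status   # the monitors' view of clock skew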

[ceph-users] Re: Huge HDD ceph monitor usage [EXT]

2020-10-26 Thread Matthew Vernon
On 26/10/2020 14:13, Ing. Luis Felipe Domínguez Vega wrote: How can i free the store of ceph monitor?: root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle# du -h -d1 542G    ./store.db 542G    .
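A sketch of the usual remedies (mon id as in the prompt above; a mon store this size usually also means the cluster is unhealthy and osdmaps are not being trimmed):

ceph tell mon.fond-beagle compact               # compact the store now
ceph config set mon mon_compact_on_start true   # or compact at the next restart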

[ceph-users] Hardware needs for MDS for HPC/OpenStack workloads?

2020-10-22 Thread Matthew Vernon
Hi, We're considering the merits of enabling CephFS for our main Ceph cluster (which provides object storage for OpenStack), and one of the obvious questions is what sort of hardware we would need for the MDSs (and how many!). These would be for our users' scientific workloads, so they would

[ceph-users] Re: Ceph RGW Performance [EXT]

2020-09-28 Thread Matthew Vernon
Hi, On 25/09/2020 20:39, Dylan Griff wrote: We have 10Gb network to our two RGW nodes behind a single ip on haproxy, and some iperf testing shows I can push that much; latencies look okay. However, when using a small cosbench cluster I am unable to get more than ~250Mb of read speed total. A

[ceph-users] Re: radosgw beast access logs [EXT]

2020-08-19 Thread Matthew Vernon
On 19/08/2020 14:01, Casey Bodley wrote: Yes, this was implemented by Mark Kogan in https://github.com/ceph/ceph/pull/33083 . It looks like it was backported to Octopus for 15.2.5 in https://tracker.ceph.com/issues/45951. Is there interest in a nautilus backport too? I don't think we'd be ab

[ceph-users] Ceph not warning about clock skew on an OSD-only host?

2020-08-11 Thread Matthew Vernon
Hi, Our production cluster runs Luminous. Yesterday, one of our OSD-only hosts came up with its clock about 8 hours wrong(!) having been out of the cluster for a week or so. Initially, ceph seemed entirely happy, and then after an hour or so it all went South (OSDs start logging about bad aut

[ceph-users] Re: Ceph SSH orchestrator? [EXT]

2020-07-06 Thread Matthew Vernon
Hi, On 03/07/2020 19:44, Oliver Freyermuth wrote: Am 03.07.20 um 20:29 schrieb Dimitri Savineau: You can try to use ceph-ansible which supports baremetal and containerized deployment. https://github.com/ceph/ceph-ansible Thanks for the pointer! I know about ceph-ansible. The problem is that

[ceph-users] Re: Re-run ansible to add monitor and RGWs

2020-06-15 Thread Matthew Vernon
On 14/06/2020 17:07, Khodayar Doustar wrote: Now I want to add the other two nodes as monitor and rgw. Can I just modify the ansible host file and re-run the site.yml? Yes. I've done some modification in Storage classes, I've added some OSD and uploaded a lot of data up to now. Is it safe t

[ceph-users] Using Ceph-ansible for a luminous -> nautilus upgrade?

2020-06-01 Thread Matthew Vernon
Hi, For previous Ceph version upgrades, we've used the rolling_upgrade playbook from Ceph-ansible - for example, the stable-3.0 branch supports both Jewel and Luminous, so we used it to migrate our clusters from Jewel to Luminous. As I understand it, upgrading direct from Luminous to Nautilu

[ceph-users] Re: Servicing multiple OpenStack clusters from the same Ceph cluster [EXT]

2020-01-29 Thread Matthew Vernon
Hi, On 29/01/2020 16:40, Paul Browne wrote: Recently we deployed a brand new Stein cluster however, and I'm curious whether the idea of pointing the new OpenStack cluster at the same RBD pools for Cinder/Glance/Nova as the Luminous cluster would be considered bad practice, or even potenti

[ceph-users] Re: Ceph User Survey 2019 [EXT]

2019-11-28 Thread Matthew Vernon
Hi, On 27/11/2019 18:28, Mike Perez wrote: To better understand how our current users utilize Ceph, we conducted a public community survey. This information is a guide to the community of how we spend our contribution efforts for future development. The survey results will remain anonymous an

[ceph-users] Re: ceph-volume lvm create leaves half-built OSDs lying around

2019-09-12 Thread Matthew Vernon
On 11/09/2019 12:23, Jan Fajerski wrote: On Wed, Sep 11, 2019 at 11:17:47AM +0100, Matthew Vernon wrote: We keep finding part-made OSDs (they appear not attached to any host, and down and out; but still counting towards the number of OSDs); we never saw this with ceph-disk. On investigation