Hi,
On 06/09/2024 08:08, Redouane Kachach wrote:
That makes sense. The ipv6 BUG can lead to the issue you described. In
the current implementation whenever a mgr failover takes place,
prometheus configuration (when using the monitoring stack deployed by
Ceph) is updated automatically to point
Hi,
On 05/09/2024 12:49, Redouane Kachach wrote:
The port 8765 is the "service discovery" (an internal server that runs in
the mgr... you can change the port by changing the
variable service_discovery_port of cephadm). Normally it is opened in the
active mgr and the service is used by prometheu
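As a sketch of what changing that looks like (the exact cephadm option name here is from memory, so treat it as an assumption):
ceph config set mgr mgr/cephadm/service_discovery_port 8765
ceph mgr fail     # force a mgr failover so the new setting is picked up
Note that only the active mgr serves this endpoint, so the listener moves on every failover.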
Hi,
I tracked it down to 2 issues:
* our ipv6-only deployment (a bug fixed in 18.2.4, though that has buggy
.debs)
* Discovery service is only run on the active mgr
The latter point is surely a bug? Isn't the point of running a service
discovery endpoint that one could point e.g. an externa
Hi,
On 03/09/2024 14:27, Tim Holloway wrote:
FWIW, I'm using podman not docker.
The netstat command is not available in the stock Ceph containers, but
the "ss" command is, so use that to see if there is in fact a process
listening on that port.
I have done this, and there's nothing listening
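For reference, a couple of ways to check for a listener (assuming cephadm's usual host networking for the mgr container):
ss -tlnp | grep -E ':(8765|9283)'   # on the host running the active mgr
cephadm shell -- ss -tlnp           # or from a stock container, which has ss but not netstat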
On 03/09/2024 13:33, Eugen Block wrote:
Oh that's interesting :-D I have no explanation for that, except maybe
some flaw in your custom images? Or in the service specs? Not sure, to
be honest...
So obviously it _could_ be something in our images, but we're using
Ceph's published .debs (18.2.2
Hi,
On 03/09/2024 11:46, Eugen Block wrote:
Do you see the port definition in the unit.meta file?
Oddly:
"ports": [
9283,
8765,
8765,
8765,
8765
],
which doesn't look right...
Regards,
Matthew
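For anyone wanting to check the same thing: unit.meta lives under the daemon directory on the host, and redeploying the daemon should rewrite it (that redeploy behaviour is an assumption on my part):
cat /var/lib/ceph/<fsid>/mgr.<host>.<suffix>/unit.meta
ceph orch daemon redeploy mgr.<host>.<suffix>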
Hi,
On 02/09/2024 21:24, Eugen Block wrote:
Without having looked too closely, do you run ceph with IPv6? There’s a
tracker issue:
https://tracker.ceph.com/issues/66426
It will be backported to Reef.
I do run IPv6, but the problem is that nothing is listening on port 8765
at all, not that
Hi,
I'm running reef, with locally-built containers based on upstream .debs.
I've now enabled prometheus metrics thus:
ceph mgr module enable prometheus
And that seems to have worked (the active mgr is listening on port
9283); but per the docs[0] there should also be a service discovery
endpo
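A rough sketch of verifying both endpoints (the sd-config URL path and service name are from memory, so treat them as assumptions; curl -g allows a literal bracketed IPv6 address):
ceph mgr module enable prometheus
ceph mgr services        # should list the prometheus exporter on :9283
curl -g 'http://[ACTIVE-MGR-IP]:8765/sd/prometheus/sd-config?service=mgr-prometheus'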
Hi,
On 16/05/2024 17:03, Adam King wrote:
At least for the current up-to-date reef branch (not sure what reef
version you're on) when --image is not provided to the shell, it should
try to infer the image in this order
1. from the CEPHADM_IMAGE env. variable
2. if you pass --name with a dae
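Illustrating those first two options (the image name and daemon name below are placeholders):
CEPHADM_IMAGE=registry.example.org/ceph/ceph:v18.2.2 cephadm shell
cephadm shell --name mgr.moss-be2001.qvwcaq      # infers the image from that daemon
cephadm --image registry.example.org/ceph/ceph:v18.2.2 shell   # explicit override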
On 24/06/2024 21:18, Matthew Vernon wrote:
2024-06-24T17:33:26.880065+00:00 moss-be2001 ceph-mgr[129346]: [rgw
ERROR root] Non-zero return from ['radosgw-admin', '-k',
'/var/lib/ceph/mgr/ceph-moss-be2001.qvwcaq/keyring', '-n',
'mgr.moss-be20
On 19/06/2024 19:45, Adam King wrote:
I think this is at least partially a code bug in the rgw module. Where
...the code path seems to have a bunch of places it might raise an
exception; are those likely to result in some entry in a log-file? I've
not found anything, which is making working o
Hi,
I'm running cephadm/reef 18.2.2. I'm trying to set up multisite.
I created realm/zonegroup/master zone OK (I think!), edited the
zonegroup json to include hostnames. I have this spec file for the
secondary zone:
rgw_zone: codfw
rgw_realm_token: "SECRET"
placement:
label: "rgw"
[I get
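For context, a hedged sketch of how the reef rgw mgr module is meant to consume a spec like this (command names from memory, so treat them as assumptions):
ceph rgw realm tokens                        # on the primary: prints the token for rgw_realm_token
ceph rgw zone create -i rgw-secondary.yaml   # on the secondary: creates the zone and deploys RGWs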
Hi,
I'm using reef (18.2.2); the docs talk about setting up a multi-site
setup with a spec file e.g.
rgw_realm: apus
rgw_zonegroup: apus_zg
rgw_zone: eqiad
placement:
label: "rgw"
but I don't think it's possible to configure the "hostnames" parameter
of the zonegroup (and thus control what
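The usual workaround is to set hostnames on the zonegroup by hand afterwards, roughly:
radosgw-admin zonegroup get --rgw-zonegroup=apus_zg > zg.json
# edit zg.json so "hostnames" lists the frontend names, then:
radosgw-admin zonegroup set --rgw-zonegroup=apus_zg < zg.json
radosgw-admin period update --commit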
Hi,
As far as I can tell, the rgw mgr module is not shipped in the published
reef Debian packages (nor, I suspect, the ubuntu ones, but I've not
actually checked).
Is there a reason why it couldn't just be added to ceph-mgr-modules-core
? That contains quite a large number of modules already
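A quick way to see whether the module is present, and which package (if any) ships it, on a Debian system:
ceph mgr module ls | grep rgw      # is the module known to the mgr at all?
dpkg -S /usr/share/ceph/mgr/rgw    # which .deb (if any) owns the module directory
ceph mgr module enable rgw         # fails if the module really isn't shipped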
On 28/05/2024 17:07, Wesley Dillingham wrote:
What is the state of your PGs? could you post "ceph -s"
PGs all good:
root@moss-be1001:/# ceph -s
cluster:
id: d7849d66-183c-11ef-b973-bc97e1bb7c18
health: HEALTH_WARN
1 stray daemon(s) not managed by cephadm
services:
Hi,
I want to prepare a failed disk for replacement. I did:
ceph orch osd rm 35 --zap --replace
and it's now in the state "Done, waiting for purge", with 0 pgs, and
REPLACE and ZAP set to true. It's been like this for some hours, and now
my cluster is unhappy:
[WRN] CEPHADM_STRAY_DAEMON: 1 s
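For anyone in the same state, the removal queue can be inspected, and the disk zapped by hand if needed (hostname and device below are placeholders):
ceph orch osd rm status                       # shows DRAIN/ZAP/REPLACE progress per OSD
ceph orch device zap moss-be1001 /dev/sdX --force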
Hi,
On 22/05/2024 12:44, Eugen Block wrote:
you can specify the entire tree in the location statement, if you need to:
[snip]
Brilliant, that's just the ticket, thank you :)
This should be made a bit clearer in the docs [0]; I added Zac.
I've opened an MR to update the docs; I hope it's a
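For the archives, a host spec with a full location tree looks roughly like this (hostname and rack are placeholders); it can be fed to cephadm bootstrap via --apply-spec, or applied later with ceph orch apply -i:
service_type: host
hostname: moss-be1001
location:
  root: default
  rack: b12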
Hi,
Returning to this, it looks like the issue wasn't to do with how
osd_crush_chooseleaf_type was set; I destroyed and re-created my cluster as
before, and I have the same problem again:
pg 1.0 is stuck inactive for 10m, current state unknown, last acting []
as before, ceph osd tree:
root@mos
Hi,
Thanks for your help!
On 20/05/2024 18:13, Anthony D'Atri wrote:
You do that with the CRUSH rule, not with osd_crush_chooseleaf_type. Set that
back to the default value of `1`. This option is marked `dev` for a reason ;)
OK [though not obviously at
https://docs.ceph.com/en/reef/rados
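For reference, the CRUSH-rule approach looks roughly like this (the rule name is a placeholder):
ceph osd crush rule create-replicated replicated-rack default rack
ceph osd pool set .mgr crush_rule replicated-rack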
Hi,
On 20/05/2024 17:29, Anthony D'Atri wrote:
On May 20, 2024, at 12:21 PM, Matthew Vernon wrote:
This has left me with a single sad pg:
[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive
pg 1.0 is stuck inactive for 33m, current state unknown, last acting []
.mgr
Hi,
I'm probably Doing It Wrong here, but. My hosts are in racks, and I
wanted ceph to use that information from the get-go, so I tried to
achieve this during bootstrap.
This has left me with a single sad pg:
[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive
pg 1.0 is stuck
Hi,
I've some experience with Ceph, but haven't used cephadm much before,
and am trying to configure a pair of reef clusters with cephadm. A
couple of newbie questions, if I may:
* cephadm shell image
I'm in an isolated environment, so pulling from a local repository. I
bootstrapped OK with
On 24/04/2024 13:43, Bailey Allison wrote:
A simple ceph-volume lvm activate should get all of the OSDs back up and
running once you install the proper packages/restore the ceph config
file/etc.,
What's the equivalent procedure in a cephadm-managed cluster?
Thanks,
Matthew
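For the record, the cephadm equivalent appears to be the following (hostname is a placeholder; treat the exact command as an assumption):
ceph cephadm osd activate moss-be1001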
Hi,
On 06/03/2024 16:49, Gregory Farnum wrote:
Has the link on the website broken? https://ceph.com/en/community/connect/
We've had trouble keeping it alive in the past (getting a non-expiring
invite), but I thought that was finally sorted out.
Ah, yes, that works. Sorry, I'd gone to
https://d
Hi,
How does one get an invite to the ceph-storage slack, please?
Thanks,
Matthew
[mgr modules failing because pyO3 can't be imported more than once]
On 29/01/2024 12:27, Chris Palmer wrote:
I have logged this as https://tracker.ceph.com/issues/64213
I've noted there that it's related to
https://tracker.ceph.com/issues/63529 (an earlier report relating to the
dashboard);
On 19/12/2023 06:37, Eugen Block wrote:
Hi,
I thought the fix for that would have made it into 18.2.1. It was marked
as resolved two months ago (https://tracker.ceph.com/issues/63150,
https://github.com/ceph/ceph/pull/53922).
Presumably that will only take effect once ceph orch is version 18
Hi,
On 13/11/2023 10:42, Chris Palmer wrote:
And another big +1 for debian12 reef from us. We're unable to upgrade to
either debian12 or reef.
I've been keeping an eye on the debian12 bug, and it looks as though it
might be fixed if you start from the latest repo release.
My expectation is th
Hi,
On 21/08/2023 17:16, Josh Durgin wrote:
We weren't targeting bullseye once we discovered the compiler version
problem, the focus shifted to bookworm. If anyone would like to help
maintaining debian builds, or looking into these issues, it would be
welcome:
https://bugs.debian.org/cgi-bin
Hi,
Is it possible/supported to build Ceph containers on Debian? The build
instructions[0] talk about building packages (incl. .debs), but not
building containers.
Cephadm only supports containerised deployments, but our local policy is
that we should only deploy containers we've built ourse
On 01/02/2022 12:40, Boris Behrens wrote:
Personally I like ubuntu a lot, but most of the ceph developers seem to
come from redhat (or at least a RH flavored background) to I could imagine
that this might be a slightly more optimal way.
If you want to run with Ubuntu, you might find the Ubuntu
Hi,
On 07/01/2022 18:39, Gilles Mocellin wrote:
Anyone who had that problem find a workaround ?
Are you trying to reshard a bucket in a multisite setup? That isn't
expected to work (and, IIRC, the changes to support doing so aren't
going to make it into quincy).
Regards,
Matthew
Hi,
On 06/01/2022 17:42, Dave Holland wrote:
The right solution appears to be to configure ceph-ansible to use
/dev/disk/by-path device names, allowing for the expander IDs being
embedded in the device name -- so those would have to be set per-host
with host vars. Has anyone done that change fr
On 17/11/2021 15:19, Marc wrote:
The CLT is discussing a more feasible alternative to LTS, namely to
publish an RC for each point release and involve the user community to
help test it.
How many users even have the availability of a 'test cluster'?
The Sanger has one (3 hosts), which was a re
Hi,
On 18/10/2021 23:34, Gregory Farnum wrote:
On Fri, Oct 15, 2021 at 8:22 AM Matthew Vernon wrote:
Also, if I'm using RGWs, will they do the right thing location-wise?
i.e. DC A RGWs will talk to DC A OSDs wherever possible?
Stretch clusters are entirely a feature of the RADOS lay
Hi,
Stretch clusters[0] are new in Pacific; does anyone have experience of
using one in production?
I ask because I'm thinking about new RGW cluster (split across two main
DCs), which I would naturally be doing using RGW multi-site between two
clusters.
But it strikes me that a stretch clu
On 10/09/2021 15:20, Edward R Huyer wrote:
Question 2: If db_slots still *doesn't* work, is there a coherent
way to divide up a solid state DB drive for use by a bunch of OSDs
when the OSDs may not all be created in one go? At first I thought
it was related to limit, but re-reading the advance
Hi,
On 06/09/2021 08:37, Lokendra Rathour wrote:
Thanks, Mathew for the Update.
The upgrade failed for some random weird reasons. Checking further,
Ceph's status shows that "Ceph health is OK", and at times it gives certain
warnings, but I think that is ok.
OK...
but what if we see the Versio
On 02/09/2021 09:34, Lokendra Rathour wrote:
We have deployed the Ceph Octopus release using Ceph-Ansible.
During the upgrade from the Octopus to the Pacific release we saw the upgrade
fail.
I'm afraid you'll need to provide some more details (e.g. ceph -s
output) on the state of your cluster;
Hi,
On 27/08/2021 16:16, Francois Legrand wrote:
We are running a ceph nautilus cluster under centos 7. To upgrade to
pacific we need to change to a more recent distro (probably debian or
ubuntu because of the recent announcement about centos 8, but the distro
doesn't matter very much).
How
Hi,
Are there any issues to be aware of when using RGW's newer multi-site
features with the Swift front-end? I've, perhaps unfairly, gathered the
impression that the Swift support in RGW gets less love than S3...
Thanks,
Matthew
ps: new email address, as I've moved employer
On 22/06/2021 12:58, Massimo Sgaravatto wrote:
Sorry for the very naive question:
I know how to set/check the rgw quota for a user (using radosgw-admin)
But how can a radosgw user check what quota is assigned to his/her
account, using the S3 and/or the Swift interface?
I think you ca
Hi,
On 08/06/2021 11:37, Rok Jaklič wrote:
I try to create buckets through rgw in following order:
- *bucket1* with *user1* with *access_key1* and *secret_key1*
- *bucket1* with *user2* with *access_key2* and *secret_key2*
when I try to create a second bucket1 with user2 I get *Error response
Hi,
In the discussion after the Ceph Month talks yesterday, there was a bit
of chat about cephadm / containers / packages. IIRC, Sage observed that
a common reason in the recent user survey for not using cephadm was that
it only worked on containerised deployments. I think he then went on to
Hi,
On 01/06/2021 21:29, Rok Jaklič wrote:
is it normal that radosgw-admin user info --uid=user ... takes around 3s or
more?
Seems to take about 1s on our production cluster (Octopus), which isn't
exactly speedy, but good enough...
Regards,
Matthew
On 10/04/2021 13:03, Dave Hall wrote:
Hello,
A while back I asked about the troubles I was having with Ceph-Ansible when
I kept existing OSDs in my inventory file when managing my Nautilus cluster.
At the time it was suggested that once the OSDs have been configured they
should be excluded from
Hi,
On 18/03/2021 15:03, Guillaume Abrioux wrote:
ceph-ansible@stable-6.0 supports pacific and the current content in the
branch 'master' (future stable-7.0) is intended to support Ceph Quincy.
I can't speak on behalf of Dimitri but I'm personally willing to keep
maintaining ceph-ansible if th
Hi,
On 17/03/2021 22:26, Andrew Walker-Brown wrote:
How have folks implemented getting email or snmp alerts out of Ceph?
Getting things like osd/pool nearly full or osd/daemon failures etc.
I'm afraid we used our existing Nagios infrastructure for checking
HEALTH status, and have a script that
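For completeness, there is also a built-in alerts mgr module that can send health emails; a minimal sketch, assuming the usual mgr/alerts option names:
ceph mgr module enable alerts
ceph config set mgr mgr/alerts/smtp_host smtp.example.org
ceph config set mgr mgr/alerts/smtp_sender ceph@example.org
ceph config set mgr mgr/alerts/smtp_destination ops@example.org
ceph config set mgr mgr/alerts/interval 300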
Hi,
What use is made of the ident data in the telemetry module? It's
disabled by default, and the docs don't seem to say what it's used for...
Thanks,
Matthew
Hi,
I caught up with Sage's talk on what to expect in Pacific (
https://www.youtube.com/watch?v=PVtn53MbxTc ) and there was no mention
of ceph-ansible at all.
Is it going to continue to be supported? We use it (and uncontainerised
packages) for all our clusters, so I'd be a bit alarmed if it
On 15/03/2021 11:09, Dan van der Ster wrote:
Occasionally we see a bus glitch which causes a device to disappear
then reappear with a new /dev/sd name. This crashes the osd (giving IO
errors) but after a reboot the OSD will be perfectly fine.
We're looking for a way to reactivate OSDs like this
Hi,
You can get support for running Ceph on a number of distributions - RH
support both RHEL and Ubuntu, Canonical support Ubuntu, the smaller
consultancies seem happy to support anything plausible (e.g. Debian),
this mailing list will opine regardless of what distro you're running ;-)
Regar
On 02/03/2021 16:38, Matthew Vernon wrote:
root@sto-t1-1:~# ceph health detail
HEALTH_WARN 1 pools have many more objects per pg than average; 9 pgs
not deep-scrubbed in time
[WRN] MANY_OBJECTS_PER_PG: 1 pools have many more objects per pg than
average
pool default.rgw.buckets.data
Hi,
I've upgraded our test cluster to Octopus, and enabled the auto-scaler.
It's nearly finished:
PG autoscaler decreasing pool 11 PGs from 1024 to 32 (4d)
[==..] (remaining: 3h)
But I notice it looks to be making pool 11 smaller when HEALTH_WARN
thinks it s
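The autoscaler's reasoning can be inspected with:
ceph osd pool autoscale-status    # RATE, TARGET RATIO, PG_NUM and NEW PG_NUM per pool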
Hi,
Having been slightly caught out by tunables on my Octopus upgrade[0],
can I just check that if I do
ceph osd crush tunables optimal
That will update the tunables on the cluster to the current "optimal"
values (and move a lot of data around), but that this doesn't mean
they'll change next
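Checking the tunables profile makes the effect visible:
ceph osd crush show-tunables   # run before and after "ceph osd crush tunables optimal" to see what changed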
Hi,
On 16/02/2021 08:06, Dan van der Ster wrote:
Which version are you upgrading from? If recent nautilus, you may have
already completed this conversion.
Mimic (well, really Luminous with a pit-stop at Mimic).
When we did this fsck (not with octopus, but to a nautilus point
release that ha
Hi,
Looking at the Octopus upgrade instructions, I see "the first time each
OSD starts, it will do a format conversion to improve the accounting for
“omap” data. This may take a few minutes to as much as a few hours (for
an HDD with lots of omap data)." and that I can disable this by setting
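If memory serves (an assumption, since the message is cut off here), the setting being referred to is bluestore_fsck_quick_fix_on_mount:
ceph config set osd bluestore_fsck_quick_fix_on_mount false   # defer the omap format conversion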
On 14/02/2021 21:31, Graham Allan wrote:
On Tue, Feb 9, 2021 at 11:00 AM Matthew Vernon <m...@sanger.ac.uk> wrote:
On 07/02/2021 22:19, Marc wrote:
>
> I was wondering if someone could post a config for haproxy. Is
there something specific to configur
On 12/02/2021 15:47, Freddy Andersen wrote:
I would say everyone recommends at least 3 monitors and since they
need to be 1, 3, 5 or 7, I always read that as 5 is the best number (if
you have 5 servers in your cluster).
We have 3 on all our clusters, and at the risk of tempting fate, haven't
had a
On 07/02/2021 22:19, Marc wrote:
I was wondering if someone could post a config for haproxy. Is there something
specific to configure? Like binding clients to a specific backend server,
client timeouts, security specific to rgw etc.
Ours is templated out by ceph-ansible; to try and condense
Hi,
On 04/02/2021 07:41, Loïc Dachary wrote:
On 04/02/2021 05:51, Federico Lucifredi wrote:
Hi Loïc,
I am intrigued, but am missing something: why not using RGW, and store the
source code files as objects? RGW has native compression and can take care of
that behind the scenes.
Excellent
On 31/12/2020 09:10, Rainer Krienke wrote:
Yesterday my ceph nautilus 14.2.15 cluster had a disk with unreadable
sectors, after several tries the OSD was marked down and rebalancing
started and has also finished successfully. ceph osd stat shows the osd
now as "autoout,exists".
Usually the step
Hi,
On 15/12/2020 20:44, Suresh Rama wrote:
TL;DR: use a real NTP client, not systemd-timesyncd
1) We audited the network (inspecting TOR, iperf, MTR) and nothing was
indicating any issue but OSD logs were keep complaining about
BADAUTHORIZER
...this is quite possibly due to clock skew on yo
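A minimal chrony swap-in on Debian/Ubuntu hosts looks like:
apt install chrony
systemctl disable --now systemd-timesyncd
chronyc tracking          # confirm the clock is actually being disciplined
ceph time-sync-status     # mon-side view of clock skew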
On 26/10/2020 14:13, Ing. Luis Felipe Domínguez Vega wrote:
How can i free the store of ceph monitor?:
root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle# du -h -d1
542G ./store.db
542G .
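Mon stores usually only grow like this while the cluster is unhealthy (old osdmaps are retained); once it is healthy again the store can be compacted:
ceph tell mon.fond-beagle compact
# or compact on every restart:
ceph config set mon mon_compact_on_start true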
Hi,
We're considering the merits of enabling CephFS for our main Ceph
cluster (which provides object storage for OpenStack), and one of the
obvious questions is what sort of hardware we would need for the MDSs
(and how many!).
These would be for our users scientific workloads, so they would
Hi,
On 25/09/2020 20:39, Dylan Griff wrote:
We have 10Gb network to our two RGW nodes behind a single ip on
haproxy, and some iperf testing shows I can push that much; latencies
look okay. However, when using a small cosbench cluster I am unable to
get more than ~250Mb of read speed total.
A
On 19/08/2020 14:01, Casey Bodley wrote:
Yes, this was implemented by Mark Kogan in
https://github.com/ceph/ceph/pull/33083 . It looks like it was
backported to Octopus for 15.2.5 in https://tracker.ceph.com/issues/45951. Is
there interest in a nautilus
backport too?
I don't think we'd be ab
Hi,
Our production cluster runs Luminous.
Yesterday, one of our OSD-only hosts came up with its clock about 8
hours wrong(!) having been out of the cluster for a week or so.
Initially, ceph seemed entirely happy, and then after an hour or so it
all went South (OSDs start logging about bad aut
Hi,
On 03/07/2020 19:44, Oliver Freyermuth wrote:
Am 03.07.20 um 20:29 schrieb Dimitri Savineau:
You can try to use ceph-ansible which supports baremetal and
containerized deployment.
https://github.com/ceph/ceph-ansible
Thanks for the pointer! I know about ceph-ansible. The problem is
that
On 14/06/2020 17:07, Khodayar Doustar wrote:
Now I want to add the other two nodes as monitor and rgw.
Can I just modify the ansible host file and re-run the site.yml?
Yes.
I've done some modification in Storage classes, I've added some OSD and
uploaded a lot of data up to now. Is it safe t
Hi,
For previous Ceph version upgrades, we've used the rolling_upgrade
playbook from Ceph-ansible - for example, the stable-3.0 branch supports
both Jewel and Luminous, so we used it to migrate our clusters from
Jewel to Luminous.
As I understand it, upgrading direct from Luminous to Nautilu
Hi,
On 29/01/2020 16:40, Paul Browne wrote:
> Recently we deployed a brand new Stein cluster however, and I'm curious
> whether the idea of pointing the new OpenStack cluster at the same RBD
> pools for Cinder/Glance/Nova as the Luminous cluster would be considered
> bad practice, or even potenti
Hi,
On 27/11/2019 18:28, Mike Perez wrote:
To better understand how our current users utilize Ceph, we conducted a
public community survey. This information is a guide to the community of
how we spend our contribution efforts for future development. The survey
results will remain anonymous an
On 11/09/2019 12:23, Jan Fajerski wrote:
On Wed, Sep 11, 2019 at 11:17:47AM +0100, Matthew Vernon wrote:
We keep finding part-made OSDs (they appear not attached to any host,
and down and out; but still counting towards the number of OSDs); we
never saw this with ceph-disk. On investigation