I think this is at least partially a code bug in the rgw module. Where it's
actually failing in the traceback is generating the return message for the
user at the end, because it assumes `created_zones` will always be a list
of strings, which doesn't seem to be the case in any error scenario. That
c
I don't remember how connected the dashboard is to the orchestrator in
pacific, but the only thing I could think to do here is just restart it.
(ceph mgr module disable dashboard, ceph mgr module enable dashboard). You
could also totally fail over the mgr (ceph mgr fail) although that might
change
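For reference, a minimal sketch of the two options mentioned above (module
restart vs. a full mgr failover):

  # restart just the dashboard module
  ceph mgr module disable dashboard
  ceph mgr module enable dashboard
  # or fail over to a standby mgr entirely
  ceph mgr fail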
Interesting. Given this is coming from a radosgw-admin call being done from
within the rgw mgr module, I wonder if a radosgw-admin log file is ending
up in the active mgr container when this happens.
On Wed, Jun 26, 2024 at 9:04 AM Daniel Gryniewicz wrote:
> On 6/25/24 3:21 PM, Matthew Vernon w
Weinstein wrote:
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/66756#note-1
>
> Release Notes - TBD
> LRC upgrade - TBD
>
> (Reruns were not done yet.)
>
> Seeking approvals/reviews for:
>
> smoke
> rados - Radek, Laura
>
als:
> smoke - n/a?
> orch - Adam King
> krbd - Ilya
> quincy-x, reef-x - Laura, Neha
> perf-basic - n/a
> crimson-rados - n/a
> ceph-volume - Guillaume
>
> Neha, Laura - I assume we don't plan gibba/LRC upgrade, pls confirm
>
> On Wed, Jul 3, 2024 at 5:55 AM Ven
To pull quay.io/prometheus/node-exporter:v1.5.0 the nodes would need
access to the internet, yes. I don't fully understand the reason for
> root@node-01:~# ceph config set mgr
> mgr/cephadm/container_image_node_exporter
> quay.io/prometheus/node-exporter:v1.5.0
though. Why not tell it to point t
I wouldn't worry about the one the config option gives you right now. The
one on your local repo looks like the same version. For isolated
deployments like this, the default options aren't going to work, as they'll
always point to images that require internet access to pull. I'd just
update the con
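As a sketch of what updating the config could look like for an isolated
deployment (the registry host/port is a made-up example; substitute your
local mirror, and do the same for the other monitoring images such as
prometheus, grafana and alertmanager):

  ceph config set mgr mgr/cephadm/container_image_node_exporter \
      myregistry.local:5000/prometheus/node-exporter:v1.5.0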
The thing that stands out to me from that output was that the image has no
repo_digests. It's possible cephadm is expecting there to be digests and is
crashing out trying to grab them for this image. I think it's worth a try
to set mgr/cephadm/use_repo_digest to false, and then restart the mgr. FWI
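If it helps, that would look roughly like the following (a mgr failover is
one way to get the restart):

  ceph config set mgr mgr/cephadm/use_repo_digest false
  ceph mgr fail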
ceph-exporter should get deployed by default with new installations on
recent versions, but as a general principle we've avoided adding/removing
services from the cluster during an upgrade. There is perhaps a case for
this service in particular if the user also has the rest of the monitoring
stack
point of bootstrapping?
>
> I confess I don't really understand why this field is not set by the
> docker client running locally. I wonder if I can do anything on the docker
> client side to add a repo digest. I'll explore that a bit.
>
> Thanks,
> Alex
>
> --
If you're using VMs,
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/6X6QIEMWDYSA6XOKEYH5OJ4TIQSBD5BL/
might be relevant
On Tue, Aug 6, 2024 at 3:21 AM Nicola Mori wrote:
> I think I found the problem. Setting the cephadm log level to debug and
> then watching the logs during th
It might be worth trying to manually upgrade one of the mgr daemons. Go to
the host with a mgr and edit
/var/lib/ceph/<fsid>/<mgr-daemon>/unit.run so that the image specified
in the long podman/docker run command in there is the 17.2.7 image, then
just restart its systemd unit (don't tell the orchestrator
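Roughly, with hypothetical fsid and daemon-name placeholders (the real
values come from `cephadm ls` on that host):

  # change the image at the end of the long podman/docker run line
  vi /var/lib/ceph/<fsid>/mgr.<host>.<id>/unit.run
  # then restart that daemon's systemd unit directly
  systemctl restart ceph-<fsid>@mgr.<host>.<id>.service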
> rados - Radek, Laura (https://github.com/ceph/ceph/pull/59020 is being
> tested and will be cherry-picked when ready)
>
> rgw - Eric, Adam E
> fs - Venky
> orch - Adam King
> rbd, krbd - Ilya
>
> quincy-x, reef-x - Laura, Neha
>
> powercycle - Brad
> crimson-rad
Looking through the code it doesn't seem like this will work currently. I
found that the --data-dir arg to the cephadm binary was from the initial
implementation of the cephadm binary (so early that it was actually called
"ceph-daemon" at the time rather than "cephadm") but it doesn't look like
tha
but if
> it's a fix that's appropriate for someone who doesn't know the Ceph
> codebase (me) I'd be happy to have a look at implementing a fix.
>
> Best Wishes,
> Alex
>
> --
> *From:* Adam King
> *Sent:* Monday, August 12, 2
I don't think pacific has the upgrade error handling work so it's a bit
tougher to debug here. I think it should have printed a traceback into the
logs though. Maybe right after it crashes if you check `ceph log last 200
cephadm` there might be something. If not, you might need to do a `ceph mgr
fa
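For reference, a short sketch of that debugging path (raising the cephadm
log level first is an assumption about what's available on pacific):

  ceph config set mgr mgr/cephadm/log_to_cluster_level debug
  ceph log last 200 cephadm
  ceph mgr fail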
If you're referring to https://tracker.ceph.com/issues/57675, it got into
16.2.14, although there was another issue where running a `ceph orch
restart mgr` or `ceph orch redeploy mgr` would cause an endless loop of the
mgr daemons restarting, which would block all operations, that might be
what we
" which doesn't look very serious
anyway, I don't think there's any reason for the failure to hold up the
release
On Thu, Aug 15, 2024 at 6:53 PM Laura Flores wrote:
> The upgrade suites look mostly good to me, except for one tracker I think
> would be in @Adam King &
- [travisn] Arm64 OSDs crashing on v18.2.4, need a fix in v18.2.5
- https://tracker.ceph.com/issues/67213
- tcmalloc issue, solved by rebuilding the gperftools package
- Travis to reach out to Rongqi Sun about the issue
- moving away from tcmalloc would probably cause perform
rade - TBD
>
> It was decided and agreed upon that there would be limited testing for
> this release, given it is based on 19.1.1 rather than a full rebase.
>
> Seeking approvals/reviews for:
> (some reruns are still in progress)
>
> rgw - Eric, Adam E
> fs - Venky
> o
Wanted to respond to the original thread I saw archived on this topic but I
wasn't subscribed to the mailing list yet so don't have the thread in my
inbox to reply to. Hopefully, those involved in that thread still see this.
This issue looks the same as https://tracker.ceph.com/issues/51027 which
Does running "ceph mgr fail" then waiting a bit make the "ceph orch"
commands responsive? That's worked for me sometimes before when they
wouldn't respond.
On Thu, Sep 16, 2021 at 8:08 AM Javier Cacheiro
wrote:
> Hi,
>
> I have configured a ceph cluster with the new Pacific version (16.2.4)
> us
It looks like the output from a ceph-volume command was too long to handle.
If you run "cephadm ceph-volume -- inventory --format=json" (add
"--with-lsm" if you've turned on device_enhanced_scan) manually on each
host do any of them fail in a similar fashion?
On Fri, Sep 24, 2021 at 1:37 PM Marco
>> run_until_complete
>> return future.result()
>> File "/usr/sbin/cephadm", line 1433, in run_with_timeout
>> stdout, stderr = await asyncio.gather(tee(process.stdout),
>> File "/usr/sbin/cephadm", line 1415, in tee
>> async for line i
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey
> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known is
ests:
> https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd
>
> Still seeing approvals.
> smoke - Laura, Radek, Prashant, Venky in progress
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey in progress
> fs - Venky
> orch - Adam King
> rb
>
> https://tracker.ceph.com/issues/63151 - Adam King do we need anything for
> this?
>
Yes, but not an actual code change in the main ceph repo. I'm looking into
a ceph-container change to alter the ganesha version in the container as a
solution.
On Wed, Nov 8, 2023 at 11:10
> ran the tests below and asking for approvals:
>
> smoke - Laura
> rados/mgr - PASSED
> rados/dashboard - Nizamudeen
> orch - Adam King
>
> See Build 4 runs - https://tracker.ceph.com/issues/63443#note-1
>
> On Tue, Nov 14, 2023 at 12:21 AM Redouane Kachach
> wrote:
tart building.
>
> Travis, Adam King - any need to rerun any suites?
>
> On Thu, Nov 16, 2023 at 7:14 AM Guillaume Abrioux
> wrote:
> >
> > Hi Yuri,
> >
> >
> >
> > Backport PR [2] for reef has been merged.
> >
> >
>
I think I remember a bug that happened when there was a small mismatch
between the cephadm version being used for bootstrapping and the container.
In this case, the cephadm binary used for bootstrap knows about the
ceph-exporter service and the container image being used does not. The
ceph-exporter
data:
> pools: 0 pools, 0 pgs
> objects: 0 objects, 0 B
> usage: 0 B used, 0 B / 0 B avail
> pgs:
>
> root@node1-ceph:~#
>
> Regards
>
>
>
> On Wed, Nov 29, 2023 at 5:45 PM Adam King wrote:
>
>> I think I remember a bug that happened
G N/A
>N/A  No  27m ago  Has a FileSystem, Insufficient space (<10
> extents) on vgs, LVM detected
> node3-ceph /dev/xvdb ssd 100G N/A
>N/A  No  27m ago  Has a FileSystem, Insufficient space (<10
>
The first handling of nfs exports over rgw in the nfs module, including the
`ceph nfs export create rgw` command, wasn't added to the nfs module in
pacific until 16.2.7.
On Thu, Dec 7, 2023 at 1:35 PM Adiga, Anantha
wrote:
> Hi,
>
>
> root@a001s016:~# cephadm version
>
> Using recent ceph image c
- Build/package PRs- who to best review these?
- Example: https://github.com/ceph/ceph/pull/55218
- Idea: create a GitHub team specifically for these types of PRs
https://github.com/orgs/ceph/teams
- Laura will try to organize people for the group
- Pacific 16.2.15 status
If you just manually run `ceph orch daemon rm
<daemon-name>` does it get removed? I know there's
some logic in host drain that does some ok-to-stop checks that can cause
things to be delayed or stuck if it doesn't think it's safe to remove the
daemon for some reason. I wonder if it's being overly cautious here.
This does seem like a bug, actually in more than just this command. The `ceph
orch host ls` with the --label and/or --host-pattern flag just piggybacks
off of the existing filtering done for placements in service specs. I've
just taken a look and you actually can create the same behavior with the
placem
It seems the quincy backport for that feature (
https://github.com/ceph/ceph/pull/53098) was merged Oct 1st 2023. According
to the quincy part of
https://docs.ceph.com/en/latest/releases/#release-timeline it looks like
that would mean it would only be present in 17.2.7, but not 17.2.6.
On Wed, Feb
Cephadm does not have some variable that explicitly says it's an HCI
deployment. However, the HCI variable in ceph ansible I believe only
controlled the osd_memory_target attribute, which would automatically set
it to 20% or 70% respectively of the memory on the node divided by the
number of OSDs
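For comparison, the cephadm-native way to get similar behavior is the
autotune ratio; a sketch (the 0.2 value is just the HCI-style example from
this thread, the default ratio is 0.7):

  # let cephadm manage osd_memory_target per host
  ceph config set osd osd_memory_target_autotune true
  # fraction of host memory handed to OSDs
  ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2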
In regards to
>
> From the reading you gave me I have understood the following :
> 1 - Set osd_memory_target_autotune to true then set
> autotune_memory_target_ratio to 0.2
> 2 - Or do the math. For my setup I have 384 GB per node, each node has 4
> NVMe disks of 7.6 TB, 0.2 of memory is 19.5G. So ea
>
> - I still have the ceph-crash container, what should I do with it?
>
If it's the old one, I think you can remove it. Cephadm can deploy its own
crash service (`ceph orch apply crash` if it hasn't). You can check if
`crash` is listed under `ceph orch ls` and if it is there you can do `ceph
orch
There have been bugs in the past where things have gotten "stuck". Usually
I'd say check the REFRESHED column in the output of `ceph orch ps`. It
should refresh the daemons on each host roughly every 10 minutes, so if you
see some value much larger than that, things are probably actually stuck.
If
According to https://tracker.ceph.com/issues/58933, that was only
backported as far as reef. If I remember correctly, the reason for that was
the ganesha version itself we were including in our quincy containers
wasn't new enough to support the feature on that end, so backporting the
nfs/orchestrat
Okay, it seems like from what you're saying the RGW image itself isn't
special compared to the other ceph daemons, it's just that you want to use
the image on your local registry. In that case, I would still recommend
just using `ceph orch upgrade start --image <image>` with the image
from your local regi
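In other words, something along the lines of (the registry name is a
placeholder for your local one):

  ceph orch upgrade start --image myregistry.local:5000/ceph/ceph:v16.2.10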
There was a bug with this that was fixed by
https://github.com/ceph/ceph/pull/52122 (which also specifically added an
integration test for this case). It looks like it's missing a reef and
quincy backport though unfortunately. I'll try to open one for both.
On Tue, Mar 5, 2024 at 8:26 AM Eugen Blo
When you ran this, was it directly on the host, or did you run `cephadm
shell` first? The two things you tend to need to connect to the cluster
(that "RADOS timed out" error is generally what you get when connecting to
the cluster fails. A bunch of different causes all end with that error) are
a ke
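A quick way to rule out a missing keyring/conf is to run the same command
from inside `cephadm shell`, which mounts the cluster's config and keyring
when they exist on the host:

  cephadm shell -- ceph -s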
If you want to be directly setting up the OSDs using ceph-volume commands
(I'll pretty much always recommend following
https://docs.ceph.com/en/latest/cephadm/services/osd/#dedicated-wal-db over
manual ceph-volume stuff in cephadm deployments unless what you're doing
can't be done with the spec fil
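For illustration, a minimal spec of the kind that docs section describes
(hostname and device paths are made up; adjust to your hardware):

  # file: osd_spec.yaml
  service_type: osd
  service_id: osd_with_dedicated_db
  placement:
    hosts:
      - host1
  spec:
    data_devices:
      paths:
        - /dev/sdb
    db_devices:
      paths:
        - /dev/nvme0n1

  ceph orch apply -i osd_spec.yaml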
>
> Hi,
>
> On 3/21/24 14:50, Michael Worsham wrote:
> >
> > Now that Reef v18.2.2 has come out, is there a set of instructions on
> how to upgrade to the latest version via using Cephadm?
>
> Yes, there is: https://docs.ceph.com/en/reef/cephadm/upgrade/
>
Just a note on that docs section, it refe
For context, the autotune takes the value from `cephadm
gather-facts` on the host (the "memory_total_kb" field) and then subtracts
from that per daemon on the host according to
min_size_by_type = {
    'mds': 4096 * 1048576,
    'mgr': 4096 * 1048576,
    'mon':
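As a rough worked example (the numbers, the ratio, and the mon minimum are
assumptions for illustration; the exact order of operations lives in
cephadm's autotune code): on a 64 GiB host running one mgr, one mon, and 6
OSDs, with an autotune ratio of 0.7, each OSD would end up with roughly
(64 * 0.7 - 4 - 2) / 6 ≈ 6.5 GiB as its osd_memory_target.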
ording to ceph orch
> ps. Then again, they are nowhere near the values stated in min_size_by_type
> that you list.
> Obviously yes, I could disable the auto tuning, but that would leave me
> none the wiser as to why this exact host is trying to do this.
>
>
>
> On Tue, Mar
From the ceph versions output I can see
"osd": {
"ceph version 16.2.10-160.el8cp
(6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160
},
It seems like all the OSD daemons on this cluster are using that
16.2.10-160 image, and I'm guessing most of them are running, so it mu
No, you can't use the image id for the upgrade command, it has to be the
image name. So, based on what you have, it should start with
registry.redhat.io/rhceph/. As for the full name, it depends which image
you want to go with. As for trying this on an OSD first, there is `ceph
orch daemon redeploy --i
From what I can see with the most recent cephadm binary on pacific, unless
you have the CEPHADM_IMAGE env variable set, it does a `podman images
--filter label=ceph=True --filter dangling=false` (or docker) and takes the
first image in the list. It seems to be getting sorted by creation time by
def
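One way to sidestep that inference entirely is the env variable mentioned
above, for example (the image tag here is just an example):

  CEPHADM_IMAGE=quay.io/ceph/ceph:v16.2.15 cephadm shell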
https://tracker.ceph.com/issues/64428 should be it. Backports are done for
quincy, reef, and squid and the patch will be present in the next release
for each of those versions. There isn't a pacific backport as, afaik, there
are no more pacific releases planned.
On Fri, Mar 29, 2024 at 6:03 PM Ale
First, I guess I would make sure that peon7 and peon12 actually could pass
the host check (you can run "cephadm check-host" on the host directly if
you have a copy of the cephadm binary there). Then I'd try a mgr failover
(ceph mgr fail) to clear out any in memory host values cephadm might have
and
1 running (3w)
> 7m ago  11M  2698M  4096M  17.2.6
> osd.9  my-ceph01  running (3w)
> 7m ago  11M  3364M  4096M  17.2.6
> prometheus.my-ceph01 my-ceph01 *:9095 running (3w) 7m
> ago  13M  164M  -
t; "memory_total_kb": 32827840,
>
> On Thu, Apr 4, 2024 at 10:14 PM Adam King wrote:
>
>> Sorry to keep asking for more info, but can I also get what `cephadm
>> gather-facts` on that host returns for "memory_total_kb". Might end up
>> creating a
>> Hi Adam
>>
>> Let me just finish tucking in a devlish tyke here and i’ll get to it
>> first thing
>>
>> tirs. 9. apr. 2024 kl. 18.09 skrev Adam King :
>>
>>> I did end up writing a unit test to see what we calculated here, as well
>>>
es, still trying, Laura PTL
>
> rados - Radek, Laura approved? Travis? Nizamudeen?
>
> rgw - Casey approved?
> fs - Venky approved?
> orch - Adam King approved?
>
> krbd - Ilya approved
> powercycle - seems fs related, Venky, Brad PTL
>
> ceph-volume - will
ph/ceph/pull/56714>
On Tue, Apr 16, 2024 at 1:39 PM Laura Flores wrote:
> On behalf of @Radoslaw Zarzynski , rados approved.
>
> Below is the summary of the rados suite failures, divided by component. @Adam
> King @Venky Shankar PTAL at the
> orch and cephfs failures to se
FWIW, cephadm uses `quay.io/ceph/ceph-grafana:9.4.7` as the default grafana
image in the quincy branch
On Tue, Apr 23, 2024 at 11:59 AM Osama Elswah
wrote:
> Hi,
>
>
> in quay.io I can find a lot of grafana versions for ceph (
> https://quay.io/repository/ceph/grafana?tab=tags) how can I find ou
>
> - Although I can mount the export I can't write on it
>
> What error are you getting trying to do the write? The way you set things
up doesn't look too different from one of our integration tests for ingress
over nfs (
https://github.com/ceph/ceph/blob/main/qa/suites/orch/cephadm/smoke-roleless/
- DigitalOcean credits
- things to ask
- what would promotional material require
- how much are credits worth
- Neha to ask
- 19.1.0 centos9 container status
- close to being ready
- will be building centos 8 and 9 containers simultaneously
- should test o
At least for the current up-to-date reef branch (not sure what reef version
you're on) when --image is not provided to the shell, it should try to
infer the image in this order
1. from the CEPHADM_IMAGE env. variable
2. if you pass --name with a daemon name to the shell command, it will
t
n
before running the maintenance enter command is necessary.
Regards,
- Adam King
On Wed, Jul 13, 2022 at 11:02 AM Steven Goodliff <
steven.goodl...@globalrelay.net> wrote:
>
> Hi,
>
>
> I'm trying to reboot a ceph cluster one instance at a time by running in an
> Ansible
orch approved. The test_cephadm_repos test failure is just a problem with
the test I believe, not any actual ceph code. The other selinux denial I
don't think is new.
Thanks,
- Adam King
On Sun, Jul 24, 2022 at 11:33 AM Yuri Weinstein wrote:
> Still seeking approvals for:
>
>
Do the journal logs for any of the OSDs that are marked down give any
useful info on why they're failing to start back up? If the host level ip
issues have gone away I think that would be the next place to check.
On Mon, Jul 25, 2022 at 5:03 PM Jeremy Hansen
wrote:
> I noticed this on the initia
Usually it's pretty explicit in "ceph health detail". What does it say
there?
On Mon, Jul 25, 2022 at 9:05 PM Jeremy Hansen
wrote:
> How do I track down what is the stray daemon?
>
> Thanks
> -jeremy
> ___
> ceph-users mailing list -- ceph-users@ceph.i
the unit.image file is just there for cephadm to look at as part of
gathering metadata I think. What you'd want to edit is the unit.run file
(in the same directory as the unit.image). It should have a really long
line specifying a podman/docker run command and somewhere in there will be
"CONTAINER_
mon with the new image? at least
> this is something I did in our testing here[1].
>
> ceph orch daemon redeploy mgr.
> quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531
>
> [1] https://github.com/ceph/ceph/pull/47270#issuecomment-1196062363
>
> On Wed, Jul 27
0f0bc2c6791f
>
> because of this I can't run a "ceph orch upgrade" because it always
> complains about having only one.
> Is there something else that needs to be changed to get the cluster to a
> normal state?
>
> Thanks!
>
> On Wed, 2022-07-27 at 12:23 -04
looks like it's not. Even
> on the dashboard the daemon
> shows as errored but it's running (confirmed via podman and systemctl).
> My take is that something is not communicating some information with
> "cephadm" but I don't know
> what. ceph itself knows the m
Cephadm has a config option to say whether to use the repo digest or the
tag name. If you want it to use tags "ceph config set mgr
mgr/cephadm/use_repo_digest false" should make that happen (it defaults to
true/using the digest). Beyond that, it's possible you may need to change
the config option f
It's possible there's a bug in cephadm around placements where hosts have
the _no_schedule label. There was
https://tracker.ceph.com/issues/56972 recently
for an issue with how _no_schedule interacts with placements using explicit
hostnames. It might be something similar here where it thinks there'
keyring and ceph.conf files.
>
> Should I open a bug somewhere?
>
> On Tue, 2022-08-02 at 08:39 -0400, Adam King wrote:
> > It's possible there's a bug in cephadm around placements where hosts
> have the _no_schedule label.
> > There was https://tracker.ceph.c
If you try shuffling some daemon around on some of the working hosts (e.g.
changing the placement of the node-exporter spec so that one of the working
hosts is excluded and the node-exporter there should be removed) is
cephadm able to actually complete that? Also, does device info for any or
all of
ize: 3 and running:
> 3)
>
> I'm running ceph -W cephadm with log_to_cluster_level set to debug, but
> except for the walls of text with the inventories, nothing (except
> _kick_service_loop) shows up in the logs after the INF level messages that
> host has been added or service sp
FWIW, cephadm only writes that file out if it doesn't exist entirely. You
might be able to just remove anything actually functional from it and just
leave a sort of dummy file with only a comment there as a workaround. Also,
was this an upgraded cluster? I tried quickly bootstrapping a
cephadm clus
You were correct about the difference between the distros. Was able to
reproduce fine on ubuntu 20.04 (was using centos 8.stream before). I
opened a tracker as well https://tracker.ceph.com/issues/57293
On Thu, Aug 25, 2022 at 7:44 AM Robert Sander
wrote:
> Am 25.08.22 um 13:41 schrieb A
Are there any extra directories in /var/lib/ceph or /var/lib/ceph/<fsid>
that appear to be for those OSDs on that host? When cephadm builds the info
it uses for "ceph orch ps" it's actually scraping those directories. The
output of "cephadm ls" on the host with the duplicates could also
potentially have
Does "ceph orch upgrade status" give any insights (e.g. an error message of
some kind)? If not, maybe you could try looking at
https://tracker.ceph.com/issues/56485#note-2 because it seems like a
similar issue and I see you're using --ceph-version (which we need to fix,
sorry about that).
On Wed,
"container_image_name": "quay.io/ceph/ceph:v15",
> "container_image_id": null,
> "version": null,
> "started": null,
> "created": "2022-08-19T03:36:22.815608Z",
> "
> ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@mgr.ceph1.xmbvsb.service" and
> "journalctl -xe" for details.
> Traceback (most recent call last):
> File "/usr/sbin/cephadm", line 6250, in
> r = args.func()
> File "/usr/sbin/cephadm", line 1357, i
names, how does
> cephadm decide naming?
>
>
> https://achchusnulchikam.medium.com/deploy-ceph-cluster-with-cephadm-on-centos-8-257b300e7b42
>
> On Thu, Sep 1, 2022 at 6:20 PM Satish Patel wrote:
>
>> Hi Adam,
>>
>> Getting the following error, not sure why i
you see i have ceph1 two time. :(
>
> 10.73.0.191 ceph1.example.com ceph1
> 10.73.0.192 ceph2.example.com ceph1
>
> On Thu, Sep 1, 2022 at 8:06 PM Adam King wrote:
>
>> the naming for daemons is a bit different for each daemon type, but for
>> mgr daemons it's alwa
this looks like an old traceback you would get if you ended up with a
service type that shouldn't be there somehow. The things I'd probably check
are that "cephadm ls" on either host definitely doesn't report and strange
things that aren't actually daemons in your cluster such as
"cephadm.". Anothe
emove didn't work
>
> root@ceph1:~# ceph orch rm cephadm
> Failed to remove service. was not found.
>
> root@ceph1:~# ceph orch rm
> cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
> Failed to remove service.
>
> was not found.
>
> O
8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d
>
> But still getting the same error, do i need to do anything else?
>
> On Fri, Sep 2, 2022 at 9:51 AM Adam King wrote:
>
>> Okay, I'm wondering if this is an issue with version mismatch.
43f 3c963693ff2b
>>>>> grafana.ceph1
>>>>> ceph1 running (9h) 64s ago  2w  6.7.4
>>>>> quay.io/ceph/ceph-grafana:6.7.4
>>>>> 557c83e11646 7583a8dc4c61
>>>>> mgr.ceph1.smfvfd
>>>
13 months ago
> 486MB
> quay.io/prometheus/prometheus v2.18.1 de242295e225 2 years ago
> 140MB
> quay.io/prometheus/alertmanager v0.20.0 0881eb8f169f 2 years ago
> 52.1MB
> quay.io/prometheus/node-exporter v0.18.1 e5a616e4b9cf 3 years ago
> 22.9MB
>
&
mgr.ceph2.huidoh (mgr.344392) 211206 :
>> cephadm [DBG] 0 OSDs are scheduled for removal: []
>> 2022-09-02T18:38:21.762480+ mgr.ceph2.huidoh (mgr.344392) 211207 :
>> cephadm [DBG] Saving [] to store
>>
>> On Fri, Sep 2, 2022 at 12:17 PM Adam King wrote:
>>
>
orch suite failures fall under
https://tracker.ceph.com/issues/49287
https://tracker.ceph.com/issues/57290
https://tracker.ceph.com/issues/57268
https://tracker.ceph.com/issues/52321
For rados/cephadm the failures are both
https://tracker.ceph.com/issues/57290
Overall, nothing new/unexpected. orc
Budget Discussion
- Going to investigate current resources being used, see if any costs
can be cut
- What can be moved from virtual environments to internal ones?
- Need to take inventory of what resources we currently have and what
their costs are
17.2.4
- Gibba and LRC cluste
For the weird image, perhaps just "ceph orch daemon redeploy
rgw.testrgw.svtcephrgwv1.invwmo --image quay.io/ceph/ceph:v16.2.10" will
resolve it. Not sure about the other things wrong with it yet but I think
the image should be fixed before looking into that.
On Fri, Oct 14, 2022 at 5:47 AM Jean-M
Do the journal logs for the OSDs say anything about why they couldn't start
up? ("cephadm ls --no-detail" run on the host will give the systemd units
for each daemon on the host so you can get them easier).
On Mon, Oct 17, 2022 at 1:37 PM Brent Kennedy wrote:
> Below is what the ceph mgr log is
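For reference, a sketch of pulling those journal logs (the unit name is
illustrative; take the real one from the `cephadm ls --no-detail` output):

  cephadm ls --no-detail
  journalctl -u ceph-<fsid>@osd.<id>.service -n 200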
If you're using a fairly recent cephadm version, there is the ability to
provide miscellaneous container arguments in the service spec
https://docs.ceph.com/en/quincy/cephadm/services/#extra-container-arguments.
This means you can have cephadm deploy each container in that service with,
for example
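For example, a spec along these lines (the service and the argument are
arbitrary examples, not a recommendation):

  # file: node-exporter.yaml
  service_type: node-exporter
  placement:
    host_pattern: '*'
  extra_container_args:
    - "--cpus=2"

  ceph orch apply -i node-exporter.yaml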
We had actually considered adding an `extra_daemon_args` to be the
equivalent to `extra_container_args` but for the daemon itself rather than
a flag for the podman/docker run command. IIRC we thought it was a good
idea but nobody actually pushed to add it in then since (at the time) we
weren't awar
I don't know for sure if it will fix the issue, but the migrations happen
based on a config option "mgr/cephadm/migration_current". You could try
setting that back to 0 and it would at least trigger the migrations to
happen again after restarting/failing over the mgr. They're meant to be
idempotent
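Concretely, that would be something like:

  ceph config set mgr mgr/cephadm/migration_current 0
  ceph mgr fail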
d been trying to
do before.
On Sat, Nov 19, 2022 at 8:05 AM Adam King wrote:
> I don't know for sure if it will fix the issue, but the migrations happen
> based on a config option "mgr/cephadm/migration_current". You could try
> setting that back to 0 and it would at leas
I typically don't see this when I do OSD replacement. If you do a mgr
failover ("ceph mgr fail") and wait a few minutes does this still show up?
The stray daemon/host warning is roughly equivalent to comparing the
daemons in `ceph node ls` and `ceph orch ps` and seeing if there's anything
in the fo
This can't be done in a very nice way currently. There's actually an open
PR against main to allow setting the crush location for mons in the service
spec specifically because others found that this was annoying as well. What
I think should work as a workaround is, go to the host where the mon that