The daemons cephadm "knows" about are actually just based on the contents of
the /var/lib/ceph/<fsid>/ directory on each given host cephadm is managing.
If osd.6 was present, got removed by the host drain process, and then its
daemon directory was still on the host / there was still a container
running fo
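As a rough sketch of how to compare the two views (the fsid and the osd.6 name
are just placeholders from this example, and the cleanup command should be
double-checked before running it):

FSID=$(ceph fsid)                     # or read it from /etc/ceph/ceph.conf
ls /var/lib/ceph/"$FSID"/             # daemon dirs here are what cephadm reports for this host
cephadm ls --no-detail                # what the cephadm binary itself detects on the host
podman ps --filter name=ceph          # any leftover containers, e.g. one still running for osd.6
# a stale daemon dir/container can usually be cleaned up with something like:
# cephadm rm-daemon --name osd.6 --fsid "$FSID" --force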
I can't say I know why this is happening, but I can try to give some
context into what cephadm is doing here in case it helps give something to
look at. This is when cephadm creates the initial monmap. When we do so we
write a python "NamedTemporaryFile" and then mount that into a container
that co
ote-1
> >
> > Release Notes - TBD
> > LRC upgrade - TBD
> >
> > Seeking approvals/reviews for:
> >
> > rados - Radek, Laura
> > rgw- Adam Emerson
> > fs - Venky
> > orch - Adam King
> > rbd, krbd - Ilya
> > quincy-x, reef-x - Laura,
>
> > However, in practice,
> > many operations (e.g., using ceph-bluestore-tool
>
> Using that tool, to be fair, should be rare. Notably that tool requires
> that the OSD on which it operates not be running. I would think it might
> be possible to enter an OSD container and kill the ceph-osd pro
; as some people reported that it will help solve NFS HA issue ( e.g.
> haproxy.cfg deployed missing "check")
>
> Now neither NFS nor RGW works :-(
>
> How do I fix this ?
>
> thanks
> Steven
>
>
> https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/te
That flag got added to cephadm's haproxy template as part of
https://github.com/ceph/ceph/pull/61833. I'm very confused as to how you're
seeing it affect reef though, as we never backported it. It doesn't seem to
exist at all in the reef branch when I checked
adking@fedora:~/orch-ceph/ceph/src$ gi
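For reference, a hedged way to check whether a change like that exists in a
release branch (the template path is an assumption based on the truncated URL
above):

git fetch origin reef
# look at the history of the ingress/haproxy template on the reef branch
git log --oneline origin/reef -- src/pybind/mgr/cephadm/templates/services/ingress/
git grep -n 'check' origin/reef -- 'src/pybind/mgr/cephadm/templates/services/ingress/*'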
You could try setting `ceph config set mgr mgr/cephadm/use_repo_digest
false` and give it another go. I've seen some issues in the past with using
the image digest with local repos.
On Thu, Apr 24, 2025 at 10:15 AM Sake Ceph wrote:
> We're using a local registry (docker registry), but needed to s
[Matt] required github checks (and make check) instability (and long run
times) greatly hurt developer productivity
- thanks for renewed attention to this from several folks
- can we break up make check into component wise portions
- can we take ceph api test out of the CI checks
- it also r
ues/70938#note-1
> Release Notes - TBD
> LRC upgrade - N/A
>
> Seeking approvals/reviews for:
>
> smoke - same as in 18.2.5
> rados - Radek, Laura approved?
> orch - Adam King, Guillaume approved?
>
> This release has two PRs:
> https://github.com/ceph/ceph/pull/62791
.ceph.com/issues/70563#note-1
> Release Notes - TBD
> LRC upgrade - TBD
>
> Seeking approvals/reviews for:
>
> smoke - Laura approved?
>
> rados - Radek, Laura approved? Travis? Nizamudeen? Adam King approved?
>
> rgw - Adam E approved?
>
> fs - Venky is fixing QA su
. So this doesn't really
> work as a workaround, it seems. I feel like the proper solution would
> be to include keepalive in the list of
> RESCHEDULE_FROM_OFFLINE_HOSTS_TYPES.
>
> Zitat von Adam King :
>
> > Which daemons get moved around like that is controll
Which daemons get moved around like that is controlled by
https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/utils.py#L30,
which appears to only include nfs and haproxy, so maybe this keepalive only
case was missed in that sense. I do think that you could alter the
placement of the ingre
Regarding the
ValueError: "'xwork.MethodAccessor.denyMethodExecution'" does not appear to
be an IPv4 or IPv6 address
can you check `ceph config-key get mgr/cephadm/inventory` and see if you
see something related to that (such as
"'xwork.MethodAccessor.denyMethodExecution'" being present as the ad
iSCSI is still being used in the LRC (long running cluster) that is a
storage backend for parts of the ceph team's infrastructure, so I don't
think it's going to disappear in the near future. I believe the plan is to
eventually swap over to nvmeof instead (
https://docs.ceph.com/en/reef/rbd/nvmeof-
It looks like the "resource not found" message is being directly output by
podman. Is there anything in the cephadm.log (/var/log/ceph/cephadm.log) on
one of the hosts where this is happening that says what podman command
cephadm was running that hit this error?
On Wed, Jan 8, 2025 at 5:27 AM tobi
ckers for failures so we avoid duplicates.
> Seeking approvals/reviews for:
>
> rados - Radek, Laura
> rgw - Eric, Adam E
> fs - Venky
> orch - Adam King
> rbd, krbd - Ilya
>
> quincy-x, reef-x - Laura, Neha
>
> crimson-rados - Matan, Samuel
>
> ceph-volume
Given the reference to that cherrypy backports stuff in the traceback, I'll
just mention we are in the process of removing that from the code as we've
seen issues with it in our testing as well (
https://github.com/ceph/ceph/pull/60602 /
https://tracker.ceph.com/issues/68802). We want that patch in
Quick comment on the CLI args vs. the spec file. It actually shouldn't
allow you to do both for any flags that actually affect the service. If you
run `ceph orch apply -i <spec-file>` it will only make use of the spec file
and should return an error if flags that affect the service like
`--unmanaged` or `--p
I see you mentioned apparmor and MongoDB, so I guess there's a chance you
found https://tracker.ceph.com/issues/66389 already (your traceback also
looks the same). Other than making sure the relevant apparmor file it's
parsing doesn't contain settings with spaces or trying to manually apply
the fi
Just noticed this thread. A couple questions. Is what we want to have MDS
daemons in say zone A and zone B, but the ones in zone A are prioritized to
be active and ones in zone B remain as standby unless absolutely necessary
(all the ones in zone A are down) or is it that we want to have some subse
Where did the copy of cephadm you're using for the bootstrap come from? I'm
aware of a bug around that flag (https://tracker.ceph.com/issues/54137) but
that fix should have come in some time ago. I've seen some people,
especially if they're using the distros version of the cephadm package, end
up w
d nfs, it should now be safe to perform this
upgrade.
On Fri, Sep 27, 2024 at 11:40 AM Adam King wrote:
> WARNING, if you're using cephadm and nfs please don't upgrade to this
> release for the time being. There are compatibility issues with cephadm's
> deployment of
WARNING, if you're using cephadm and nfs please don't upgrade to this
release for the time being. There are compatibility issues with cephadm's
deployment of the NFS daemon and ganesha v6 which made its way into the
release container.
On Thu, Sep 26, 2024 at 6:20 PM Laura Flores wrote:
> We're v
Cephadm stores the key internally within the cluster, and it can be grabbed
with `ceph config-key get mgr/cephadm/ssh_identity_key`. If you
already have keys set up, I'd recommend passing filepaths to those keys to
the `--ssh-private-key` and `--ssh-public-key` flags the bootstrap command
has
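A minimal sketch of both options (the config-key name for the public half and
the file paths are assumptions):

# grab the key cephadm generated and stored in the cluster
ceph config-key get mgr/cephadm/ssh_identity_key > cephadm_id
ceph config-key get mgr/cephadm/ssh_identity_pub > cephadm_id.pub   # assumed key name for the public half
chmod 600 cephadm_id
# or, at bootstrap time, hand cephadm keys you already manage
cephadm bootstrap --mon-ip 192.168.1.10 \
    --ssh-private-key /root/.ssh/id_rsa \
    --ssh-public-key /root/.ssh/id_rsa.pub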
rade - TBD
>
> It was decided and agreed upon that there would be limited testing for
> this release, given it is based on 19.1.1 rather than a full rebase.
>
> Seeking approvals/reviews for:
> (some reruns are still in progress)
>
> rgw - Eric, Adam E
> fs - Venky
> o
- [travisn] Arm64 OSDs crashing on v18.2.4, need a fix in v18.2.5
- https://tracker.ceph.com/issues/67213
- tcmalloc issue, solved by rebuilding the gperftools package
- Travis to reach out to Rongqi Sun about the issue
- moving away from tcmalloc would probably cause perform
" which doesn't look very serious
anyway, I don't think there's any reason for the failure to hold up the
release
On Thu, Aug 15, 2024 at 6:53 PM Laura Flores wrote:
> The upgrade suites look mostly good to me, except for one tracker I think
> would be in @Adam King &
If you're referring to https://tracker.ceph.com/issues/57675, it got into
16.2.14, although there was another issue where running a `ceph orch
restart mgr` or `ceph orch redeploy mgr` would cause an endless loop of the
mgr daemons restarting, which would block all operations, that might be
what we
I don't think pacific has the upgrade error handling work so it's a bit
tougher to debug here. I think it should have printed a traceback into the
logs though. Maybe right after it crashes if you check `ceph log last 200
cephadm` there might be something. If not, you might need to do a `ceph mgr
fa
but if
> it's a fix that's appropriate for someone who doesn't know the Ceph
> codebase (me) I'd be happy to have a look at implementing a fix.
>
> Best Wishes,
> Alex
>
> --
> *From:* Adam King
> *Sent:* Monday, August 12, 2
Looking through the code it doesn't seem like this will work currently. I
found that the --data-dir arg to the cephadm binary was from the initial
implementation of the cephadm binary (so early that it was actually called
"ceph-daemon" at the time rather than "cephadm") but it doesn't look like
tha
> rados - Radek, Laura (https://github.com/ceph/ceph/pull/59020 is being
> tested and will be cherry-picked when ready)
>
> rgw - Eric, Adam E
> fs - Venky
> orch - Adam King
> rbd, krbd - Ilya
>
> quincy-x, reef-x - Laura, Neha
>
> powercycle - Brad
> crimson-rad
It might be worth trying to manually upgrade one of the mgr daemons. Go to
the host with a mgr and edit
/var/lib/ceph/<fsid>/<daemon-name>/unit.run so that the image specified
in the long podman/docker run command in there is the 17.2.7 image. Then
just restart its systemd unit (don't tell the orchestrator
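Roughly like this (the fsid, daemon name, and image tag are placeholders; this
deliberately bypasses the orchestrator, so only do it for the stuck mgr):

# on the host running the mgr to be upgraded
vi /var/lib/ceph/<fsid>/mgr.<name>/unit.run        # point the podman/docker run line at the 17.2.7 image
systemctl restart ceph-<fsid>@mgr.<name>.service
# then confirm it came back on the new version
ceph orch ps --daemon-type mgr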
If you're using VMs,
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/6X6QIEMWDYSA6XOKEYH5OJ4TIQSBD5BL/
might be relevant
On Tue, Aug 6, 2024 at 3:21 AM Nicola Mori wrote:
> I think I found the problem. Setting the cephadm log level to debug and
> then watching the logs during th
point of bootstrapping?
>
> I confess I don't really understand why this field is not set by the
> docker client running locally. I wonder if I can do anything on the docker
> client side to add a repo digest. I'll explore that a bit.
>
> Thanks,
> Alex
>
> --
ceph-exporter should get deployed by default with new installations on
recent versions, but as a general principle we've avoided adding/removing
services from the cluster during an upgrade. There is perhaps a case for
this service in particular if the user also has the rest of the monitoring
stack
The thing that stands out to me from that output was that the image has no
repo_digests. It's possible cephadm is expecting there to be digests and is
crashing out trying to grab them for this image. I think it's worth a try
to set mgr/cephadm/use_repo_digest to false, and then restart the mgr. FWI
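i.e., something along these lines (failing over the mgr just makes cephadm pick
the setting up right away):

ceph config set mgr mgr/cephadm/use_repo_digest false
ceph mgr fail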
I wouldn't worry about the one the config option gives you right now. The
one on your local repo looks like the same version. For isolated
deployments like this, the default options aren't going to work, as they'll
always point to images that require internet access to pull. I'd just
update the con
To pull quay.io/prometheus/node-exporter:v1.5.0 the nodes would need
access to the internet, yes. I don't fully understand the reason for
> root@node-01:~# ceph config set mgr
> mgr/cephadm/container_image_node_exporter
> quay.io/prometheus/node-exporter:v1.5.0
though. Why not tell it to point t
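i.e., something like the following, where myregistry.local:5000 stands in for
your local registry (mirror the image into it first from a machine that does
have internet access):

podman pull quay.io/prometheus/node-exporter:v1.5.0
podman tag quay.io/prometheus/node-exporter:v1.5.0 myregistry.local:5000/prometheus/node-exporter:v1.5.0
podman push myregistry.local:5000/prometheus/node-exporter:v1.5.0
# then point cephadm at the mirrored copy
ceph config set mgr mgr/cephadm/container_image_node_exporter myregistry.local:5000/prometheus/node-exporter:v1.5.0
ceph orch redeploy node-exporter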
als:
> smoke - n/a?
> orch - Adam King
> krbd - Ilya
> quincy-x, reef-x - Laura, Neha
> perf-basic - n/a
> crimson-rados - n/a
> ceph-volume - Guillaume
>
> Neha, Laura - I assume we don't plan gibba/LRC upgrade, pls confirm
>
> On Wed, Jul 3, 2024 at 5:55 AM Ven
Weinstein wrote:
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/66756#note-1
>
> Release Notes - TBD
> LRC upgrade - TBD
>
> (Reruns were not done yet.)
>
> Seeking approvals/reviews for:
>
> smoke
> rados - Radek, Laura
>
Interesting. Given this is coming from a radosgw-admin call being done from
within the rgw mgr module, I wonder if a radosgw-admin log file is ending
up in the active mgr container when this happens.
On Wed, Jun 26, 2024 at 9:04 AM Daniel Gryniewicz wrote:
> On 6/25/24 3:21 PM, Matthew Vernon w
I don't remember how connected the dashboard is to the orchestrator in
pacific, but the only thing I could think to do here is just restart it.
(ceph mgr module disable dashboard, ceph mgr module enable dashboard). You
could also totally fail over the mgr (ceph mgr fail) although that might
change
I think this is at least partially a code bug in the rgw module. Where it's
actually failing in the traceback is generating the return message for the
user at the end, because it assumes `created_zones` will always be a list
of strings and that seems to not be the case in any error scenario. That
c
At least for the current up-to-date reef branch (not sure what reef version
you're on) when --image is not provided to the shell, it should try to
infer the image in this order
1. from the CEPHADM_IMAGE env. variable
2. if you pass --name with a daemon name to the shell command, it will
t
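As a quick illustration of those first two options (image tag and daemon name
are placeholders):

# 1. via the environment variable
CEPHADM_IMAGE=quay.io/ceph/ceph:v18.2.4 cephadm shell
# 2. via --name, which reuses the image that daemon is already running
cephadm shell --name osd.3
# or just be explicit
cephadm shell --image quay.io/ceph/ceph:v18.2.4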
- DigitalOcean credits
- things to ask
- what would promotional material require
- how much are credits worth
- Neha to ask
- 19.1.0 centos9 container status
- close to being ready
- will be building centos 8 and 9 containers simultaneously
- should test o
>
> - Although I can mount the export I can't write on it
>
> What error are you getting trying to do the write? The way you set things
up doesn't look too different from one of our integration tests for ingress
over nfs (
https://github.com/ceph/ceph/blob/main/qa/suites/orch/cephadm/smoke-roleless/
FWIW, cephadm uses `quay.io/ceph/ceph-grafana:9.4.7` as the default grafana
image in the quincy branch
On Tue, Apr 23, 2024 at 11:59 AM Osama Elswah
wrote:
> Hi,
>
>
> in quay.io I can find a lot of grafana versions for ceph (
> https://quay.io/repository/ceph/grafana?tab=tags) how can I find ou
ph/ceph/pull/56714>
On Tue, Apr 16, 2024 at 1:39 PM Laura Flores wrote:
> On behalf of @Radoslaw Zarzynski , rados approved.
>
> Below is the summary of the rados suite failures, divided by component. @Adam
> King @Venky Shankar PTAL at the
> orch and cephfs failures to se
es, still trying, Laura PTL
>
> rados - Radek, Laura approved? Travis? Nizamudeen?
>
> rgw - Casey approved?
> fs - Venky approved?
> orch - Adam King approved?
>
> krbd - Ilya approved
> powercycle - seems fs related, Venky, Brad PTL
>
> ceph-volume - will
>> Hi Adam
>>
>> Let me just finish tucking in a devilish tyke here and I'll get to it
>> first thing
>>
>> tirs. 9. apr. 2024 kl. 18.09 skrev Adam King :
>>
>>> I did end up writing a unit test to see what we calculated here, as well
>>>
t; "memory_total_kb": 32827840,
>
> On Thu, Apr 4, 2024 at 10:14 PM Adam King wrote:
>
>> Sorry to keep asking for more info, but can I also get what `cephadm
>> gather-facts` on that host returns for "memory_total_kb". Might end up
>> creating a
1                       running (3w)  7m ago  11M  2698M  4096M  17.2.6
> osd.9                 my-ceph01            running (3w)  7m ago  11M  3364M  4096M  17.2.6
> prometheus.my-ceph01  my-ceph01  *:9095    running (3w)  7m ago  13M  164M  -
First, I guess I would make sure that peon7 and peon12 actually could pass
the host check (you can run "cephadm check-host" on the host directly if
you have a copy of the cephadm binary there) Then I'd try a mgr failover
(ceph mgr fail) to clear out any in memory host values cephadm might have
and
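Roughly (run the first command on the affected hosts themselves; the rest from
a node with cluster access):

cephadm check-host            # on peon7 / peon12, requires a copy of the cephadm binary there
ceph mgr fail                 # clear any in-memory host state cephadm is holding
ceph orch host ls
ceph health detail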
https://tracker.ceph.com/issues/64428 should be it. Backports are done for
quincy, reef, and squid and the patch will be present in the next release
for each of those versions. There isn't a pacific backport as, afaik, there
are no more pacific releases planned.
On Fri, Mar 29, 2024 at 6:03 PM Ale
From what I can see with the most recent cephadm binary on pacific, unless
you have the CEPHADM_IMAGE env variable set, it does a `podman images
--filter label=ceph=True --filter dangling=false` (or docker) and takes the
first image in the list. It seems to be getting sorted by creation time by
def
No, you can't use the image ID for the upgrade command; it has to be the
image name. So it should start, based on what you have,
registry.redhat.io/rhceph/. As for the full name, it depends which image
you want to go with. As for trying this on an OSD first, there is `ceph
orch daemon redeploy --i
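For example (the image name/tag here is only illustrative; substitute whichever
rhceph image you actually intend to run):

# try the new image on a single OSD first
ceph orch daemon redeploy osd.5 --image registry.redhat.io/rhceph/rhceph-5-rhel8:latest
# once satisfied, upgrade the rest by image name (not image ID)
ceph orch upgrade start --image registry.redhat.io/rhceph/rhceph-5-rhel8:latest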
From the ceph versions output I can see
"osd": {
"ceph version 16.2.10-160.el8cp
(6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160
},
It seems like all the OSD daemons on this cluster are using that
16.2.10-160 image, and I'm guessing most of them are running, so it mu
ording to ceph orch
> ps. Then again, they are nowhere near the values stated in min_size_by_type
> that you list.
> Obviously yes, I could disable the auto tuning, but that would leave me
> none the wiser as to why this exact host is trying to do this.
>
>
>
> On Tue, Mar
For context, the value the autotune goes with takes the value from `cephadm
gather-facts` on the host (the "memory_total_kb" field) and then subtracts
from that per daemon on the host according to
min_size_by_type = {
'mds': 4096 * 1048576,
'mgr': 4096 * 1048576,
'mon':
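As a rough sketch of that arithmetic (not the exact cephadm code; the ratio,
the non-OSD minimums, and the OSD count below are assumptions for
illustration):

# memory_total_kb taken from `cephadm gather-facts` on the host
awk -v total_kb=32827840 -v ratio=0.7 -v non_osd_min_mb=$((4096 + 1024)) -v num_osds=4 'BEGIN {
    budget = total_kb * 1024 * ratio          # fraction of host memory handed to Ceph
    budget -= non_osd_min_mb * 1048576        # subtract minimums, e.g. one mgr + one mon
    printf "per-OSD osd_memory_target ~= %.0f bytes (%.1f GiB)\n", budget / num_osds, budget / num_osds / 1073741824
}'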
>
> Hi,
>
> On 3/21/24 14:50, Michael Worsham wrote:
> >
> > Now that Reef v18.2.2 has come out, is there a set of instructions on
> how to upgrade to the latest version via using Cephadm?
>
> Yes, there is: https://docs.ceph.com/en/reef/cephadm/upgrade/
>
Just a note on that docs section, it refe
If you want to be directly setting up the OSDs using ceph-volume commands
(I'll pretty much always recommend following
https://docs.ceph.com/en/latest/cephadm/services/osd/#dedicated-wal-db over
manual ceph-volume stuff in cephadm deployments unless what you're doing
can't be done with the spec fil
When you ran this, was it directly on the host, or did you run `cephadm
shell` first? The two things you tend to need to connect to the cluster
(that "RADOS timed out" error is generally what you get when connecting to
the cluster fails. A bunch of different causes all end with that error) are
a ke
There was a bug with this that was fixed by
https://github.com/ceph/ceph/pull/52122 (which also specifically added an
integration test for this case). It looks like it's missing a reef and
quincy backport though unfortunately. I'll try to open one for both.
On Tue, Mar 5, 2024 at 8:26 AM Eugen Blo
Okay, it seems like from what you're saying the RGW image itself isn't
special compared to the other ceph daemons, it's just that you want to use
the image on your local registry. In that case, I would still recommend
just using `ceph orch upgrade start --image <image>` with the image
from your local regi
According to https://tracker.ceph.com/issues/58933, that was only
backported as far as reef. If I remember correctly, the reason for that was
the ganesha version itself we were including in our quincy containers
wasn't new enough to support the feature on that end, so backporting the
nfs/orchestrat
There have been bugs in the past where things have gotten "stuck". Usually
I'd say check the REFRESHED column in the output of `ceph orch ps`. It
should refresh the daemons on each host roughly every 10 minutes, so if you
see some value much larger than that, things are probably actually stuck.
If
>
> - I still have the ceph-crash container, what should I do with it?
>
If it's the old one, I think you can remove it. Cephadm can deploy its own
crash service (`ceph orch apply crash` if it hasn't). You can check if
`crash` is listed under `ceph orch ls` and if it is there you can do `ceph
orch
In regards to
>
> From the reading you gave me I have understood the following :
> 1 - Set osd_memory_target_autotune to true then set
> autotune_memory_target_ratio to 0.2
> 2 - Or do the math. For my setup I have 384GB per node, each node has 4
> NVMe disks of 7.6TB, 0.2 of memory is 19.5G. So ea
Cephadm does not have some variable that explicitly says it's an HCI
deployment. However, the HCI variable in ceph ansible I believe only
controlled the osd_memory_target attribute, which would automatically set
it to 20% or 70% respectively of the memory on the node divided by the
number of OSDs
It seems the quincy backport for that feature (
https://github.com/ceph/ceph/pull/53098) was merged Oct 1st 2023. According
to the quincy part of
https://docs.ceph.com/en/latest/releases/#release-timeline it looks like
that would mean it would only be present in 17.2.7, but not 17.2.6.
On Wed, Feb
Does seem like a bug, actually in more than just this command. The `ceph
orch host ls` with the --label and/or --host-pattern flag just piggybacks
off of the existing filtering done for placements in service specs. I've
just taken a look and you actually can create the same behavior with the
placem
If you just manually run `ceph orch daemon rm
<daemon-name>` does it get removed? I know there's
some logic in host drain that does some ok-to-stop checks that can cause
things to be delayed or stuck if it doesn't think it's safe to remove the
daemon for some reason. I wonder if it's being overly cautious here.
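i.e., something like the following, using the daemon name from your drain
example (--force skips cephadm's safety checks, so only reach for it once
you're sure removing the daemon is actually safe):

ceph orch daemon rm osd.6
# if the ok-to-stop check is what's blocking it:
ceph orch daemon rm osd.6 --force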
- Build/package PRs- who to best review these?
- Example: https://github.com/ceph/ceph/pull/55218
- Idea: create a GitHub team specifically for these types of PRs
https://github.com/orgs/ceph/teams
- Laura will try to organize people for the group
- Pacific 16.2.15 status
The first handling of nfs exports over rgw in the nfs module, including the
`ceph nfs export create rgw` command, wasn't added to the nfs module in
pacific until 16.2.7.
On Thu, Dec 7, 2023 at 1:35 PM Adiga, Anantha
wrote:
> Hi,
>
>
> oot@a001s016:~# cephadm version
>
> Using recent ceph image c
G  N/A  N/A  No  27m ago  Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected
> node3-ceph  /dev/xvdb  ssd  100G  N/A  N/A  No  27m ago  Has a FileSystem, Insufficient space (<10
>
data:
> pools: 0 pools, 0 pgs
> objects: 0 objects, 0 B
> usage: 0 B used, 0 B / 0 B avail
> pgs:
>
> root@node1-ceph:~#
>
> Regards
>
>
>
> On Wed, Nov 29, 2023 at 5:45 PM Adam King wrote:
>
>> I think I remember a bug that happened
I think I remember a bug that happened when there was a small mismatch
between the cephadm version being used for bootstrapping and the container.
In this case, the cephadm binary used for bootstrap knows about the
ceph-exporter service and the container image being used does not. The
ceph-exporter
tart building.
>
> Travis, Adam King - any need to rerun any suites?
>
> On Thu, Nov 16, 2023 at 7:14 AM Guillaume Abrioux
> wrote:
> >
> > Hi Yuri,
> >
> >
> >
> > Backport PR [2] for reef has been merged.
> >
> >
>
t; ran the tests below and asking for approvals:
>
> smoke - Laura
> rados/mgr - PASSED
> rados/dashboard - Nizamudeen
> orch - Adam King
>
> See Build 4 runs - https://tracker.ceph.com/issues/63443#note-1
>
> On Tue, Nov 14, 2023 at 12:21 AM Redouane Kachach
> wrote:
>
> https://tracker.ceph.com/issues/63151 - Adam King do we need anything for
> this?
>
Yes, but not an actual code change in the main ceph repo. I'm looking into
a ceph-container change to alter the ganesha version in the container as a
solution.
On Wed, Nov 8, 2023 at 11:10
ests:
> https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd
>
> Still seeking approvals.
> smoke - Laura, Radek, Prashant, Venky in progress
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey in progress
> fs - Venky
> orch - Adam King
> rb
> Should it be fixed for this release?
>
> Seeking approvals/reviews for:
>
> smoke - Laura
> rados - Laura, Radek, Travis, Ernesto, Adam King
>
> rgw - Casey
> fs - Venky
> orch - Adam King
>
> rbd - Ilya
> krbd - Ilya
>
> upgrade/quincy-p2p - Known is
ites and dropping
that as a build target
- Last Pacific?
- Yes, 17.2.7, then 18.2.1, then 16.2.15 (final)
- PTLs will need to go through and find what backports still need to get
into pacific
- A lot of open pacific backports right no
The CA signed keys working in pacific was sort of accidental. We found out
that it was a working use case in pacific but not in quincy earlier this
year, which resulted in this tracker https://tracker.ceph.com/issues/62009.
That has since been implemented in main, and backported to the reef branch
Looks like the orchestration side support for this got brought into pacific
with the rest of the drive group stuff, but the actual underlying feature
in ceph-volume (from https://github.com/ceph/ceph/pull/40659) never got a
pacific backport. I've opened the backport now
https://github.com/ceph/ceph/
up in the Jenkins api check, where these kinds of
>> conditions are expected. In that case, I would call #1 more of a test
>> issue, and say that the fix is to whitelist the warning for that test.
>> Would be good to have someone from CephFS weigh in though-- @Patrick
>> D
this should be possible by specifying "data_devices" and "db_devices"
fields in the OSD spec file each with different filters. There's some
examples in the docs
https://docs.ceph.com/en/latest/cephadm/services/osd/#the-simple-case that
show roughly how that's done, and some other sections (
https
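A hedged sketch of such a spec, applied through the orchestrator (the
rotational filters are just one example of how to split the devices; adjust to
your hardware):

cat > osd-spec.yaml <<'EOF'
service_type: osd
service_id: osds_with_dedicated_db
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1       # e.g. data on HDDs
  db_devices:
    rotational: 0       # DB/WAL on SSDs/NVMe
EOF
ceph orch apply -i osd-spec.yaml --dry-run    # preview before applying for real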
it looks like you've hit https://tracker.ceph.com/issues/58946 which has a
candidate fix open, but nothing merged. The description on the PR with the
candidate fix says "When osdspec_affinity is not set, the drive selection
code will fail. This can happen when a device has multiple LVs where some
o
I've seen this before where the ceph-volume process hanging causes the
whole serve loop to get stuck (we have a patch to get it to timeout
properly in reef and are backporting to quincy but nothing for pacific
unfortunately). That's why I was asking about the REFRESHED column in the
orch ps/ orch d
you could maybe try running "ceph config set global container_image
quay.io/ceph/ceph:v16.2.9" before running the adoption. It seems it still
thinks it should be deploying mons with the default image (
docker.io/ceph/daemon-base:latest-pacific-devel ) for some reason and maybe
that config option is why.
with the log to cluster level already on debug, if you do a "ceph mgr fail"
what does cephadm log to the cluster before it reports sleeping? It should
at least be doing something if it's responsive at all. Also, in "ceph orch
ps" and "ceph orch device ls" are the REFRESHED columns reporting that
t
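Roughly, the sequence being described (the debug level option name is the one
cephadm uses; double-check it on your version):

ceph config set mgr mgr/cephadm/log_to_cluster_level debug   # already done in your case
ceph mgr fail
# give it a minute, then look at what cephadm logged to the cluster log
ceph log last 200 debug cephadm
ceph orch ps --refresh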
ein wrote:
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/62231#note-1
>
> Seeking approvals/reviews for:
>
> smoke - Laura, Radek
> rados - Neha, Radek, Travis, Ernesto, Adam King
> rgw - Casey
> fs - Venky
> orch - Adam King
> rbd
Not currently. Those logs aren't generated by any daemons, they come
directly from anything done by the cephadm binary one the host, which tends
to be quite a bit since the cephadm mgr module runs most of its operations
on the host through a copy of the cephadm binary. It doesn't log to journal
bec
hestrator._interface.OrchestratorError: cephadm exited with an error
> code: 1, stderr:Deploy daemon node-exporter.darkside1 ...
> Verifying port 9100 ...
> Cannot bind to IP 0.0.0.0 port 9100: [Errno 98] Address already in use
> ERROR: TCP Port(s) '9100' required for node-exp
The logs you probably really want to look at here are the journal logs from
the mgr and mon. If you have a copy of the cephadm tool on the host, you
can do a "cephadm ls --no-detail | grep systemd" to list out the systemd
unit names for the ceph daemons on the host, or just find the systemd
un
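Roughly (the unit name is a placeholder; copy the real one from the cephadm ls
output):

cephadm ls --no-detail | grep -i systemd
journalctl -u ceph-<fsid>@mon.<host>.service --since "1 hour ago"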
"db_uuid": "CUMgp7-Uscn-ASLo-bh14-7Sxe-80GE-EcywDb",
> > "name": "osd-block-db-5cb8edda-30f9-539f-b4c5-dbe420927911",
> > "osd_fsid": "089894cf-1782-4a3a-8ac0-9dd043f80c71",
> > "osd_id": "7",
> > "
in the "ceph orch device ls --format json-pretty" output, in the blob for
that specific device, is the "ceph_device" field set? There was a bug where
it wouldn't be set at all (https://tracker.ceph.com/issues/57100) and it
would make it so you couldn't use a device serving as a db device for any
fu
Someone hit what I think is this same issue the other day. Do you have a
"config" section in your rgw spec that sets the
"rgw_keystone_implicit_tenants" option to "True" or "true"? For them,
changing the value to be 1 (which should be equivalent to "true" here)
instead of "true" fixed it. Likely an
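A hedged example of the spec fragment in question, with the value expressed as
1 (realm/zone names and placement are placeholders):

cat > rgw-spec.yaml <<'EOF'
service_type: rgw
service_id: myrealm.myzone
placement:
  count: 2
config:
  rgw_keystone_implicit_tenants: 1   # instead of "true"/"True"
EOF
ceph orch apply -i rgw-spec.yaml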