[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-19 Thread Adam King
I think this is at least partially a code bug in the rgw module. Where it's actually failing in the traceback is generating the return message for the user at the end, because it assumes `created_zones` will always be a list of strings and that seems to not be the case in any error scenario. That c

[ceph-users] Re: Phantom host

2024-06-21 Thread Adam King
I don't remember how connected the dashboard is to the orchestrator in pacific, but the only thing I could think to do here is just restart it. (ceph mgr module disable dashboard, ceph mgr module enable dashboard). You could also totally fail over the mgr (ceph mgr fail) although that might change

[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-26 Thread Adam King
Interesting. Given this is coming from a radosgw-admin call being done from within the rgw mgr module, I wonder if a radosgw-admin log file is ending up in the active mgr container when this happens. On Wed, Jun 26, 2024 at 9:04 AM Daniel Gryniewicz wrote: > On 6/25/24 3:21 PM, Matthew Vernon w

[ceph-users] Re: squid 19.1.0 RC QE validation status

2024-07-01 Thread Adam King
Weinstein wrote: > Details of this release are summarized here: > > https://tracker.ceph.com/issues/66756#note-1 > > Release Notes - TBD > LRC upgrade - TBD > > (Reruns were not done yet.) > > Seeking approvals/reviews for: > > smoke > rados - Radek, Laura >

[ceph-users] Re: squid 19.1.0 RC QE validation status

2024-07-03 Thread Adam King
als: > smoke - n/a? > orch - Adam King > krbd - Ilya > quincy-x, reef-x - Laura, Neha > perf-basic - n/a > crimson-rados - n/a > ceph-volume - Guillaume > > Neha, Laura - I assume we don't plan gibba/LRC upgrade, pls confirm > > On Wed, Jul 3, 2024 at 5:55 AM Ven

[ceph-users] Re: Node Exporter keeps failing while upgrading cluster in air-gapped (isolated environment).

2024-07-15 Thread Adam King
To pull quay.io/prometheus/node-exporter:v1.5.0 the nodes would need access to the internet, yes. I don't fully understand the reason for > root@node-01:~# ceph config set mgr > mgr/cephadm/container_image_node_exporter > quay.io/prometheus/node-exporter:v1.5.0 though. Why not tell it to point t

[ceph-users] Re: Node Exporter keeps failing while upgrading cluster in air-gapped (isolated environment).

2024-07-16 Thread Adam King
I wouldn't worry about the one the config option gives you right now. The one on your local repo looks like the same version. For isolated deployments like this, the default options aren't going to work, as they'll always point to images that require internet access to pull. I'd just update the con
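
For reference, a minimal sketch of what pointing the monitoring images at a local registry can look like (the registry host and tags below are placeholders, not values from this thread):

    # hypothetical local mirror; substitute your own registry and tags
    ceph config set mgr mgr/cephadm/container_image_node_exporter registry.local:5000/prometheus/node-exporter:v1.5.0
    ceph config set mgr mgr/cephadm/container_image_prometheus registry.local:5000/prometheus/prometheus:v2.43.0
    # redeploy so running daemons pick up the new image
    ceph orch redeploy node-exporter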

[ceph-users] Re: Cephadm Offline Bootstrapping Issue

2024-08-02 Thread Adam King
The thing that stands out to me from that output was that the image has no repo_digests. It's possible cephadm is expecting there to be digests and is crashing out trying to grab them for this image. I think it's worth a try to set mgr/cephadm/use_repo_digest to false, and then restart the mgr. FWI
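
A minimal sketch of that workaround on a cephadm-managed cluster:

    ceph config set mgr mgr/cephadm/use_repo_digest false
    ceph mgr fail   # restart/fail over the active mgr so cephadm picks up the change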

[ceph-users] Re: ceph orchestrator upgrade quincy to reef, missing ceph-exporter

2024-08-02 Thread Adam King
ceph-exporter should get deployed by default with new installations on recent versions, but as a general principle we've avoided adding/removing services from the cluster during an upgrade. There is perhaps a case for this service in particular if the user also has the rest of the monitoring stack

[ceph-users] Re: [EXTERNAL] Re: Cephadm Offline Bootstrapping Issue

2024-08-05 Thread Adam King
point of bootstrapping? > > I confess I don't really understand why this field is not set by the > docker client running locally. I wonder if I can do anything on the docker > client side to add a repo digest. I'll explore that a bit. > > Thanks, > Alex > > --

[ceph-users] Re: Pull failed on cluster upgrade

2024-08-06 Thread Adam King
If you're using VMs, https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/6X6QIEMWDYSA6XOKEYH5OJ4TIQSBD5BL/ might be relevant On Tue, Aug 6, 2024 at 3:21 AM Nicola Mori wrote: > I think I found the problem. Setting the cephadm log level to debug and > then watching the logs during th

[ceph-users] Re: Cephadm: unable to copy ceph.conf.new

2024-08-07 Thread Adam King
It might be worth trying to manually upgrade one of the mgr daemons. Go to the host with a mgr and edit the /var/lib/ceph///unit.run so that the image specified in the long podman/docker run command in there is the 17.2.7 image, then just restart its systemd unit (don't tell the orchestrator
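
Roughly, the manual procedure being described could look like this (the fsid, daemon name and image are placeholders):

    # on the host running the mgr daemon
    vi /var/lib/ceph/<fsid>/mgr.<host>.<id>/unit.run   # swap the image in the long podman/docker run line for the 17.2.7 image
    systemctl restart ceph-<fsid>@mgr.<host>.<id>.service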

[ceph-users] Re: squid 19.1.1 RC QE validation status

2024-08-09 Thread Adam King
> rados - Radek, Laura (https://github.com/ceph/ceph/pull/59020 is being > tested and will be cherry-picked when ready) > > rgw - Eric, Adam E > fs - Venky > orch - Adam King > rbd, krbd - Ilya > > quincy-x, reef-x - Laura, Neha > > powercycle - Brad > crimson-rad

[ceph-users] Re: Cephadm and the "--data-dir" Argument

2024-08-12 Thread Adam King
Looking through the code it doesn't seem like this will work currently. I found that the --data-dir arg to the cephadm binary was from the initial implementation of the cephadm binary (so early that it was actually called "ceph-daemon" at the time rather than "cephadm") but it doesn't look like tha

[ceph-users] Re: [EXTERNAL] Re: Cephadm and the "--data-dir" Argument

2024-08-12 Thread Adam King
but if > it's a fix that's appropriate for someone who doesn't know the Ceph > codebase (me) I'd be happy to have a look at implementing a fix. > > Best Wishes, > Alex > > -- > *From:* Adam King > *Sent:* Monday, August 12, 2

[ceph-users] Re: Cephadm Upgrade Issue

2024-08-14 Thread Adam King
I don't think pacific has the upgrade error handling work so it's a bit tougher to debug here. I think it should have printed a traceback into the logs though. Maybe right after it crashes if you check `ceph log last 200 cephadm` there might be something. If not, you might need to do a `ceph mgr fa

[ceph-users] Re: Cephadm Upgrade Issue

2024-08-14 Thread Adam King
If you're referring to https://tracker.ceph.com/issues/57675, it got into 16.2.14, although there was another issue where running a `ceph orch restart mgr` or `ceph orch redeploy mgr` would cause an endless loop of the mgr daemons restarting, which would block all operations, that might be what we

[ceph-users] Re: squid 19.1.1 RC QE validation status

2024-08-19 Thread Adam King
" which doesn't look very serious anyway, I don't think there's any reason for the failure to hold up the release On Thu, Aug 15, 2024 at 6:53 PM Laura Flores wrote: > The upgrade suites look mostly good to me, except for one tracker I think > would be in @Adam King &

[ceph-users] CLT meeting notes August 19th 2024

2024-08-19 Thread Adam King
- [travisn] Arm64 OSDs crashing on v18.2.4, need a fix in v18.2.5 - https://tracker.ceph.com/issues/67213 - tcmalloc issue, solved by rebuilding the gperftools package - Travis to reach out to Rongqi Sun about the issue - moving away from tcmalloc would probably cause perform

[ceph-users] Re: squid 19.2.0 QE validation status

2024-09-04 Thread Adam King
rade - TBD > > It was decided and agreed upon that there would be limited testing for > this release, given it is based on 19.1.1 rather than a full rebase. > > Seeking approvals/reviews for: > (some reruns are still in progress) > > rgw - Eric, Adam E > fs - Venky > o

[ceph-users] Re: Ceph Pacific mon is not starting after host reboot

2021-08-09 Thread Adam King
Wanted to respond to the original thread I saw archived on this topic but I wasn't subscribed to the mailing list yet so don't have the thread in my inbox to reply to. Hopefully, those involved in that thread still see this. This issue looks the same as https://tracker.ceph.com/issues/51027 which

[ceph-users] Re: cephadm orchestrator not responding after cluster reboot

2021-09-16 Thread Adam King
Does running "ceph mgr fail" then waiting a bit make the "ceph orch" commands responsive? That's worked for me sometimes before when they wouldn't respond. On Thu, Sep 16, 2021 at 8:08 AM Javier Cacheiro wrote: > Hi, > > I have configured a ceph cluster with the new Pacific version (16.2.4) > us

[ceph-users] Re: 16.2.6 CEPHADM_REFRESH_FAILED New Cluster

2021-09-24 Thread Adam King
It looks like the output from a ceph-volume command was too long to handle. If you run "cephadm ceph-volume -- inventory --format=json" (add "--with-lsm" if you've turned on device_enhanced_scan) manually on each host do any of them fail in a similar fashion? On Fri, Sep 24, 2021 at 1:37 PM Marco

[ceph-users] Re: 16.2.6 CEPHADM_REFRESH_FAILED New Cluster

2021-09-27 Thread Adam King
> run_until_complete >> return future.result() >> File "/usr/sbin/cephadm", line 1433, in run_with_timeout >> stdout, stderr = await asyncio.gather(tee(process.stdout), >> File "/usr/sbin/cephadm", line 1415, in tee >> async for line i

[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-17 Thread Adam King
> Should it be fixed for this release? > > Seeking approvals/reviews for: > > smoke - Laura > rados - Laura, Radek, Travis, Ernesto, Adam King > > rgw - Casey > fs - Venky > orch - Adam King > > rbd - Ilya > krbd - Ilya > > upgrade/quincy-p2p - Known is

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-07 Thread Adam King
ests: > https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd > > Still seeing approvals. > smoke - Laura, Radek, Prashant, Venky in progress > rados - Neha, Radek, Travis, Ernesto, Adam King > rgw - Casey in progress > fs - Venky > orch - Adam King > rb

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-08 Thread Adam King
> > https://tracker.ceph.com/issues/63151 - Adam King do we need anything for > this? > Yes, but not an actual code change in the main ceph repo. I'm looking into a ceph-container change to alter the ganesha version in the container as a solution. On Wed, Nov 8, 2023 at 11:10 

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-14 Thread Adam King
> ran the tests below and asking for approvals: > > smoke - Laura > rados/mgr - PASSED > rados/dashboard - Nizamudeen > orch - Adam King > > See Build 4 runs - https://tracker.ceph.com/issues/63443#note-1 > > On Tue, Nov 14, 2023 at 12:21 AM Redouane Kachach > wrote:

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-16 Thread Adam King
tart building. > > Travis, Adam King - any need to rerun any suites? > > On Thu, Nov 16, 2023 at 7:14 AM Guillaume Abrioux > wrote: > > > > Hi Yuri, > > > > > > > > Backport PR [2] for reef has been merged. > > > > >

[ceph-users] Re: error deploying ceph

2023-11-29 Thread Adam King
I think I remember a bug that happened when there was a small mismatch between the cephadm version being used for bootstrapping and the container. In this case, the cephadm binary used for bootstrap knows about the ceph-exporter service and the container image being used does not. The ceph-exporter

[ceph-users] Re: error deploying ceph

2023-11-29 Thread Adam King
data: > pools: 0 pools, 0 pgs > objects: 0 objects, 0 B > usage: 0 B used, 0 B / 0 B avail > pgs: > > root@node1-ceph:~# > > Regards > > > > On Wed, Nov 29, 2023 at 5:45 PM Adam King wrote: > >> I think I remember a bug that happened

[ceph-users] Re: error deploying ceph

2023-11-30 Thread Adam King
G N/A N/A No 27m ago Has a FileSystem, Insufficient space (<10 extents) on vgs, LVM detected > node3-ceph /dev/xvdb ssd 100G N/A N/A No 27m ago Has a FileSystem, Insufficient space (<10 >

[ceph-users] Re: nfs export over RGW issue in Pacific

2023-12-07 Thread Adam King
Handling of nfs exports over rgw, including the `ceph nfs export create rgw` command, wasn't added to the nfs module in pacific until 16.2.7. On Thu, Dec 7, 2023 at 1:35 PM Adiga, Anantha wrote: > Hi, > > > oot@a001s016:~# cephadm version > > Using recent ceph image c

[ceph-users] CLT meeting notes January 24th 2024

2024-01-24 Thread Adam King
- Build/package PRs- who to best review these? - Example: https://github.com/ceph/ceph/pull/55218 - Idea: create a GitHub team specifically for these types of PRs https://github.com/orgs/ceph/teams - Laura will try to organize people for the group - Pacific 16.2.15 status

[ceph-users] Re: Pacific: Drain hosts does not remove mgr daemon

2024-01-31 Thread Adam King
If you just manually run `ceph orch daemon rm ` does it get removed? I know there's some logic in host drain that does some ok-to-stop checks that can cause things to be delayed or stuck if it doesn't think it's safe to remove the daemon for some reason. I wonder if it's being overly cautious here.

[ceph-users] Re: Pacific Bug?

2024-02-14 Thread Adam King
Does seem like a bug, actually in more than just this command. The `ceph orch host ls` with the --label and/or --host-pattern flag just piggybacks off of the existing filtering done for placements in service specs. I've just taken a look and you actually can create the same behavior with the placem

[ceph-users] Re: first_virtual_router_id not allowed in ingress manifest

2024-02-21 Thread Adam King
It seems the quincy backport for that feature ( https://github.com/ceph/ceph/pull/53098) was merged Oct 1st 2023. According to the quincy part of https://docs.ceph.com/en/latest/releases/#release-timeline it looks like that would mean it would only be present in 17.2.7, but not 17.2.6. On Wed, Feb

[ceph-users] Re: Some questions about cephadm

2024-02-21 Thread Adam King
Cephadm does not have some variable that explicitly says it's an HCI deployment. However, the HCI variable in ceph ansible I believe only controlled the osd_memory_target attribute, which would automatically set it to 20% or 70% respectively of the memory on the node divided by the number of OSDs

[ceph-users] Re: Some questions about cephadm

2024-02-26 Thread Adam King
In regards to > > From the reading you gave me I have understood the following : > 1 - Set osd_memory_target_autotune to true then set > autotune_memory_target_ratio to 0.2 > 2 - Or do the math. For my setup I have 384Go per node, each node has 4 > nvme disks of 7.6To, 0.2 of memory is 19.5G. So ea
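
As a worked example of the arithmetic quoted above (a sketch only, not exact autotune output):

    # 384G per node, autotune_memory_target_ratio 0.2, 4 NVMe OSDs per node
    # 384G * 0.2 = 76.8G reserved for OSDs
    # 76.8G / 4 OSDs = 19.2G per OSD, i.e. roughly the ~19.5G osd_memory_target mentioned above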

[ceph-users] Re: Migration from ceph-ansible to Cephadm

2024-02-29 Thread Adam King
> > - I still have the ceph-crash container, what should I do with it? > If it's the old one, I think you can remove it. Cephadm can deploy its own crash service (`ceph orch apply crash` if it hasn't). You can check if `crash` is listed under `ceph orch ls` and if it is there you can do `ceph orch

[ceph-users] Re: Ceph orch doesn't execute commands and doesn't report correct status of daemons

2024-03-01 Thread Adam King
There have been bugs in the past where things have gotten "stuck". Usually I'd say check the REFRESHED column in the output of `ceph orch ps`. It should refresh the daemons on each host roughly every 10 minutes, so if you see some value much larger than that, things are probably actually stuck. If

[ceph-users] Re: [Quincy] NFS ingress mode haproxy-protocol not recognized

2024-03-03 Thread Adam King
According to https://tracker.ceph.com/issues/58933, that was only backported as far as reef. If I remember correctly, the reason for that was the ganesha version itself we were including in our quincy containers wasn't new enough to support the feature on that end, so backporting the nfs/orchestrat

[ceph-users] Re: Ceph orch doesn't execute commands and doesn't report correct status of daemons

2024-03-03 Thread Adam King
Okay, it seems like from what you're saying the RGW image itself isn't special compared to the other ceph daemons, it's just that you want to use the image on your local registry. In that case, I would still recommend just using `ceph orch upgrade start --image ` with the image from your local regi
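
A sketch of that suggestion (the registry name is a placeholder for your local one):

    ceph orch upgrade start --image registry.local:5000/ceph/ceph:v17.2.7
    ceph orch upgrade status   # monitor progress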

[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Adam King
There was a bug with this that was fixed by https://github.com/ceph/ceph/pull/52122 (which also specifically added an integration test for this case). It looks like it's missing a reef and quincy backport though unfortunately. I'll try to open one for both. On Tue, Mar 5, 2024 at 8:26 AM Eugen Blo

[ceph-users] Re: Ceph reef mon is not starting after host reboot

2024-03-06 Thread Adam King
When you ran this, was it directly on the host, or did you run `cephadm shell` first? The two things you tend to need to connect to the cluster (that "RADOS timed out" error is generally what you get when connecting to the cluster fails. A bunch of different causes all end with that error) are a ke

[ceph-users] Re: ceph-volume fails when adding spearate DATA and DATA.DB volumes

2024-03-06 Thread Adam King
If you want to be directly setting up the OSDs using ceph-volume commands (I'll pretty much always recommend following https://docs.ceph.com/en/latest/cephadm/services/osd/#dedicated-wal-db over manual ceph-volume stuff in cephadm deployments unless what you're doing can't be done with the spec fil
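
For reference, a hypothetical OSD service spec of the kind the linked docs describe (device paths are placeholders):

    cat > osd_spec.yaml <<'EOF'
    service_type: osd
    service_id: osd_with_dedicated_db
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        paths:
          - /dev/sdb
      db_devices:
        paths:
          - /dev/nvme0n1
    EOF
    ceph orch apply -i osd_spec.yaml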

[ceph-users] Re: Upgrading from Reef v18.2.1 to v18.2.2

2024-03-21 Thread Adam King
> > Hi, > > On 3/21/24 14:50, Michael Worsham wrote: > > > > Now that Reef v18.2.2 has come out, is there a set of instructions on > how to upgrade to the latest version via using Cephadm? > > Yes, there is: https://docs.ceph.com/en/reef/cephadm/upgrade/ > Just a note on that docs section, it refe

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-26 Thread Adam King
For context, the value the autotune goes with takes the value from `cephadm gather-facts` on the host (the "memory_total_kb" field) and then subtracts from that per daemon on the host according to min_size_by_type = { 'mds': 4096 * 1048576, 'mgr': 4096 * 1048576, 'mon':
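
A rough sketch of that calculation (illustrative only; the per-daemon table is truncated above):

    # usable = memory_total_kb (from `cephadm gather-facts`) * autotune_memory_target_ratio
    # usable -= min_size_by_type reservation for each non-OSD daemon on the host (e.g. 4G each for mds/mgr per the table)
    # osd_memory_target = usable / number of OSDs on the host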

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-27 Thread Adam King
ording to ceph orch > ps. Then again, they are nowhere near the values stated in min_size_by_type > that you list. > Obviously yes, I could disable the auto tuning, but that would leave me > none the wiser as to why this exact host is trying to do this. > > > > On Tue, Mar

[ceph-users] Re: Failed adding back a node

2024-03-27 Thread Adam King
From the ceph versions output I can see "osd": { "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160 }, It seems like all the OSD daemons on this cluster are using that 16.2.10-160 image, and I'm guessing most of them are running, so it mu

[ceph-users] Re: Failed adding back a node

2024-03-28 Thread Adam King
No, you can't use the image id for the upgrade command, it has to be the image name. So it should start, based on what you have, with registry.redhat.io/rhceph/. As for the full name, it depends which image you want to go with. As for trying this on an OSD first, there is `ceph orch daemon redeploy --i
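
The elided command would look roughly like this (daemon and image names are placeholders):

    # try the new image on a single OSD before starting a full upgrade
    ceph orch daemon redeploy osd.12 --image registry.redhat.io/rhceph/<image-name>:<tag>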

[ceph-users] Re: cephadm shell version not consistent across monitors

2024-04-02 Thread Adam King
From what I can see with the most recent cephadm binary on pacific, unless you have the CEPHADM_IMAGE env variable set, it does a `podman images --filter label=ceph=True --filter dangling=false` (or docker) and takes the first image in the list. It seems to be getting sorted by creation time by def

[ceph-users] Re: Pacific Bug?

2024-04-02 Thread Adam King
https://tracker.ceph.com/issues/64428 should be it. Backports are done for quincy, reef, and squid and the patch will be present in the next release for each of those versions. There isn't a pacific backport as, afaik, there are no more pacific releases planned. On Fri, Mar 29, 2024 at 6:03 PM Ale

[ceph-users] Re: CEPHADM_HOST_CHECK_FAILED

2024-04-04 Thread Adam King
First, I guess I would make sure that peon7 and peon12 actually could pass the host check (you can run "cephadm check-host" on the host directly if you have a copy of the cephadm binary there) Then I'd try a mgr failover (ceph mgr fail) to clear out any in memory host values cephadm might have and
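
A sketch of those two checks:

    # on peon7/peon12 directly, with a copy of the cephadm binary present
    cephadm check-host
    # then, from a node with the client keyring, clear cephadm's in-memory host state
    ceph mgr fail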

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-04 Thread Adam King
1 running (3w) 7m ago 11M 2698M 4096M 17.2.6 > osd.9 my-ceph01 running (3w) 7m ago 11M 3364M 4096M 17.2.6 > prometheus.my-ceph01 my-ceph01 *:9095 running (3w) 7m ago 13M 164M -

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-09 Thread Adam King
t; "memory_total_kb": 32827840, > > On Thu, Apr 4, 2024 at 10:14 PM Adam King wrote: > >> Sorry to keep asking for more info, but can I also get what `cephadm >> gather-facts` on that host returns for "memory_total_kb". Might end up >> creating a

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-09 Thread Adam King
>> Hi Adam >> >> Let me just finish tucking in a devlish tyke here and i’ll get to it >> first thing >> >> tirs. 9. apr. 2024 kl. 18.09 skrev Adam King : >> >>> I did end up writing a unit test to see what we calculated here, as well >>>

[ceph-users] Re: reef 18.2.3 QE validation status

2024-04-14 Thread Adam King
es, still trying, Laura PTL > > rados - Radek, Laura approved? Travis? Nizamudeen? > > rgw - Casey approved? > fs - Venky approved? > orch - Adam King approved? > > krbd - Ilya approved > powercycle - seems fs related, Venky, Brad PTL > > ceph-volume - will

[ceph-users] Re: reef 18.2.3 QE validation status

2024-04-16 Thread Adam King
ph/ceph/pull/56714> On Tue, Apr 16, 2024 at 1:39 PM Laura Flores wrote: > On behalf of @Radoslaw Zarzynski , rados approved. > > Below is the summary of the rados suite failures, divided by component. @Adam > King @Venky Shankar PTAL at the > orch and cephfs failures to se

[ceph-users] Re: which grafana version to use with 17.2.x ceph version

2024-04-23 Thread Adam King
FWIW, cephadm uses `quay.io/ceph/ceph-grafana:9.4.7` as the default grafana image in the quincy branch On Tue, Apr 23, 2024 at 11:59 AM Osama Elswah wrote: > Hi, > > > in quay.io I can find a lot of grafana versions for ceph ( > https://quay.io/repository/ceph/grafana?tab=tags) how can I find ou

[ceph-users] Re: ceph recipe for nfs exports

2024-04-24 Thread Adam King
> > - Although I can mount the export I can't write on it > > What error are you getting trying to do the write? The way you set things up doesn't look too different from one of our integration tests for ingress over nfs ( https://github.com/ceph/ceph/blob/main/qa/suites/orch/cephadm/smoke-roleless/

[ceph-users] CLT meeting notes May 6th 2024

2024-05-06 Thread Adam King
- DigitalOcean credits - things to ask - what would promotional material require - how much are credits worth - Neha to ask - 19.1.0 centos9 container status - close to being ready - will be building centos 8 and 9 containers simultaneously - should test o

[ceph-users] Re: cephadm basic questions: image config, OS reimages

2024-05-16 Thread Adam King
At least for the current up-to-date reef branch (not sure what reef version you're on) when --image is not provided to the shell, it should try to infer the image in this order 1. from the CEPHADM_IMAGE env. variable 2. if you pass --name with a daemon name to the shell command, it will t
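
For example, either of these pins the image the shell uses instead of relying on that inference (the image is a placeholder):

    CEPHADM_IMAGE=quay.io/ceph/ceph:v18.2.4 cephadm shell
    cephadm --image quay.io/ceph/ceph:v18.2.4 shell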

[ceph-users] Re: cephadm host maintenance

2022-07-13 Thread Adam King
n before running the maintenance enter command is necessary. Regards, - Adam King On Wed, Jul 13, 2022 at 11:02 AM Steven Goodliff < steven.goodl...@globalrelay.net> wrote: > > Hi, > > > I'm trying to reboot a ceph cluster one instance at a time by running in a > Ansible

[ceph-users] Re: octopus v15.2.17 QE Validation status

2022-07-25 Thread Adam King
orch approved. The test_cephadm_repos test failure is just a problem with the test I believe, not any actual ceph code. The other selinux denial I don't think is new. Thanks, - Adam King On Sun, Jul 24, 2022 at 11:33 AM Yuri Weinstein wrote: > Still seeking approvals for: > >

[ceph-users] Re: [Warning Possible spam] Re: Issues after a shutdown

2022-07-25 Thread Adam King
Do the journal logs for any of the OSDs that are marked down give any useful info on why they're failing to start back up? If the host level ip issues have gone away I think that would be the next place to check. On Mon, Jul 25, 2022 at 5:03 PM Jeremy Hansen wrote: > I noticed this on the initia

[ceph-users] Re: 1 stray daemon(s) not managed by cephadm

2022-07-25 Thread Adam King
Usually it's pretty explicit in "ceph health detail". What does it say there? On Mon, Jul 25, 2022 at 9:05 PM Jeremy Hansen wrote: > How do I track down what is the stray daemon? > > Thanks > -jeremy > ___ > ceph-users mailing list -- ceph-users@ceph.i

[ceph-users] Re: 17.2.2: all MGRs crashing in fresh cephadm install

2022-07-27 Thread Adam King
the unit.image file is just there for cephadm to look at as part of gathering metadata I think. What you'd want to edit is the unit.run file (in the same directory as the unit.image). It should have a really long line specifying a podman/docker run command and somewhere in there will be "CONTAINER_

[ceph-users] Re: 17.2.2: all MGRs crashing in fresh cephadm install

2022-07-27 Thread Adam King
mon with the new image? at least > this is something I did in our testing here[1]. > > ceph orch daemon redeploy mgr. > quay.ceph.io/ceph-ci/ceph:f516549e3e4815795ff0343ab71b3ebf567e5531 > > [1] https://github.com/ceph/ceph/pull/47270#issuecomment-1196062363 > > On Wed, Jul 27

[ceph-users] Re: 17.2.2: all MGRs crashing in fresh cephadm install

2022-07-27 Thread Adam King
0f0bc2c6791f > > because of this I can't run a "ceph orch upgrade" because it always > complains about having only one. > Is there something else that needs to be changed to get the cluster to a > normal state? > > Thanks! > > On Wed, 2022-07-27 at 12:23 -04

[ceph-users] Re: 17.2.2: all MGRs crashing in fresh cephadm install

2022-07-28 Thread Adam King
looks like it's not. Even > on the dashboard the daemon > shows as errored but it's running (confirmed via podman and systemctl). > My take is that something is not communicating some information with > "cephadm" but I don't know > what. ceph itself knows the m

[ceph-users] Re: deploy ceph cluster in isolated environment -- NO INTERNET

2022-07-31 Thread Adam King
Cephadm has a config option to say whether to use the repo digest or the tag name. If you want it to use tags "ceph config set mgr mgr/cephadm/use_repo_digest false" should make that happen (it defaults to true/using the digest). Beyond that, it's possible you may need to change the config option f

[ceph-users] Re: unable to calc client keyring: No matching hosts for label

2022-08-02 Thread Adam King
It's possible there's a bug in cephadm around placements where hosts have the _no_schedule label. There was https://tracker.ceph.com/issues/56972 recently for an issue with how _no_schedule interacts with placements using explicit hostnames. It might be something similar here where it thinks there'

[ceph-users] Re: unable to calc client keyring: No matching hosts for label

2022-08-02 Thread Adam King
keyring and ceph.conf files. > > Should I open a bug somewhere? > > On Tue, 2022-08-02 at 08:39 -0400, Adam King wrote: > > It's possible there's a bug in cephadm around placements where hosts > have the _no_schedule label. > > There was https://tracker.ceph.c

[ceph-users] Re: Issue adding host with cephadm - nothing is deployed

2022-08-18 Thread Adam King
If you try shuffling some daemon around on some of the working hosts (e.g. changing the placement of the node-exporter spec so that one of the working hosts is excluded so the node-exporter there should be removed) is cephadm able to actually complete that? Also, does device info for any or all of

[ceph-users] Re: Issue adding host with cephadm - nothing is deployed

2022-08-18 Thread Adam King
ize: 3 and running: > 3) > > I'm running ceph -W cephadm with log_to_cluster_level set to debug, but > except for the walls of text with the inventories, nothing (except > _kick_service_loop) shows up in the logs after the INF level messages that > host has been added or service sp

[ceph-users] Re: cephadm logrotate conflict

2022-08-25 Thread Adam King
FWIW, cephadm only writes that file out if it doesn't exist at all. You might be able to just remove anything actually functional from it and leave a sort of dummy file with only a comment there as a workaround. Also, was this an upgraded cluster? I tried quickly bootstrapping a cephadm clus

[ceph-users] Re: cephadm logrotate conflict

2022-08-25 Thread Adam King
You were correct about the difference between the distros. Was able to reproduce fine on ubuntu 20.04 (was using centos 8.stream before). I opened a tracker as well https://tracker.ceph.com/issues/57293 On Thu, Aug 25, 2022 at 7:44 AM Robert Sander wrote: > Am 25.08.22 um 13:41 schrieb A

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Adam King
Are there any extra directories in /var/lib/ceph or /var/lib/ceph/ that appear to be for those OSDs on that host? When cephadm builds the info it uses for "ceph orch ps" it's actually scraping those directories. The output of "cephadm ls" on the host with the duplicates could also potentially have

[ceph-users] Re: cephadm upgrade from octopus to pasific stuck

2022-09-01 Thread Adam King
Does "ceph orch upgrade status" give any insights (e.g. an error message of some kind)? If not, maybe you could try looking at https://tracker.ceph.com/issues/56485#note-2 because it seems like a similar issue and I see you're using --ceph-version (which we need to fix, sorry about that). On Wed,

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Adam King
"container_image_name": "quay.io/ceph/ceph:v15", > "container_image_id": null, > "version": null, > "started": null, > "created": "2022-08-19T03:36:22.815608Z", > "

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Adam King
> ceph-f270ad9e-1f6f-11ed-b6f8-a539d87379ea@mgr.ceph1.xmbvsb.service" and > "journalctl -xe" for details. > Traceback (most recent call last): > File "/usr/sbin/cephadm", line 6250, in > r = args.func() > File "/usr/sbin/cephadm", line 1357, i

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Adam King
names, how does > cephadm decide naming? > > > https://achchusnulchikam.medium.com/deploy-ceph-cluster-with-cephadm-on-centos-8-257b300e7b42 > > On Thu, Sep 1, 2022 at 6:20 PM Satish Patel wrote: > >> Hi Adam, >> >> Getting the following error, not sure why i

[ceph-users] Re: [cephadm] Found duplicate OSDs

2022-09-01 Thread Adam King
you see i have ceph1 two time. :( > > 10.73.0.191 ceph1.example.com ceph1 > 10.73.0.192 ceph2.example.com ceph1 > > On Thu, Sep 1, 2022 at 8:06 PM Adam King wrote: > >> the naming for daemons is a bit different for each daemon type, but for >> mgr daemons it's alwa

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Adam King
this looks like an old traceback you would get if you ended up with a service type that shouldn't be there somehow. The things I'd probably check are that "cephadm ls" on either host definitely doesn't report any strange things that aren't actually daemons in your cluster such as "cephadm.". Anothe

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Adam King
emove didn't work > > root@ceph1:~# ceph orch rm cephadm > Failed to remove service. was not found. > > root@ceph1:~# ceph orch rm > cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d > Failed to remove service. > > was not found. > > O

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Adam King
8-a539d87379ea/cephadm.7ce656a8721deb5054c37b0cfb90381522d521dde51fb0c5a2142314d663f63d > > But still getting the same error, do i need to do anything else? > > On Fri, Sep 2, 2022 at 9:51 AM Adam King wrote: > >> Okay, I'm wondering if this is an issue with version mismatch.

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Adam King
43f 3c963693ff2b >>>>> grafana.ceph1 >>>>> ceph1 running (9h) 64s ago2w 6.7.4 >>>>> quay.io/ceph/ceph-grafana:6.7.4 >>>>> 557c83e11646 7583a8dc4c61 >>>>> mgr.ceph1.smfvfd >>>

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Adam King
13 months ago > 486MB > quay.io/prometheus/prometheus v2.18.1 de242295e225 2 years ago > 140MB > quay.io/prometheus/alertmanagerv0.20.0 0881eb8f169f 2 years ago > 52.1MB > quay.io/prometheus/node-exporter v0.18.1 e5a616e4b9cf 3 years ago > 22.9MB > &

[ceph-users] Re: [cephadm] mgr: no daemons active

2022-09-02 Thread Adam King
mgr.ceph2.huidoh (mgr.344392) 211206 : >> cephadm [DBG] 0 OSDs are scheduled for removal: [] >> 2022-09-02T18:38:21.762480+ mgr.ceph2.huidoh (mgr.344392) 211207 : >> cephadm [DBG] Saving [] to store >> >> On Fri, Sep 2, 2022 at 12:17 PM Adam King wrote: >> >

[ceph-users] Re: quincy v17.2.4 QE Validation status

2022-09-14 Thread Adam King
orch suite failures fall under https://tracker.ceph.com/issues/49287 https://tracker.ceph.com/issues/57290 https://tracker.ceph.com/issues/57268 https://tracker.ceph.com/issues/52321 For rados/cephadm the failures are both https://tracker.ceph.com/issues/57290 Overall, nothing new/unexpected. orc

[ceph-users] CLT meeting summary 2022-09-28

2022-09-28 Thread Adam King
Budget Discussion - Going to investigate current resources being used, see if any costs can be cut - What can be moved from virtual environments to internal ones? - Need to take inventory of what resources we currently have and what their costs are 17.2.4 - Gibba and LRC cluste

[ceph-users] Re: Cephadm migration

2022-10-14 Thread Adam King
For the weird image, perhaps just "ceph orch daemon redeploy rgw.testrgw.svtcephrgwv1.invwmo --image quay.io/ceph/ceph:v16.2.10" will resolve it. Not sure about the other things wrong with it yet but I think the image should be fixed before looking into that. On Fri, Oct 14, 2022 at 5:47 AM Jean-M

[ceph-users] Re: Cephadm - Adding host to migrated cluster

2022-10-17 Thread Adam King
Do the journal logs for the OSDs say anything about why they couldn't start up? ("cephadm ls --no-detail" run on the host will give the systemd units for each daemon on the host so you can get them easier). On Mon, Oct 17, 2022 at 1:37 PM Brent Kennedy wrote: > Below is what the ceph mgr log is

[ceph-users] Re: Cephadm container configurations

2022-10-25 Thread Adam King
If you're using a fairly recent cephadm version, there is the ability to provide miscellaneous container arguments in the service spec https://docs.ceph.com/en/quincy/cephadm/services/#extra-container-arguments. This means you can have cephadm deploy each container in that service with, for example
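
A hypothetical spec using that feature, per the linked docs (the service and argument are illustrative):

    cat > node-exporter.yaml <<'EOF'
    service_type: node-exporter
    placement:
      host_pattern: '*'
    extra_container_args:
      - "--cpus=2"
    EOF
    ceph orch apply -i node-exporter.yaml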

[ceph-users] Re: cephadm node-exporter extra_container_args for textfile_collector

2022-10-28 Thread Adam King
We had actually considered adding an `extra_daemon_args` to be the equivalent to `extra_container_args` but for the daemon itself rather than a flag for the podman/docker run command. IIRC we thought it was a good idea but nobody actually pushed to add it in then since (at the time) we weren't awar

[ceph-users] Re: Issues upgrading cephadm cluster from Octopus.

2022-11-19 Thread Adam King
I don't know for sure if it will fix the issue, but the migrations happen based on a config option "mgr/cephadm/migration_current". You could try setting that back to 0 and it would at least trigger the migrations to happen again after restarting/failing over the mgr. They're meant to be idempotent
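
A sketch of that workaround:

    ceph config set mgr mgr/cephadm/migration_current 0
    ceph mgr fail   # failing over the mgr re-runs the (idempotent) migrations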

[ceph-users] Re: Issues upgrading cephadm cluster from Octopus.

2022-11-19 Thread Adam King
d been trying to do before. On Sat, Nov 19, 2022 at 8:05 AM Adam King wrote: > I don't know for sure if it will fix the issue, but the migrations happen > based on a config option "mgr/cephadm/migration_current". You could try > setting that back to 0 and it would at leas

[ceph-users] Re: osd removal leaves 'stray daemon'

2022-11-30 Thread Adam King
I typically don't see this when I do OSD replacement. If you do a mgr failover ("ceph mgr fail") and wait a few minutes does this still show up? The stray daemon/host warning is roughly equivalent to comparing the daemons in `ceph node ls` and `ceph orch ps` and seeing if there's anything in the fo
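
The comparison being described can be done by hand, e.g.:

    ceph mgr fail    # then wait a few minutes for cephadm to refresh
    ceph node ls     # daemons the cluster itself knows about
    ceph orch ps     # daemons cephadm is managing; anything in the former but not the latter counts as stray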

[ceph-users] Re: How to replace or add a monitor in stretch cluster?

2022-12-02 Thread Adam King
This can't be done in a very nice way currently. There's actually an open PR against main to allow setting the crush location for mons in the service spec specifically because others found that this was annoying as well. What I think should work as a workaround is, go to the host where the mon that
