[ceph-users] Re: Squid: successfully drained host can't be removed

2025-07-25 Thread Adam King
The daemons cephadm "knows" about are actually just based on the contents of the /var/lib/ceph/<fsid>/ directory on each host cephadm is managing. If osd.6 was present, got removed by the host drain process, and then its daemon directory was still on the host / there was still a container running fo
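
A rough sketch of how one might check for and clean up such a leftover daemon, assuming a hypothetical daemon osd.6; the exact state on a given host may of course differ:

    # on the drained host: list what cephadm still sees under /var/lib/ceph/<fsid>/
    cephadm ls --no-detail

    # from an admin node / cephadm shell: force-remove the stale daemon record
    ceph orch daemon rm osd.6 --force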

[ceph-users] Re: squid 19.2.2 - cannot bootstrap - error writing to /tmp/monmap (21) Is a directory

2025-07-24 Thread Adam King
I can't say I know why this is happening, but I can try to give some context into what cephadm is doing here in case it helps give something to look at. This is when cephadm creates the initial monmap. When we do so we write a python "NamedTemporaryFile" and then mount that into a container that co

[ceph-users] Re: squid 19.2.3 QE validation status

2025-07-03 Thread Adam King
ote-1 > > > > Release Notes - TBD > > LRC upgrade - TBD > > > > Seeking approvals/reviews for: > > > > rados - Radek, Laura > > rgw- Adam Emerson > > fs - Venky > > orch - Adam King > > rbd, krbd - Ilya > > quincy-x, reef-x - Laura,

[ceph-users] Re: [cephadm] Questions regarding cephadm infra-as-code philosophy, containerization, and trixie availability

2025-05-15 Thread Adam King
> > > However, in practice, > > many operations (e.g., using ceph-bluestore-tool > > Using that tool, to be fair, should be rare. Notably that tool requires > that the OSD on which it operates not be running. I would think it might > be possible to enter an OSD container and kill the ceph-osd pro

[ceph-users] Re: Ceph reef ingress service - v4v6_flag undefined

2025-05-07 Thread Adam King
; as some people reported that it will help solve NFS HA issue ( e.g. > haproxy.cfg deployed missing "check") > > Now neither NFS nor RGW works :-( > > How do I fix this ? > > thanks > Steven > > > https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/te

[ceph-users] Re: Ceph reef ingress service - v4v6_flag undefined

2025-05-07 Thread Adam King
That flag got added to cephadm's haproxy template as part of https://github.com/ceph/ceph/pull/61833. I'm very confused as to how you're seeing it affect reef though, as we never backported it. It doesn't seem to exist at all in the reef branch when I checked adking@fedora:~/orch-ceph/ceph/src$ gi

[ceph-users] Re: Pull from local registry fails

2025-04-24 Thread Adam King
You could try setting `ceph config set mgr mgr/cephadm/use_repo_digest false` and give it another go. I've seen some issues in the past with using the image digest with local repos. On Thu, Apr 24, 2025 at 10:15 AM Sake Ceph wrote: > We're using a local registry (docker registry), but needed to s

[ceph-users] April 21st CSC Meeting Notes

2025-04-21 Thread Adam King
[Matt] required github checks (and make check) instability (and long run times) greatly hurt developer productivity - thanks for renewed attention to this from several folks - can we break up make check into component wise portions - can we take ceph api test out of the CI checks - it also r

[ceph-users] Re: reef 18.2.6 hotfix QE validation status

2025-04-16 Thread Adam King
ues/70938#note-1 > Release Notes - TBD > LRC upgrade - N/A > > Seeking approvals/reviews for: > > smoke - same as in 18.2.5 > rados - Radek, Laura approved? > orch - Adam King, Guillaume approved? > > This release has two PRs: > https://github.com/ceph/ceph/pull/62791

[ceph-users] Re: reef 18.2.5 QE validation status

2025-04-05 Thread Adam King
.ceph.com/issues/70563#note-1 > Release Notes - TBD > LRC upgrade - TBD > > Seeking approvals/reviews for: > > smoke - Laura approved? > > rados - Radek, Laura approved? Travis? Nizamudeen? Adam King approved? > > rgw - Adam E approved? > > fs - Venky is fixing QA su

[ceph-users] Re: Reef: highly-available NFS with keepalive_only

2025-03-25 Thread Adam King
. So this doesn't really > work as a workaround, it seems. I feel like the proper solution would > be to include keepalive in the list of > RESCHEDULE_FROM_OFFLINE_HOSTS_TYPES. > > Zitat von Adam King : > > > Which daemons get moved around like that is controll

[ceph-users] Re: Reef: highly-available NFS with keepalive_only

2025-03-25 Thread Adam King
Which daemons get moved around like that is controlled by https://github.com/ceph/ceph/blob/main/src/pybind/mgr/cephadm/utils.py#L30, which appears to only include nfs and haproxy, so maybe this keepalive only case was missed in that sense. I do think that you could alter the placement of the ingre

[ceph-users] Re: ceph orch command not working anymore on squid (19.2.1)

2025-03-12 Thread Adam King
Regarding the ValueError: "'xwork.MethodAccessor.denyMethodExecution'" does not appear to be an IPv4 or IPv6 address can you check `ceph config-key get mgr/cephadm/inventory` and see if you see something related to that (such as "'xwork.MethodAccessor.denyMethodExecution'" being present as the ad
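
A hedged sketch of inspecting that stored inventory and correcting a bad address; the host name and address below are placeholders:

    # dump cephadm's stored host inventory (JSON) and look for the bogus value
    ceph config-key get mgr/cephadm/inventory | python3 -m json.tool

    # if a host's addr is wrong, reset it and fail over the mgr so it gets re-read
    ceph orch host set-addr host1 10.0.0.11
    ceph mgr fail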

[ceph-users] Re: ceph iscsi gateway

2025-02-10 Thread Adam King
ISCSI is still being used in the LRC (long running cluster) that is a storage backend for parts of the ceph team's infrastructure, so I don't think it's going to disappear in the near future. I believe the plan is to eventually swap over to nvmeof instead ( https://docs.ceph.com/en/reef/rbd/nvmeof-

[ceph-users] Re: ceph orch upgrade tries to pull latest?

2025-01-08 Thread Adam King
It looks like the "resource not found" message is being directly output by podman. Is there anything in the cephadm.log (/var/log/ceph/cephadm.log) on one of the hosts where this is happening that says what podman command cephadm was running that hit this error? On Wed, Jan 8, 2025 at 5:27 AM tobi

[ceph-users] Re: squid 19.2.1 RC QE validation status

2025-01-02 Thread Adam King
ckers for failures so we avoid duplicates. > Seeking approvals/reviews for: > > rados - Radek, Laura > rgw - Eric, Adam E > fs - Venky > orch - Adam King > rbd, krbd - Ilya > > quincy-x, reef-x - Laura, Neha > > crimson-rados - Matan, Samuel > > ceph-volume

[ceph-users] Re: Centos 9 updates break Reef MGR

2024-11-19 Thread Adam King
Given the reference to that cherrypy backports stuff in the traceback, I'll just mention we are in the process of removing that from the code as we've seen issues with it in our testing as well ( https://github.com/ceph/ceph/pull/60602 / https://tracker.ceph.com/issues/68802). We want that patch in

[ceph-users] Re: [External Email] Re: Recreate Destroyed OSD

2024-11-06 Thread Adam King
Quick comment on the CLI args vs. the spec file. It actually shouldn't allow you to do both for any flags that actually affect the service. If you run `ceph orch apply -i ` it will only make use of the spec file and should return an error if flags that affect the service like `--unmanaged` or `--p

[ceph-users] Re: Unable to add OSD

2024-11-06 Thread Adam King
I see you mentioned apparmor and MongoDB, so I guess there's a chance you found https://tracker.ceph.com/issues/66389 already (your traceback also looks the same). Other than making sure that relevant apparmor file it's parsing doesn't contain settings with spaces or trying to manually apply the fi

[ceph-users] Re: MDS and stretched clusters

2024-10-31 Thread Adam King
Just noticed this thread. A couple questions. Is what we want to have MDS daemons in say zone A and zone B, but the ones in zone A are prioritized to be active and ones in zone B remain as standby unless absolutely necessary (all the ones in zone A are down) or is it that we want to have some subse

[ceph-users] Re: cephadm bootstrap ignoring --skip-firewalld

2024-10-16 Thread Adam King
Where did the copy of cephadm you're using for the bootstrap come from? I'm aware of a bug around that flag (https://tracker.ceph.com/issues/54137) but that fix should have come in some time ago. I've seen some people, especially if they're using the distro's version of the cephadm package, end up w

[ceph-users] Re: v19.2.0 Squid released

2024-09-27 Thread Adam King
d nfs, it should now be safe to perform this upgrade. On Fri, Sep 27, 2024 at 11:40 AM Adam King wrote: > WARNING, if you're using cephadm and nfs please don't upgrade to this > release for the time being. There are compatibility issues with cephadm's > deployment of

[ceph-users] Re: v19.2.0 Squid released

2024-09-27 Thread Adam King
WARNING, if you're using cephadm and nfs please don't upgrade to this release for the time being. There are compatibility issues with cephadm's deployment of the NFS daemon and ganesha v6 which made its way into the release container. On Thu, Sep 26, 2024 at 6:20 PM Laura Flores wrote: > We're v

[ceph-users] Re: Help with cephadm bootstrap and ssh private key location

2024-09-23 Thread Adam King
Cybersecurity and Information Assurance > 4 Brindabella Cct > Brindabella Business Park > Canberra Airport, ACT 2609 > > www.raytheonaustralia.com.au > LinkedIn | Twitter | Facebook | Instagram > > -Original Message- > From: Adam King > Sent: Monday, September 23, 202

[ceph-users] Re: Help with cephadm bootstrap and ssh private key location

2024-09-22 Thread Adam King
Cephadm stores the key internally within the cluster and it can be grabbed with `ceph config-key get mgr/cephadm/ssh_identity_key`. If you already have keys set up, I'd recommend passing filepaths to those keys to the `--ssh-private-key` and `--ssh-public-key` flags the bootstrap command has
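
Spelled out as commands, with illustrative paths and IP:

    # retrieve the SSH private key cephadm generated and stored in the cluster
    ceph config-key get mgr/cephadm/ssh_identity_key > /root/cephadm-ssh-key

    # or, at bootstrap time, point cephadm at keys you already manage
    cephadm bootstrap --mon-ip 10.0.0.10 \
        --ssh-private-key /root/.ssh/id_rsa \
        --ssh-public-key /root/.ssh/id_rsa.pub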

[ceph-users] Re: squid 19.2.0 QE validation status

2024-09-04 Thread Adam King
rade - TBD > > It was decided and agreed upon that there would be limited testing for > this release, given it is based on 19.1.1 rather than a full rebase. > > Seeking approvals/reviews for: > (some reruns are still in progress) > > rgw - Eric, Adam E > fs - Venky > o

[ceph-users] CLT meeting notes August 19th 2024

2024-08-19 Thread Adam King
- [travisn] Arm64 OSDs crashing on v18.2.4, need a fix in v18.2.5 - https://tracker.ceph.com/issues/67213 - tcmalloc issue, solved by rebuilding the gperftools package - Travis to reach out to Rongqi Sun about the issue - moving away from tcmalloc would probably cause perform

[ceph-users] Re: squid 19.1.1 RC QE validation status

2024-08-19 Thread Adam King
" which doesn't look very serious anyway, I don't think there's any reason for the failure to hold up the release On Thu, Aug 15, 2024 at 6:53 PM Laura Flores wrote: > The upgrade suites look mostly good to me, except for one tracker I think > would be in @Adam King &

[ceph-users] Re: Cephadm Upgrade Issue

2024-08-14 Thread Adam King
If you're referring to https://tracker.ceph.com/issues/57675, it got into 16.2.14, although there was another issue where running a `ceph orch restart mgr` or `ceph orch redeploy mgr` would cause an endless loop of the mgr daemons restarting, which would block all operations, that might be what we

[ceph-users] Re: Cephadm Upgrade Issue

2024-08-14 Thread Adam King
I don't think pacific has the upgrade error handling work so it's a bit tougher to debug here. I think it should have printed a traceback into the logs though. Maybe right after it crashes if you check `ceph log last 200 cephadm` there might be something. If not, you might need to do a `ceph mgr fa

[ceph-users] Re: [EXTERNAL] Re: Cephadm and the "--data-dir" Argument

2024-08-12 Thread Adam King
but if > it's a fix that's appropriate for someone who doesn't know the Ceph > codebase (me) I'd be happy to have a look at implementing a fix. > > Best Wishes, > Alex > > -- > *From:* Adam King > *Sent:* Monday, August 12, 2

[ceph-users] Re: Cephadm and the "--data-dir" Argument

2024-08-12 Thread Adam King
Looking through the code it doesn't seem like this will work currently. I found that the --data-dir arg to the cephadm binary was from the initial implementation of the cephadm binary (so early that it was actually called "ceph-daemon" at the time rather than "cephadm") but it doesn't look like tha

[ceph-users] Re: squid 19.1.1 RC QE validation status

2024-08-09 Thread Adam King
> rados - Radek, Laura (https://github.com/ceph/ceph/pull/59020 is being > tested and will be cherry-picked when ready) > > rgw - Eric, Adam E > fs - Venky > orch - Adam King > rbd, krbd - Ilya > > quincy-x, reef-x - Laura, Neha > > powercycle - Brad > crimson-rad

[ceph-users] Re: Cephadm: unable to copy ceph.conf.new

2024-08-07 Thread Adam King
It might be worth trying to manually upgrade one of the mgr daemons. Go to the host with a mgr and edit /var/lib/ceph/<fsid>/<daemon-name>/unit.run so that the image specified in the long podman/docker run command in there is the 17.2.7 image. Then just restart its systemd unit (don't tell the orchestrator
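
A hedged outline of that manual redeploy, with a placeholder fsid and daemon name; cephadm unit names normally follow the ceph-<fsid>@<daemon>.service pattern, but verify with systemctl on the host:

    # on the mgr host: point unit.run at the target image
    vi /var/lib/ceph/<fsid>/mgr.host1.abcdef/unit.run   # swap the image for the 17.2.7 one

    # restart just that daemon through systemd, bypassing the orchestrator
    systemctl restart ceph-<fsid>@mgr.host1.abcdef.service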

[ceph-users] Re: Pull failed on cluster upgrade

2024-08-06 Thread Adam King
If you're using VMs, https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/6X6QIEMWDYSA6XOKEYH5OJ4TIQSBD5BL/ might be relevant On Tue, Aug 6, 2024 at 3:21 AM Nicola Mori wrote: > I think I found the problem. Setting the cephadm log level to debug and > then watching the logs during th

[ceph-users] Re: [EXTERNAL] Re: Cephadm Offline Bootstrapping Issue

2024-08-05 Thread Adam King
point of bootstrapping? > > I confess I don't really understand why this field is not set by the > docker client running locally. I wonder if I can do anything on the docker > client side to add a repo digest. I'll explore that a bit. > > Thanks, > Alex > > --

[ceph-users] Re: ceph orchestrator upgrade quincy to reef, missing ceph-exporter

2024-08-02 Thread Adam King
ceph-exporter should get deployed by default with new installations on recent versions, but as a general principle we've avoided adding/removing services from the cluster during an upgrade. There is perhaps a case for this service in particular if the user also has the rest of the monitoring stack

[ceph-users] Re: Cephadm Offline Bootstrapping Issue

2024-08-02 Thread Adam King
The thing that stands out to me from that output was that the image has no repo_digests. It's possible cephadm is expecting there to be digests and is crashing out trying to grab them for this image. I think it's worth a try to set mgr/cephadm/use_repo_digest to false, and then restart the mgr. FWI
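
The suggested workaround, spelled out (the mgr restart here is done with a failover):

    ceph config set mgr mgr/cephadm/use_repo_digest false
    ceph mgr fail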

[ceph-users] Re: Node Exporter keep failing while upgrading cluster in Air-gapped ( isolated environment ).

2024-07-16 Thread Adam King
I wouldn't worry about the one the config option gives you right now. The one on your local repo looks like the same version. For isolated deployments like this, the default options aren't going to work, as they'll always point to images that require internet access to pull. I'd just update the con
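
A sketch of pointing the node-exporter image at a local registry; the registry name below is a placeholder, only the config option comes from this thread:

    ceph config set mgr mgr/cephadm/container_image_node_exporter \
        registry.local:5000/prometheus/node-exporter:v1.5.0

    # recreate the daemons from the new image
    ceph orch redeploy node-exporter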

[ceph-users] Re: Node Exporter keep failing while upgrading cluster in Air-gapped ( isolated environment ).

2024-07-15 Thread Adam King
To pull quay.io/prometheus/node-exporter:v1.5.0 the nodes would need access to the internet, yes. I don't fully understand the reason for > root@node-01:~# ceph config set mgr > mgr/cephadm/container_image_node_exporter > quay.io/prometheus/node-exporter:v1.5.0 though. Why not tell it to point t

[ceph-users] Re: squid 19.1.0 RC QE validation status

2024-07-03 Thread Adam King
als: > smoke - n/a? > orch - Adam King > krbd - Ilya > quincy-x, reef-x - Laura, Neha > perf-basic - n/a > crimson-rados - n/a > ceph-volume - Guillaume > > Neha, Laura - I assume we don't plan gibba/LRC upgrade, pls confirm > > On Wed, Jul 3, 2024 at 5:55 AM Ven

[ceph-users] Re: squid 19.1.0 RC QE validation status

2024-07-01 Thread Adam King
Weinstein wrote: > Details of this release are summarized here: > > https://tracker.ceph.com/issues/66756#note-1 > > Release Notes - TBD > LRC upgrade - TBD > > (Reruns were not done yet.) > > Seeking approvals/reviews for: > > smoke > rados - Radek, Laura >

[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-26 Thread Adam King
Interesting. Given this is coming from a radosgw-admin call being done from within the rgw mgr module, I wonder if a radosgw-admin log file is ending up in the active mgr container when this happens. On Wed, Jun 26, 2024 at 9:04 AM Daniel Gryniewicz wrote: > On 6/25/24 3:21 PM, Matthew Vernon w

[ceph-users] Re: Phhantom host

2024-06-21 Thread Adam King
I don't remember how connected the dashboard is to the orchestrator in pacific, but the only thing I could think to do here is just restart it. (ceph mgr module disable dashboard, ceph mgr module enable dashboard). You could also totally fail over the mgr (ceph mgr fail) although that might change

[ceph-users] Re: ceph rgw zone create fails EINVAL

2024-06-19 Thread Adam King
I think this is at least partially a code bug in the rgw module. Where it's actually failing in the traceback is generating the return message for the user at the end, because it assumes `created_zones` will always be a list of strings and that seems to not be the case in any error scenario. That c

[ceph-users] Re: cephadm basic questions: image config, OS reimages

2024-05-16 Thread Adam King
At least for the current up-to-date reef branch (not sure what reef version you're on) when --image is not provided to the shell, it should try to infer the image in this order 1. from the CEPHADM_IMAGE env. variable 2. if you pass --name with a daemon name to the shell command, it will t
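
Two hedged examples of steering that inference, with an illustrative image tag and daemon name:

    # 1. pin the image explicitly via the environment variable
    CEPHADM_IMAGE=quay.io/ceph/ceph:v18.2.2 cephadm shell

    # 2. let cephadm pick the image a specific daemon was deployed with
    cephadm shell --name mgr.host1.abcdef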

[ceph-users] CLT meeting notes May 6th 2024

2024-05-06 Thread Adam King
- DigitalOcean credits - things to ask - what would promotional material require - how much are credits worth - Neha to ask - 19.1.0 centos9 container status - close to being ready - will be building centos 8 and 9 containers simultaneously - should test o

[ceph-users] Re: ceph recipe for nfs exports

2024-04-24 Thread Adam King
> > - Although I can mount the export I can't write on it > > What error are you getting trying to do the write? The way you set things up doesn't look too different from one of our integration tests for ingress over nfs ( https://github.com/ceph/ceph/blob/main/qa/suites/orch/cephadm/smoke-roleless/

[ceph-users] Re: which grafana version to use with 17.2.x ceph version

2024-04-23 Thread Adam King
FWIW, cephadm uses `quay.io/ceph/ceph-grafana:9.4.7` as the default grafana image in the quincy branch On Tue, Apr 23, 2024 at 11:59 AM Osama Elswah wrote: > Hi, > > > in quay.io I can find a lot of grafana versions for ceph ( > https://quay.io/repository/ceph/grafana?tab=tags) how can I find ou

[ceph-users] Re: reef 18.2.3 QE validation status

2024-04-16 Thread Adam King
ph/ceph/pull/56714> On Tue, Apr 16, 2024 at 1:39 PM Laura Flores wrote: > On behalf of @Radoslaw Zarzynski , rados approved. > > Below is the summary of the rados suite failures, divided by component. @Adam > King @Venky Shankar PTAL at the > orch and cephfs failures to se

[ceph-users] Re: reef 18.2.3 QE validation status

2024-04-14 Thread Adam King
es, still trying, Laura PTL > > rados - Radek, Laura approved? Travis? Nizamudeen? > > rgw - Casey approved? > fs - Venky approved? > orch - Adam King approved? > > krbd - Ilya approved > powercycle - seems fs related, Venky, Brad PTL > > ceph-volume - will

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-09 Thread Adam King
>> Hi Adam >> >> Let me just finish tucking in a devlish tyke here and i’ll get to it >> first thing >> >> tirs. 9. apr. 2024 kl. 18.09 skrev Adam King : >> >>> I did end up writing a unit test to see what we calculated here, as well >>>

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-09 Thread Adam King
t; "memory_total_kb": 32827840, > > On Thu, Apr 4, 2024 at 10:14 PM Adam King wrote: > >> Sorry to keep asking for more info, but can I also get what `cephadm >> gather-facts` on that host returns for "memory_total_kb". Might end up >> creating a

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-04-04 Thread Adam King
1 running (3w) > 7m ago 11M2698M4096M 17.2.6 > osd.9my-ceph01 running (3w) > 7m ago 11M3364M4096M 17.2.6 > prometheus.my-ceph01 my-ceph01 *:9095 running (3w) 7m > ago 13M 164M-

[ceph-users] Re: CEPHADM_HOST_CHECK_FAILED

2024-04-04 Thread Adam King
First, I guess I would make sure that peon7 and peon12 actually could pass the host check (you can run "cephadm check-host" on the host directly if you have a copy of the cephadm binary there) Then I'd try a mgr failover (ceph mgr fail) to clear out any in memory host values cephadm might have and
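
The two checks described, as commands (the host names are the ones from this thread):

    # run directly on peon7 / peon12, with a copy of the cephadm binary present
    cephadm check-host

    # then fail over the mgr to drop any stale in-memory host state
    ceph mgr fail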

[ceph-users] Re: Pacific Bug?

2024-04-02 Thread Adam King
https://tracker.ceph.com/issues/64428 should be it. Backports are done for quincy, reef, and squid and the patch will be present in the next release for each of those versions. There isn't a pacific backport as, afaik, there are no more pacific releases planned. On Fri, Mar 29, 2024 at 6:03 PM Ale

[ceph-users] Re: cephadm shell version not consistent across monitors

2024-04-02 Thread Adam King
From what I can see with the most recent cephadm binary on pacific, unless you have the CEPHADM_IMAGE env variable set, it does a `podman images --filter label=ceph=True --filter dangling=false` (or docker) and takes the first image in the list. It seems to be getting sorted by creation time by def

[ceph-users] Re: Failed adding back a node

2024-03-28 Thread Adam King
No, you can't use the image id for the upgrade command, it has to be the image name. So it should start, based on what you have, with registry.redhat.io/rhceph/. As for the full name, it depends which image you want to go with. As for trying this on an OSD first, there is `ceph orch daemon redeploy --i
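
A hedged sketch of trying the image on one OSD before the full upgrade; the daemon name and image are placeholders, and the exact redeploy syntax can vary slightly between releases:

    # redeploy a single OSD from the candidate image first
    ceph orch daemon redeploy osd.7 --image registry.redhat.io/rhceph/<image>:<tag>

    # if that daemon comes back healthy, run the full upgrade with the same image name
    ceph orch upgrade start --image registry.redhat.io/rhceph/<image>:<tag>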

[ceph-users] Re: Failed adding back a node

2024-03-27 Thread Adam King
From the ceph versions output I can see "osd": { "ceph version 16.2.10-160.el8cp (6977980612de1db28e41e0a90ff779627cde7a8c) pacific (stable)": 160 }, It seems like all the OSD daemons on this cluster are using that 16.2.10-160 image, and I'm guessing most of them are running, so it mu

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-27 Thread Adam King
ording to ceph orch > ps. Then again, they are nowhere near the values stated in min_size_by_type > that you list. > Obviously yes, I could disable the auto tuning, but that would leave me > none the wiser as to why this exact host is trying to do this. > > > > On Tue, Mar

[ceph-users] Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

2024-03-26 Thread Adam King
For context, the value the autotune goes with takes the value from `cephadm gather-facts` on the host (the "memory_total_kb" field) and then subtracts from that per daemon on the host according to min_size_by_type = { 'mds': 4096 * 1048576, 'mgr': 4096 * 1048576, 'mon':
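
As a rough worked example of that calculation (the exact order of operations lives in src/pybind/mgr/cephadm/autotune.py, so treat this as an approximation): take the 32827840 memory_total_kb reported later in this thread, assume one mon, one mgr and four OSDs on the host, and the default autotune_memory_target_ratio of 0.7:

    total_kb=32827840
    # mon and mgr each reserve 4096 MiB per min_size_by_type
    reserved_kb=$(( (4096 + 4096) * 1024 ))
    osds=4
    echo $(( (total_kb * 7 / 10 - reserved_kb) * 1024 / osds ))   # ~3.5 GiB per OSD, in bytes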

[ceph-users] Re: Upgrading from Reef v18.2.1 to v18.2.2

2024-03-21 Thread Adam King
> > Hi, > > On 3/21/24 14:50, Michael Worsham wrote: > > > > Now that Reef v18.2.2 has come out, is there a set of instructions on > how to upgrade to the latest version via using Cephadm? > > Yes, there is: https://docs.ceph.com/en/reef/cephadm/upgrade/ > Just a note on that docs section, it refe
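
For reference, the upgrade itself is a single command either way; the version and image below are illustrative:

    ceph orch upgrade start --ceph-version 18.2.2
    # or, equivalently, name the image explicitly
    ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.2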

[ceph-users] Re: ceph-volume fails when adding spearate DATA and DATA.DB volumes

2024-03-06 Thread Adam King
If you want to be directly setting up the OSDs using ceph-volume commands (I'll pretty much always recommend following https://docs.ceph.com/en/latest/cephadm/services/osd/#dedicated-wal-db over manual ceph-volume stuff in cephadm deployments unless what you're doing can't be done with the spec fil

[ceph-users] Re: Ceph reef mon is not starting after host reboot

2024-03-06 Thread Adam King
When you ran this, was it directly on the host, or did you run `cephadm shell` first? The two things you tend to need to connect to the cluster (that "RADOS timed out" error is generally what you get when connecting to the cluster fails. A bunch of different causes all end with that error) are a ke

[ceph-users] Re: Upgraded 16.2.14 to 16.2.15

2024-03-05 Thread Adam King
There was a bug with this that was fixed by https://github.com/ceph/ceph/pull/52122 (which also specifically added an integration test for this case). It looks like it's missing a reef and quincy backport though unfortunately. I'll try to open one for both. On Tue, Mar 5, 2024 at 8:26 AM Eugen Blo

[ceph-users] Re: Ceph orch doesn't execute commands and doesn't report correct status of daemons

2024-03-03 Thread Adam King
Okay, it seems like from what you're saying the RGW image itself isn't special compared to the other ceph daemons, it's just that you want to use the image on your local registry. In that case, I would still recommend just using `ceph orch upgrade start --image ` with the image from your local regi

[ceph-users] Re: [Quincy] NFS ingress mode haproxy-protocol not recognized

2024-03-03 Thread Adam King
According to https://tracker.ceph.com/issues/58933, that was only backported as far as reef. If I remember correctly, the reason for that was the ganesha version itself we were including in our quincy containers wasn't new enough to support the feature on that end, so backporting the nfs/orchestrat

[ceph-users] Re: Ceph orch doesn't execute commands and doesn't report correct status of daemons

2024-03-01 Thread Adam King
There have been bugs in the past where things have gotten "stuck". Usually I'd say check the REFRESHED column in the output of `ceph orch ps`. It should refresh the daemons on each host roughly every 10 minutes, so if you see some value much larger than that, things are probably actually stuck. If

[ceph-users] Re: Migration from ceph-ansible to Cephadm

2024-02-29 Thread Adam King
> > - I still have the ceph-crash container, what should I do with it? > If it's the old one, I think you can remove it. Cephadm can deploy its own crash service (`ceph orch apply crash` if it hasn't). You can check if `crash` is listed under `ceph orch ls` and if it is there you can do `ceph orch
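
A sketch of the check-and-apply being described:

    # is a crash service already managed by cephadm?
    ceph orch ls crash

    # if not, let cephadm deploy and manage its own crash service
    ceph orch apply crash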

[ceph-users] Re: Some questions about cephadm

2024-02-26 Thread Adam King
In regards to > > From the reading you gave me I have understood the following : > 1 - Set osd_memory_target_autotune to true then set > autotune_memory_target_ratio to 0.2 > 2 - Or do the math. For my setup I have 384Go per node, each node has 4 > nvme disks of 7.6To, 0.2 of memory is 19.5G. So ea

[ceph-users] Re: Some questions about cephadm

2024-02-21 Thread Adam King
Cephadm does not have some variable that explicitly says it's an HCI deployment. However, the HCI variable in ceph ansible I believe only controlled the osd_memory_target attribute, which would automatically set it to 20% or 70% respectively of the memory on the node divided by the number of OSDs

[ceph-users] Re: first_virtual_router_id not allowed in ingress manifest

2024-02-21 Thread Adam King
It seems the quincy backport for that feature ( https://github.com/ceph/ceph/pull/53098) was merged Oct 1st 2023. According to the quincy part of https://docs.ceph.com/en/latest/releases/#release-timeline it looks like that would mean it would only be present in 17.2.7, but not 17.2.6. On Wed, Feb

[ceph-users] Re: Pacific Bug?

2024-02-14 Thread Adam King
Does seem like a bug, actually in more than just this command. The `ceph orch host ls` with the --label and/or --host-pattern flag just piggybacks off of the existing filtering done for placements in service specs. I've just taken a look and you actually can create the same behavior with the placem

[ceph-users] Re: Pacific: Drain hosts does not remove mgr daemon

2024-01-31 Thread Adam King
If you just manually run `ceph orch daemon rm ` does it get removed? I know there's some logic in host drain that does some ok-to-stop checks that can cause things to be delayed or stuck if it doesn't think it's safe to remove the daemon for some reason. I wonder if it's being overly cautious here.

[ceph-users] CLT meeting notes January 24th 2024

2024-01-24 Thread Adam King
- Build/package PRs- who to best review these? - Example: https://github.com/ceph/ceph/pull/55218 - Idea: create a GitHub team specifically for these types of PRs https://github.com/orgs/ceph/teams - Laura will try to organize people for the group - Pacific 16.2.15 status

[ceph-users] Re: nfs export over RGW issue in Pacific

2023-12-07 Thread Adam King
Handling of nfs exports over rgw, including the `ceph nfs export create rgw` command, wasn't added to the nfs module in pacific until 16.2.7. On Thu, Dec 7, 2023 at 1:35 PM Adiga, Anantha wrote: > Hi, > > > oot@a001s016:~# cephadm version > > Using recent ceph image c

[ceph-users] Re: error deploying ceph

2023-11-30 Thread Adam King
G N/A >N/ANo 27m agoHas a FileSystem, Insufficient space (<10 > extents) on vgs, LVM detected > node3-ceph /dev/xvdb ssd 100G N/A >N/ANo 27m agoHas a FileSystem, Insufficient space (<10 >

[ceph-users] Re: error deploying ceph

2023-11-29 Thread Adam King
data: > pools: 0 pools, 0 pgs > objects: 0 objects, 0 B > usage: 0 B used, 0 B / 0 B avail > pgs: > > root@node1-ceph:~# > > Regards > > > > On Wed, Nov 29, 2023 at 5:45 PM Adam King wrote: > >> I think I remember a bug that happened

[ceph-users] Re: error deploying ceph

2023-11-29 Thread Adam King
I think I remember a bug that happened when there was a small mismatch between the cephadm version being used for bootstrapping and the container. In this case, the cephadm binary used for bootstrap knows about the ceph-exporter service and the container image being used does not. The ceph-exporter
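
One hedged way to avoid that kind of mismatch is pinning the bootstrap container to an image that matches the cephadm binary in use; the tag and IP here are only examples:

    cephadm --image quay.io/ceph/ceph:v18.2.2 bootstrap --mon-ip 10.0.0.10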

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-16 Thread Adam King
tart building. > > Travis, Adam King - any need to rerun any suites? > > On Thu, Nov 16, 2023 at 7:14 AM Guillaume Abrioux > wrote: > > > > Hi Yuri, > > > > > > > > Backport PR [2] for reef has been merged. > > > > >

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-14 Thread Adam King
t; ran the tests below and asking for approvals: > > smoke - Laura > rados/mgr - PASSED > rados/dashboard - Nizamudeen > orch - Adam King > > See Build 4 runs - https://tracker.ceph.com/issues/63443#note-1 > > On Tue, Nov 14, 2023 at 12:21 AM Redouane Kachach > wrote:

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-08 Thread Adam King
> > https://tracker.ceph.com/issues/63151 - Adam King do we need anything for > this? > Yes, but not an actual code change in the main ceph repo. I'm looking into a ceph-container change to alter the ganesha version in the container as a solution. On Wed, Nov 8, 2023 at 11:10 

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-07 Thread Adam King
ests: > https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd > > Still seeing approvals. > smoke - Laura, Radek, Prashant, Venky in progress > rados - Neha, Radek, Travis, Ernesto, Adam King > rgw - Casey in progress > fs - Venky > orch - Adam King > rb

[ceph-users] Re: quincy v17.2.7 QE Validation status

2023-10-17 Thread Adam King
> Should it be fixed for this release? > > Seeking approvals/reviews for: > > smoke - Laura > rados - Laura, Radek, Travis, Ernesto, Adam King > > rgw - Casey > fs - Venky > orch - Adam King > > rbd - Ilya > krbd - Ilya > > upgrade/quincy-p2p - Known is

[ceph-users] CLT weekly notes October 11th 2023

2023-10-11 Thread Adam King
ites and dropping that as a build target - Last Pacific? - Yes, 17.2.7, then 18.2.1, then 16.2.15 (final) - PTLs will need to go through and find what backports still need to get into pacific - A lot of open pacific backports right no

[ceph-users] Re: cephadm, cannot use ECDSA key with quincy

2023-10-10 Thread Adam King
The CA signed keys working in pacific was sort of accidental. We found out that it was a working use case in pacific but not in quincy earlier this year, which resulted in this tracker https://tracker.ceph.com/issues/62009. That has since been implemented in main, and backported to the reef branch

[ceph-users] Re: ceph orch osd data_allocate_fraction does not work

2023-09-21 Thread Adam King
Looks like the orchestration side support for this got brought into pacific with the rest of the drive group stuff, but the actual underlying feature in ceph-volume (from https://github.com/ceph/ceph/pull/40659) never got a pacific backport. I've opened the backport now https://github.com/ceph/ceph/

[ceph-users] Re: 16.2.14 pacific QE validation status

2023-08-28 Thread Adam King
up in the Jenkins api check, where these kinds of >> conditions are expected. In that case, I would call #1 more of a test >> issue, and say that the fix is to whitelist the warning for that test. >> Would be good to have someone from CephFS weigh in though-- @Patrick >> D

[ceph-users] Re: cephadm to setup wal/db on nvme

2023-08-23 Thread Adam King
This should be possible by specifying "data_devices" and "db_devices" fields in the OSD spec file, each with different filters. There are some examples in the docs https://docs.ceph.com/en/latest/cephadm/services/osd/#the-simple-case that show roughly how that's done, and some other sections ( https
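
A minimal example spec along those lines, with hypothetical filters (rotational HDDs for data, a particular NVMe model for DB); match the filters against your own `ceph orch device ls` output before applying:

    # osd_spec.yaml
    service_type: osd
    service_id: hdd_with_nvme_db
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: 1
      db_devices:
        model: 'MZQL2'   # hypothetical NVMe model string

    ceph orch apply -i osd_spec.yaml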

[ceph-users] Re: osdspec_affinity error in the Cephadm module

2023-08-16 Thread Adam King
it looks like you've hit https://tracker.ceph.com/issues/58946 which has a candidate fix open, but nothing merged. The description on the PR with the candidate fix says "When osdspec_affinity is not set, the drive selection code will fail. This can happen when a device has multiple LVs where some o

[ceph-users] Re: cephadm orchestrator does not restart daemons [was: ceph orch upgrade stuck between 16.2.7 and 16.2.13]

2023-08-16 Thread Adam King
I've seen this before where the ceph-volume process hanging causes the whole serve loop to get stuck (we have a patch to get it to timeout properly in reef and are backporting to quincy but nothing for pacific unfortunately). That's why I was asking about the REFRESHED column in the orch ps/ orch d

[ceph-users] Re: Cephadm adoption - service reconfiguration changes container image

2023-08-15 Thread Adam King
you could maybe try running "ceph config set global container_image quay.io/ceph/ceph:v16.2.9" before running the adoption. It seems it still thinks it should be deploying mons with the default image ( docker.io/ceph/daemon-base:latest-pacific-devel ) for some reason and maybe that config option is why.

[ceph-users] Re: ceph orch upgrade stuck between 16.2.7 and 16.2.13

2023-08-15 Thread Adam King
with the log to cluster level already on debug, if you do a "ceph mgr fail" what does cephadm log to the cluster before it reports sleeping? It should at least be doing something if it's responsive at all. Also, in "ceph orch ps" and "ceph orch device ls" are the REFRESHED columns reporting that t
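
The checks being described, as a sequence (the log level option is mgr/cephadm/log_to_cluster_level; `ceph -W cephadm` streams that cluster log channel):

    ceph config set mgr mgr/cephadm/log_to_cluster_level debug
    ceph mgr fail
    ceph -W cephadm          # watch what the newly active mgr actually does
    ceph orch ps             # check the REFRESHED column for stale values
    ceph orch device ls      # same here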

[ceph-users] Re: ref v18.2.0 QE Validation status

2023-07-31 Thread Adam King
ein wrote: > Details of this release are summarized here: > > https://tracker.ceph.com/issues/62231#note-1 > > Seeking approvals/reviews for: > > smoke - Laura, Radek > rados - Neha, Radek, Travis, Ernesto, Adam King > rgw - Casey > fs - Venky > orch - Adam King > rbd

[ceph-users] Re: cephadm logs

2023-07-28 Thread Adam King
Not currently. Those logs aren't generated by any daemons, they come directly from anything done by the cephadm binary on the host, which tends to be quite a bit since the cephadm mgr module runs most of its operations on the host through a copy of the cephadm binary. It doesn't log to journal bec

[ceph-users] Re: Failing to restart mon and mgr daemons on Pacific

2023-07-25 Thread Adam King
hestrator._interface.OrchestratorError: cephadm exited with an error > code: 1, stderr:Deploy daemon node-exporter.darkside1 ... > Verifying port 9100 ... > Cannot bind to IP 0.0.0.0 port 9100: [Errno 98] Address already in use > ERROR: TCP Port(s) '9100' required for node-exp

[ceph-users] Re: Failing to restart mon and mgr daemons on Pacific

2023-07-24 Thread Adam King
The logs you probably really want to look at here are the journal logs from the mgr and mon. If you have a copy of the cephadm tool on the host, you can do a "cephadm ls --no-detail | grep systemd" to list out the systemd unit names for the ceph daemons on the host, or just find the systemd un
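
As a sketch, with a placeholder fsid and mon name; the unit names printed by cephadm are the ones to plug into journalctl:

    # list the systemd unit names cephadm manages on this host
    cephadm ls --no-detail | grep systemd

    # then read that daemon's journal directly
    journalctl -u ceph-<fsid>@mon.host1.service -n 200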

[ceph-users] Re: cephadm does not redeploy OSD

2023-07-19 Thread Adam King
uot;db_uuid": "CUMgp7-Uscn-ASLo-bh14-7Sxe-80GE-EcywDb", > > "name": "osd-block-db-5cb8edda-30f9-539f-b4c5-dbe420927911", > > "osd_fsid": "089894cf-1782-4a3a-8ac0-9dd043f80c71", > > "osd_id": "7", > > "

[ceph-users] Re: cephadm does not redeploy OSD

2023-07-18 Thread Adam King
in the "ceph orch device ls --format json-pretty" output, in the blob for that specific device, is the "ceph_device" field set? There was a bug where it wouldn't be set at all (https://tracker.ceph.com/issues/57100) and it would make it so you couldn't use a device serving as a db device for any fu

[ceph-users] Re: CEPHADM_FAILED_SET_OPTION

2023-07-18 Thread Adam King
Someone hit what I think is this same issue the other day. Do you have a "config" section in your rgw spec that sets the "rgw_keystone_implicit_tenants" option to "True" or "true"? For them, changing the value to be 1 (which should be equivalent to "true" here) instead of "true" fixed it. Likely an
