[ceph-users] Re: HELP! Cluster usage increased after adding new nodes/osd's

2025-07-07 Thread Stefan Kooman
On 7/7/25 18:34, mhnx wrote: Hello! A few years ago I built a "dc-a:12 + dc-b:12 = 24" node Ceph cluster with Nautilus v14.2.16. A year ago the cluster was upgraded to Octopus and it was running fine. Recently I added 4+4=8 new nodes with identical hardware and SSD drives. When I created OSDs with Oct

[ceph-users] Re: Where are you running Ceph?

2025-07-03 Thread Stefan Kooman
On 7/3/25 00:02, Anthony Fecarotta wrote: Thank you. From what I can tell, after being on the mailing list for six months, it seems most users are running Kubernetes. The last Ceph survey we did (2022) showed a different picture: https://ceph.io/en/news/blog/2022/ceph-user-survey-results-2022

[ceph-users] Loki retention policy missing?

2025-06-30 Thread Stefan Kooman
Hi, Yesterday we got a disk space warning notification on one of the monitor nodes. Most of the space was in use by a running Loki container. Looking into online documentation / trackers to find info on Ceph's default Loki retention policy left me empty-handed. I did find some trackers f

[ceph-users] Re: RADOS Gateway IAM for Production Environments ( reef to squid upgrade )

2025-06-30 Thread Stefan Kooman
Hi Casey, On 6/30/25 20:35, Casey Bodley wrote: the user account and IAM features are fully supported in squid - not "experimental" or "tech preview" That's good to know. In this blog post [1] however the very start of it states: "Efficient multitenant environment management is critical in

[ceph-users] Re: First time configuration advice

2025-06-26 Thread Stefan Kooman
On 6/25/25 09:19, Joachim Kraftmayer wrote: Hi, I agree to configure both interfaces as a bond. From my experience, I see the following advantages for a separate public and cluster network on the bond: the isolation of public network and cluster network traffic makes it easier to monitor client

[ceph-users] Re: Debugging OSD cache thrashing

2025-06-25 Thread Stefan Kooman
On 6/22/25 18:25, Hector Martin wrote: On 2025/06/23 0:21, Anthony D'Atri wrote: DIMMs are cheap. No DIMMs on Apple Macs. You’re running virtualized in VMs or containers, with OSDs, mons, mgr, and the constellation of other daemons with resources dramatically below recommendations. I’

[ceph-users] How does Ceph delete objects / data on disk?

2025-05-28 Thread Stefan Kooman
Hi, I would like to get a better understanding at what happens under the hood when an object is deleted in a Ceph cluster. Is there a (detailed) write up of the deletion process? I would like to be able to answer questions like: - How can we be sure that after the successful completion of fo

[ceph-users] Re: How to (permanently) disable msgr v1 on Ceph?

2025-03-14 Thread Stefan Kooman
On 14-03-2025 10:44, Janne Johansson wrote: I'll leave it to the devs to discuss this one. It would be nice if the defaults for newly created clusters also came with the global reclaim id thing disabled, so we didn't have to manually enable msgrv2 (and disable v1 possibly as per this thread) an

[ceph-users] Re: How to (permanently) disable msgr v1 on Ceph?

2025-03-14 Thread Stefan Kooman
On 14-03-2025 09:53, Janne Johansson wrote: On 13-03-2025 16:08, Frédéric Nass wrote: If ceph-mon respected ms_bind_msgr1 = false, then one could add --ms-bind-msgr1=false as extra_entrypoint_args in the mon service_type [1], so as to have any ceph-mon daemons deployed or redeployed using msgr

[ceph-users] Re: How to (permanently) disable msgr v1 on Ceph?

2025-03-13 Thread Stefan Kooman
On 13-03-2025 16:08, Frédéric Nass wrote: Hi Stefan, If ceph-mon respected ms_bind_msgr1 = false, then one could add --ms-bind-msgr1=false as extra_entrypoint_args in the mon service_type [1], so as to have any ceph-mon daemons deployed or redeployed using msgr v2 exclusively. Unfortunately,

[ceph-users] How to (permanently) disable msgr v1 on Ceph?

2025-03-13 Thread Stefan Kooman
Hi, For new clusters one of the first things I do is to disable messenger v1: ceph config set mon ms_bind_msgr1 false However, that is not enough, as a restart of the monitors will leave v1 enabled in the monmap. Why is that? That looks like a bug to me. Therefore I explicitly set the addrs
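A minimal sketch of the steps discussed in this thread (the monitor name "a" and the address are illustrative; whether the explicit address pinning should even be necessary is exactly the open question here):

ceph config set mon ms_bind_msgr1 false
# a monitor restart may still leave a v1 address in the monmap, hence pinning
# each monitor to a v2-only address explicitly:
ceph mon set-addrs a [v2:192.168.1.10:3300]
ceph mon dump   # verify only v2:... addresses remain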

[ceph-users] Re: Severe Latency Issues in Ceph Cluster

2025-03-03 Thread Stefan Kooman
On 01-03-2025 15:10, Ramin Najjarbashi wrote: Hi We are currently facing severe latency issues in our Ceph cluster, particularly affecting read and write operations. At times, write operations completely stall, leading to significant service degradation. Below is a detailed breakdown of the issue

[ceph-users] Re: ceph rdb + libvirt

2025-02-12 Thread Stefan Kooman
On 12-02-2025 16:49, Curt wrote: Hi, I'm guessing you are deciding between librbd and krbd. I personally use krbd as in my original tests it was a bit faster. I think there are some cases where librbd is faster, but I don't remember those edge cases off the top of my head. That's my two cents.

[ceph-users] Re: ceph iscsi gateway

2025-02-12 Thread Stefan Kooman
On 10-02-2025 12:26, Iban Cabrillo wrote: Good morning, I wanted to inquire about the status of the Ceph iSCSI gateway service. We currently have several machines installed with this technology that are working correctly, although I have seen that it appears to be discontinued since 2022. M

[ceph-users] Re: Ceph Tentacle release timeline — when?

2025-02-06 Thread Stefan Kooman
On 05-02-2025 16:04, Gregory Farnum wrote: Hi all, We in the Ceph Steering Committee are discussing when we want to target the Tentacle release for, as we find ourselves in an unusual scheduling situation: * Historically, we have targeted our major release in early Spring. I believe this was init

[ceph-users] Re: Update host operating system - Ceph version 18.2.4 reef

2024-12-18 Thread Stefan Kooman
On 02-12-2024 21:53, alessan...@universonet.com.br wrote: Ceph version 18.2.4 reef (cephadm) Hello, We have a cluster running with 6 Ubuntu 20.04 servers and we would like to add another host but with Ubuntu 22.04, will we have any problems? We would like to add new HOST with Ubuntu 22.04 and

[ceph-users] ceph network acl: multiple network prefixes possible?

2024-12-17 Thread Stefan Kooman
Hi List, Is it possible to specify multiple network prefixes for a given ceph user? I.e. osd 'allow {access-spec} [{match-spec}] [network {network1/prefix} {network2/prefix} {network3/prefix}] ' osd 'profile {name} [pool={pool-name} [namespace={namespace-name}]] [network {network1/prefix}
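For reference, the single-prefix form of such a cap looks roughly like this (client, pool and network are illustrative; whether several space-separated prefixes are accepted, as sketched in the question above, is the open point):

ceph auth caps client.backup \
    mon 'allow r network 192.168.10.0/24' \
    osd 'allow rw pool=backup network 192.168.10.0/24'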

[ceph-users] Re: Cephalocon Update - New Users Workshop and Power Users Session

2024-11-26 Thread Stefan Kooman
On 26-11-2024 09:37, Gregory Orange wrote: On 25/11/24 15:57, Stefan Kooman wrote: Update: The Ceph Developer Summit is nearing capacity for "Developers". There is still room for "Power Users" to register for the afternoon session. See below for details... However, it

[ceph-users] Re: Cephalocon Update - New Users Workshop and Power Users Session

2024-11-25 Thread Stefan Kooman
On 18-10-2024 01:32, Dan van der Ster wrote: - Experienced Ceph User? Participate in the Power Users afternoon session at the Developers Summit - https://indico.cern.ch/e/ceph-developer-summit https://indico.cern.ch/event/1417034/ gives me: Update: The Ceph Developer Summit is nearing capacity

[ceph-users] Re: cephadm node failure (re-use OSDs instead of reprovisioning)

2024-11-13 Thread Stefan Kooman
On 13-11-2024 18:13, Eugen Block wrote: Hi, of course there is: https://docs.ceph.com/en/latest/cephadm/services/osd/#activate-existing- osds It has worked great for us. The orchestrator will ensure that keyrings and configs are copied (if set to managed), so you really just have to restor
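The documented procedure referenced above essentially comes down to a single orchestrator call once the reinstalled host is back in the cluster (hostname illustrative):

ceph cephadm osd activate ceph-osd-01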

[ceph-users] cephadm node failure (re-use OSDs instead of reprovisioning)

2024-11-13 Thread Stefan Kooman
Hi list, In case of an OS disk failure on a cephadm managed storage node, is there a way to redeploy ceph on the (reinstalled) node leaving the data (OSDs) intact? So instead of removing the storage node, have the cluster recover, redeploy the storage node, and let the cluster recover, I wou

[ceph-users] Re: Help with "27 osd(s) are not reachable" when also "27 osds: 27 up.. 27 in"

2024-10-17 Thread Stefan Kooman
On 17-10-2024 15:16, Nico Schottelius wrote: Stefan Kooman writes: On 16-10-2024 03:02, Harry G Coin wrote: Thanks for the notion! I did that, the result was no change to the problem, but with the added ceph -s complaint "Public/cluster network defined, but can not be found on any

[ceph-users] Re: Help with "27 osd(s) are not reachable" when also "27 osds: 27 up.. 27 in"

2024-10-17 Thread Stefan Kooman
On 16-10-2024 03:02, Harry G Coin wrote: Thanks for the notion! I did that, the result was no change to the problem, but with the added ceph -s complaint "Public/cluster network defined, but can not be found on any host" -- with otherwise totally normal cluster operations. Go figure. How ca

[ceph-users] Re: Overlapping Roots - How to Fix?

2024-09-23 Thread Stefan Kooman
On 23-09-2024 16:31, Janne Johansson wrote: Den mån 23 sep. 2024 kl 16:23 skrev Stefan Kooman : On 23-09-2024 16:04, Dave Hall wrote: Thank you to everybody who has responded to my questions. At this point I think I am starting to understand. However, I am still trying to understand the

[ceph-users] Re: Overlapping Roots - How to Fix?

2024-09-23 Thread Stefan Kooman
On 23-09-2024 16:04, Dave Hall wrote: Thank you to everybody who has responded to my questions. At this point I think I am starting to understand. However, I am still trying to understand the potential for data loss. In particular: - In some ways it seems that as long as there is sufficie

[ceph-users] Re: [External Email] Overlapping Roots - How to Fix?

2024-09-19 Thread Stefan Kooman
On 19-09-2024 05:10, Anthony D'Atri wrote: Anthony, So it sounds like I need to make a new crush rule for replicated pools that specifies default-hdd and the device class? (Or should I go the other way around? I think I'd rather change the replicated pools even though there's more of th
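The usual way to express "replicated pools on a specific device class" is a device-class aware CRUSH rule rather than a separate root; a sketch (rule, pool and class names illustrative):

ceph osd crush rule create-replicated replicated-hdd default host hdd
ceph osd pool set mypool crush_rule replicated-hdd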

[ceph-users] Re: Numa pinning best practices

2024-09-13 Thread Stefan Kooman
On 07-05-2024 22:37, Szabo, Istvan (Agoda) wrote: Hi, Haven't really found a proper descripton in case of 2 socket how to pin osds to numa node, only this: https://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments#Ceph-Storage-Node-NUMA-Tuning Tuning for All Flash Deployment

[ceph-users] Re: bluefs _allocate unable to allocate on bdev 2

2024-09-12 Thread Stefan Kooman
*From:* Stefan Kooman *Sent:* Thursday, September 12, 2024 3:54 PM *To:* Szabo, Istvan (Agoda) ; igor.fedo...@croit.io *Cc:* Ceph Users *Subject:* Re: [ceph-users] Re: bluefs _allocate unable to allocate on bdev 2 Email received from the internet. If in doubt, don't click any

[ceph-users] Re: bluefs _allocate unable to allocate on bdev 2

2024-09-12 Thread Stefan Kooman
On 12-09-2024 06:43, Szabo, Istvan (Agoda) wrote: Maybe we are running into this bug Igor? https://github.com/ceph/ceph/pull/48854 That would be a solution for the bug you might be hitting (unable to allocate 64K aligned blocks for RocksDB). I would not be surprised if you hit this issue if

[ceph-users] Re: Ceph on Ubuntu 24.04 - Arm64

2024-07-30 Thread Stefan Kooman
On 30-07-2024 15:48, John Mulligan wrote: On Tuesday, July 30, 2024 8:46:31 AM EDT Daniel Brown wrote: Is there any workable solution for running Ceph on Ubuntu 24.04 on Arm64? I’ve tried about every package install method I could think of, short of compiling it myself. I’m aiming for a “cepha

[ceph-users] Re: v19.1.0 Squid RC0 released

2024-07-19 Thread Stefan Kooman
Hi, On 12-07-2024 00:27, Yuri Weinstein wrote: ... * For packages, see https://docs.ceph.com/en/latest/install/get-packages/ I see that only packages have been built for Ubuntu 22.04 LTS. Will there also be packages built for 24.04 LTS (the current LTS)? Thanks, Gr. Stefan _

[ceph-users] Re: Cephadm has a small wart

2024-07-19 Thread Stefan Kooman
On 19-07-2024 14:04, Tim Holloway wrote: Ah. Makes sense. Might be nice if the container build appended something like "cephadm container" to the redhat-release string, though. A more concerning item is that the container is based on CentOS 8 Stream. I'd feel more comfortable if the base OS was

[ceph-users] Re: cephadm for Ubuntu 24.04

2024-07-12 Thread Stefan Kooman
On 12-07-2024 09:33, tpDev Tester wrote: Hi, Am 11.07.2024 um 14:20 schrieb John Mulligan: ... as far as I know, we still have an issue https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2063456 with ceph on 24.04. I tried the offered fix, but was still unable to establish a running clust

[ceph-users] Re: cephadm for Ubuntu 24.04

2024-07-11 Thread Stefan Kooman
On 11-07-2024 14:20, John Mulligan wrote: On Thursday, July 11, 2024 4:22:28 AM EDT Stefan Kooman wrote: On 11-07-2024 09:55, Malte Stroem wrote: Hello Stefan, have a look: https://docs.ceph.com/en/latest/cephadm/install/#curl-based-installation Yeah, I have read that part. Just download

[ceph-users] Re: cephadm for Ubuntu 24.04

2024-07-11 Thread Stefan Kooman
On 11-07-2024 09:55, Malte Stroem wrote: Hello Stefan, have a look: https://docs.ceph.com/en/latest/cephadm/install/#curl-based-installation Yeah, I have read that part. Just download cephadm. It will work on any distro. curl --silent --remote-name --location https://download.ceph.com/r

[ceph-users] cephadm for Ubuntu 24.04

2024-07-10 Thread Stefan Kooman
Hi, Is it possible to only build "cephadm", so not the other ceph packages / daemons? Or can we think about a way to have cephadm packages build for all supported mainstream linux releases during the supported lifetime of a Ceph release: i.e. debian, Ubuntu LTS, CentOS Stream? I went ahead a

[ceph-users] Re: [EXTERN] Urgent help with degraded filesystem needed

2024-07-10 Thread Stefan Kooman
Hi, On 01-07-2024 10:34, Stefan Kooman wrote: Not that I know of. But changes in behavior of Ceph (daemons) and or Ceph kernels would be good to know about indeed. I follow the ceph-kernel mailing list to see what is going on with the development of kernel CephFS. And there is a thread

[ceph-users] Re: Pacific 16.2.15 `osd noin`

2024-07-08 Thread Stefan Kooman
On 02-04-2024 15:09, Zakhar Kirpichenko wrote: Hi, I'm adding a few OSDs to an existing cluster, the cluster is running with `osd noout,noin`: cluster: id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86 health: HEALTH_WARN noout,noin flag(s) set Specifically `noin` is docum

[ceph-users] Re: [EXTERN] Urgent help with degraded filesystem needed

2024-07-02 Thread Stefan Kooman
Hi Venky, On 02-07-2024 09:45, Venky Shankar wrote: Hi Stefan, On Mon, Jul 1, 2024 at 2:30 PM Stefan Kooman wrote: Hi Dietmar, On 29-06-2024 10:50, Dietmar Rieder wrote: Hi all, finally we were able to repair the filesystem and it seems that we did not lose any data. Thanks for all

[ceph-users] Re: [EXTERN] Urgent help with degraded filesystem needed

2024-07-01 Thread Stefan Kooman
Hi Dietmar, On 29-06-2024 10:50, Dietmar Rieder wrote: Hi all, finally we were able to repair the filesystem and it seems that we did not lose any data. Thanks for all suggestions and comments. Here is a short summary of our journey: Thanks for writing this up. This might be useful for som

[ceph-users] Re: [EXTERN] Re: Urgent help with degraded filesystem needed

2024-06-19 Thread Stefan Kooman
Hi, On 19-06-2024 11:15, Dietmar Rieder wrote: Please follow https://docs.ceph.com/en/nautilus/cephfs/disaster-recovery-experts/#disaster-recovery-experts. OK, when I run the cephfs-journal-tool I get an error: # cephfs-journal-tool journal export backup.bin Error ((22) Invalid argument) My
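One common cause of that EINVAL on newer releases is a missing rank specification; a hedged sketch (filesystem name illustrative):

cephfs-journal-tool --rank=cephfs:0 journal export backup.bin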

[ceph-users] degraded objects when setting different CRUSH rule on a pool, why?

2024-06-05 Thread Stefan Kooman
Hi, TL;DR: Selecting a different CRUSH rule (stretch_rule, no device class) for pool SSD results in degraded objects (unexpected) and misplaced objects (expected). Why would Ceph drop up to two healthy copies? Consider this two data center cluster: ID CLASS WEIGHT TYPE NAME S

[ceph-users] Re: rbd-mirror failed to query services: (13) Permission denied

2024-05-21 Thread Stefan Kooman
Hi, On 29-04-2024 17:15, Ilya Dryomov wrote: On Tue, Apr 23, 2024 at 8:28 PM Stefan Kooman wrote: On 23-04-2024 17:44, Ilya Dryomov wrote: On Mon, Apr 22, 2024 at 7:45 PM Stefan Kooman wrote: Hi, We are testing rbd-mirroring. There seems to be a permission error with the rbd-mirror user

[ceph-users] Re: stretched cluster new pool and second pool with nvme

2024-04-30 Thread Stefan Kooman
On 30-04-2024 11:22, ronny.lippold wrote: hi stefan ... you are the hero of the month ;) :p. I don't know why I did not find your bug report. I have the exact same problem and resolved the HEALTH warning only with "ceph osd force_healthy_stretch_mode --yes-i-really-mean-it". Will comment the rep

[ceph-users] rbd-mirror get status updates quicker

2024-04-25 Thread Stefan Kooman
Hi, We're testing with rbd-mirror (mode snapshot) and try to get status updates about snapshots as fast a possible. We want to use rbd-mirror as a migration tool between two clusters and keep downtime during migration as short as possible. Therefore we have tuned the following parameters and
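For context, a snapshot-mode mirroring setup of the kind being tuned here looks roughly like this (pool, image and interval are illustrative):

rbd mirror pool enable rbd image
rbd mirror image enable rbd/vm-disk-1 snapshot
rbd mirror snapshot schedule add --pool rbd 1m
rbd mirror image status rbd/vm-disk-1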

[ceph-users] Re: rbd-mirror failed to query services: (13) Permission denied

2024-04-23 Thread Stefan Kooman
On 23-04-2024 17:44, Ilya Dryomov wrote: On Mon, Apr 22, 2024 at 7:45 PM Stefan Kooman wrote: Hi, We are testing rbd-mirroring. There seems to be a permission error with the rbd-mirror user. Using this user to query the mirror pool status gives: failed to query services: (13) Permission

[ceph-users] Re: stretched cluster new pool and second pool with nvme

2024-04-23 Thread Stefan Kooman
On 23-04-2024 14:40, Eugen Block wrote: Hi, whats the right way to add another pool? create pool with 4/2 and use the rule for the stretched mode, finished? the exsisting pools were automaticly set to 4/2 after "ceph mon enable_stretch_mode". It should be that simple. However, it does not se

[ceph-users] rbd-mirror failed to query services: (13) Permission denied

2024-04-22 Thread Stefan Kooman
Hi, We are testing rbd-mirroring. There seems to be a permission error with the rbd-mirror user. Using this user to query the mirror pool status gives: failed to query services: (13) Permission denied And results in the following output: health: UNKNOWN daemon health: UNKNOWN image health: O
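For reference, the failing query and the caps an rbd-mirror client is typically created with look roughly like this (user and pool names illustrative; this is not presented as the fix for the permission error above):

rbd --id rbd-mirror.site-a mirror pool status rbd --verbose
ceph auth get-or-create client.rbd-mirror.site-a \
    mon 'profile rbd-mirror' osd 'profile rbd-mirror'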

[ceph-users] Re: How to make config changes stick for MDS?

2024-04-16 Thread Stefan Kooman
On 17-04-2024 05:23, Erich Weiler wrote: Hi All, I'm having a crazy time getting config items to stick on my MDS daemons. I'm running Reef 18.2.1 on RHEL 9 and the daemons are running in podman, I used cephadm to deploy the daemons. I can adjust the config items in runtime, like so: ceph
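The distinction that usually matters here is runtime injection versus the cluster config database; a sketch (option and value illustrative):

ceph tell mds.* config set mds_cache_memory_limit 17179869184   # runtime only
ceph config set mds mds_cache_memory_limit 17179869184          # persisted in the mon config db
ceph config get mds mds_cache_memory_limit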

[ceph-users] Re: Setting up Hashicorp Vault for Encryption with Ceph

2024-04-16 Thread Stefan Kooman
On 15-04-2024 16:43, Michael Worsham wrote: Is there a how-to document available on how to setup Hashicorp's Vault for Ceph, preferably in a HA state? See [1] on how to do this on kubernetes. AFAIK there is no documentation / integration on using Vault with Cephadm / packages. Due to som

[ceph-users] Re: [REEF][cephadm] new cluster all pg unknown

2024-03-15 Thread Stefan Kooman
On 15-03-2024 08:10, wodel youchi wrote: Hi, I found my error, it was a mismatch between the monitor network ip address and the --cluster_network which were in different subnets. I misunderstood the --cluster_network subnet, I thought that when creating a cluster, the monitor IP designed the pub
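Given the resolution described above (a public/cluster network mismatch at bootstrap), the relevant bootstrap flags look roughly like this (addresses illustrative; the cluster network must be a subnet the hosts actually have interfaces in):

cephadm bootstrap --mon-ip 10.1.0.10 --cluster-network 10.2.0.0/24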

[ceph-users] Re: [REEF][cephadm] new cluster all pg unknown

2024-03-15 Thread Stefan Kooman
On 15-03-2024 07:18, wodel youchi wrote: Hi, Note : Firewall is disabled on all hos Can you send us the crush rules that are available 1) and also the crush_rule in use for the .mgr pool 2)? Further more I would like to see an overview of the OSD tree 3) and the state of the .mgr PG (normall

[ceph-users] ceph osd crush reweight rounding issue

2024-03-13 Thread Stefan Kooman
Hi, After some tests with OSD crush weight I wanted to restore the original weight of the OSD. But that proves to be difficult. See the following example: Situation before OSD crush reweight has taken place ceph osd tree ID CLASS WEIGHT TYPE NAMESTATUS REWEIGHT PRI-AFF -11
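The kind of round trip being tested, in sketch form (OSD id and weights are illustrative):

ceph osd crush reweight osd.11 0.5       # temporarily lower the crush weight
ceph osd crush reweight osd.11 3.63869   # restoring the exact original value is where rounding shows up
ceph osd tree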

[ceph-users] Re: Minimum amount of nodes needed for stretch mode?

2024-03-07 Thread Stefan Kooman
On 07-03-2024 18:16, Gregory Farnum wrote: On Thu, Mar 7, 2024 at 9:09 AM Stefan Kooman wrote: Hi, TL;DR Failure domain considered is data center. Cluster in stretch mode [1]. - What is the minimum amount of monitor nodes (apart from tie breaker) needed per failure domain? You need at

[ceph-users] Minimum amount of nodes needed for stretch mode?

2024-03-07 Thread Stefan Kooman
Hi, TL;DR Failure domain considered is data center. Cluster in stretch mode [1]. - What is the minimum amount of monitor nodes (apart from tie breaker) needed per failure domain? - What is the minimum amount of storage nodes needed per failure domain? - Are device classes supported with str

[ceph-users] Re: OSD with dm-crypt?

2024-02-27 Thread Stefan Kooman
On 27-02-2024 05:45, Michael Worsham wrote: I was setting up the Ceph cluster via this URL (https://computingforgeeks.com/install-ceph-storage-cluster-on-ubuntu-linux-servers/) and didn't know if there was a way to do it via the "ceph orch daemon add osd ceph-osd-01:/dev/sdb" command or not?
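With cephadm, encryption at rest is requested through an OSD service spec rather than on the "ceph orch daemon add" command line; a minimal sketch, assuming the host and device from the question (spec file name and service_id are illustrative):

cat > osd-encrypted.yaml <<'EOF'
service_type: osd
service_id: encrypted-osds
placement:
  hosts:
    - ceph-osd-01
spec:
  data_devices:
    paths:
      - /dev/sdb
  encrypted: true
EOF
ceph orch apply -i osd-encrypted.yaml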

[ceph-users] Re: PSA: Long Standing Debian/Ubuntu build performance issue (fixed, backports in progress)

2024-02-09 Thread Stefan Kooman
On 09-02-2024 14:18, Maged Mokhtar wrote: Hi Mark, Thanks a lot for highlighting this issue...I have 2 questions: 1) In the patch comments: /"but we fail to populate this setting down when building external projects. this is important when it comes to the projects which is critical to the pe

[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-08 Thread Stefan Kooman
Hi, Is this PR: https://github.com/ceph/ceph/pull/54918 included as well? You definitely want to build the Ubuntu / debian packages with the proper CMAKE_CXX_FLAGS. The performance impact on RocksDB is _HUGE_. Thanks, Gr. Stefan P.s. Kudos to Mark Nelson for figuring it out / testing. _

[ceph-users] Re: OSD read latency grows over time

2024-01-19 Thread Stefan Kooman
On 16-01-2024 11:22, Roman Pashin wrote: Hello Ceph users, we see a strange issue on a recent Ceph installation, v17.6.2. We store data on an HDD pool, the index pool is on SSD. Each OSD stores its WAL on an NVMe partition. Do you make use of a separate db partition as well? And if so, where is it store

[ceph-users] Re: About ceph osd slow ops

2023-12-01 Thread Stefan Kooman
On 01-12-2023 08:45, VÔ VI wrote: Hi community, My cluster is running with 10 nodes and 2 nodes went down; sometimes the log shows slow ops, what is the root cause? My OSDs are HDD with a 500GB SSD per OSD for block.db and WAL. Health check update: 13 slow ops, oldest one blocked for 167 sec, osd.10

[ceph-users] Re: ceph-exporter binds to IPv4 only

2023-11-22 Thread Stefan Kooman
On 22-11-2023 15:54, Stefan Kooman wrote: Hi, In a IPv6 only deployment the ceph-exporter daemons are not listening on IPv6 address(es). This can be fixed by editing the unit.run file of the ceph-exporter by changing "--addrs=0.0.0.0" to "--addrs=::". Is this configura

[ceph-users] ceph-exporter binds to IPv4 only

2023-11-22 Thread Stefan Kooman
Hi, In a IPv6 only deployment the ceph-exporter daemons are not listening on IPv6 address(es). This can be fixed by editing the unit.run file of the ceph-exporter by changing "--addrs=0.0.0.0" to "--addrs=::". Is this configurable? So that cephadm deploys ceph-exporter with proper unit.run a
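The manual workaround described above, in sketch form (fsid and hostname are illustrative; cephadm may rewrite unit.run on redeploy, so this is not persistent):

sed -i 's/--addrs=0.0.0.0/--addrs=::/' /var/lib/ceph/<fsid>/ceph-exporter.<host>/unit.run
systemctl restart ceph-<fsid>@ceph-exporter.<host>.service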

[ceph-users] Re: Service Discovery issue in Reef 18.2.0 release ( upgrading )

2023-11-21 Thread Stefan Kooman
On 15-11-2023 07:09, Brent Kennedy wrote: Greetings group! We recently reloaded a cluster from scratch using cephadm and reef. The cluster came up, no issues. We then decided to upgrade two existing cephadm clusters that were on quincy. Those two clusters came up just fine but there is a

[ceph-users] Re: stuck MDS warning: Client HOST failing to respond to cache pressure

2023-10-17 Thread Stefan Kooman
On 17-10-2023 09:22, Frank Schilder wrote: Hi all, I'm affected by a stuck MDS warning for 2 clients: "failing to respond to cache pressure". This is a false alarm as no MDS is under any cache pressure. The warning is stuck already for a couple of days. I found some old threads about cases whe

[ceph-users] Re: 6.5 CephFS client - ceph_cap_reclaim_work [ceph] / ceph_con_workfn [libceph] hogged CPU

2023-09-18 Thread Stefan Kooman
On 13-09-2023 16:49, Stefan Kooman wrote: On 13-09-2023 14:58, Ilya Dryomov wrote: On Wed, Sep 13, 2023 at 9:20 AM Stefan Kooman wrote: Hi, Since the 6.5 kernel addressed the issue with regards to regression in the readahead handling code... we went ahead and installed this kernel for a

[ceph-users] Re: Status of IPv4 / IPv6 dual stack?

2023-09-18 Thread Stefan Kooman
On 15-09-2023 09:25, Robert Sander wrote: Hi, as the documentation sends mixed signals in https://docs.ceph.com/en/latest/rados/configuration/network-config-ref/#ipv4-ipv6-dual-stack-mode "Note Binding to IPv4 is enabled by default, so if you just add the option to bind to IPv6 you’ll actual

[ceph-users] Re: ceph orchestator pulls strange images from docker.io

2023-09-15 Thread Stefan Kooman
On 15-09-2023 10:25, Stefan Kooman wrote: I could just nuke the whole dev cluster, wipe all disks and start fresh after reinstalling the hosts, but as I have to adopt 17 clusters to the orchestrator, I rather get some learnings from the not working thing 🙂 There is actually a cephadm

[ceph-users] Re: ceph orchestator pulls strange images from docker.io

2023-09-15 Thread Stefan Kooman
On 15-09-2023 09:21, Boris Behrens wrote: Hi Stefan, the cluster is running 17.6.2 across the board. The mentioned containers with other versions don't show up in ceph -s or ceph versions. It looks like it is host related. One host gets the correct 17.2.6 images, one gets the 16.2.11 images and

[ceph-users] Re: ceph orchestator pulls strange images from docker.io

2023-09-14 Thread Stefan Kooman
On 14-09-2023 17:49, Boris Behrens wrote: Hi, I currently try to adopt our stage cluster, some hosts just pull strange images. root@0cc47a6df330:/var/lib/containers/storage/overlay-images# podman ps CONTAINER ID IMAGE COMMAND CREATED STA

[ceph-users] Re: osd cannot get osdmap

2023-09-14 Thread Stefan Kooman
On 14-09-2023 17:32, Nathan Gleason wrote: Hello, We had a network hiccup with a Ceph cluster and it made several of our osds go out/down. After the network was fixed the osds remain down. We have restarted them in numerous ways and they won’t come up. The logs for the down osds just repeat

[ceph-users] Re: 6.5 CephFS client - ceph_cap_reclaim_work [ceph] / ceph_con_workfn [libceph] hogged CPU

2023-09-13 Thread Stefan Kooman
On 14-09-2023 03:27, Xiubo Li wrote: <-- snip --> Hi Stefan, Yeah, as I remember I have seen something like this only once before, in the cephfs qa tests together with other issues, but I just thought it wasn't the root cause so I didn't spend time on it. Just went through the k

[ceph-users] Re: 6.5 CephFS client - ceph_cap_reclaim_work [ceph] / ceph_con_workfn [libceph] hogged CPU

2023-09-13 Thread Stefan Kooman
On 13-09-2023 14:58, Ilya Dryomov wrote: On Wed, Sep 13, 2023 at 9:20 AM Stefan Kooman wrote: Hi, Since the 6.5 kernel addressed the issue with regards to regression in the readahead handling code... we went ahead and installed this kernel for a couple of mail / web clusters (Ubuntu 6.5.1

[ceph-users] 6.5 CephFS client - ceph_cap_reclaim_work [ceph] / ceph_con_workfn [libceph] hogged CPU

2023-09-13 Thread Stefan Kooman
Hi, Since the 6.5 kernel addressed the issue with regards to regression in the readahead handling code... we went ahead and installed this kernel for a couple of mail / web clusters (Ubuntu 6.5.1-060501-generic #202309020842 SMP PREEMPT_DYNAMIC Sat Sep 2 08:48:34 UTC 2023 x86_64 x86_64 x86_6

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread Stefan Kooman
On 07-09-2023 19:20, J-P Methot wrote: We went from 16.2.13 to 16.2.14 Also, timeout is 15 seconds because it's the default in Ceph. Basically, 15 seconds before Ceph shows a warning that OSD is timing out. We may have found the solution, but it would be, in fact, related to bluestore_alloca

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread Stefan Kooman
On 07-09-2023 09:05, J-P Methot wrote: Hi, We're running latest Pacific on our production cluster and we've been seeing the dreaded 'OSD::osd_op_tp thread 0x7f346aa64700' had timed out after 15.00954s' error. We have reasons to believe this happens each time the RocksDB compaction process

[ceph-users] Re: Reef release candidate - v18.1.2

2023-07-12 Thread Stefan Kooman
ceph version 18.1.2 (a5c951305c2409669162c235d81981bdc60dd9e7) reef (rc) On Wed, Jul 12, 2023 at 2:06 PM Stefan Kooman wrote: On 6/30/23 18:36, Yuri Weinstein wrote: This RC has gone thru partial testing due to issues we are experiencing in the sepia lab. Please try it out and report any issues you enco

[ceph-users] Re: Reef release candidate - v18.1.2

2023-07-12 Thread Stefan Kooman
On 6/30/23 18:36, Yuri Weinstein wrote: This RC has gone thru partial testing due to issues we are experiencing in the sepia lab. Please try it out and report any issues you encounter. Happy testing! If I install cephadm from a package, 18.1.2 on Ubuntu focal in my case, cephadm uses the ceph-

[ceph-users] Re: Cluster down after network outage

2023-07-12 Thread Stefan Kooman
On 7/12/23 09:53, Frank Schilder wrote: Hi all, we had a network outage tonight (power loss) and restored network in the morning. All OSDs were running during this period. After restoring network peering hell broke loose and the cluster has a hard time coming back up again. OSDs get marked do

[ceph-users] Re: Reef release candidate - v18.1.2

2023-07-10 Thread Stefan Kooman
On 6/30/23 18:36, Yuri Weinstein wrote: This RC has gone thru partial testing due to issues we are experiencing in the sepia lab. Please try it out and report any issues you encounter. Happy testing! I tested the RC (v18.1.2) this afternoon. I tried out the new "read balancer". I hit asserts

[ceph-users] Re: 1 pg inconsistent and does not recover

2023-06-28 Thread Stefan Kooman
On 6/28/23 10:45, Frank Schilder wrote: Hi Stefan, we run Octopus. The deep-scrub request is (immediately) cancelled if the PG/OSD is already part of another (deep-)scrub or if some peering happens. As far as I understood, the commands osd/pg deep-scrub and pg repair do not create persistent

[ceph-users] Re: cephadm, new OSD

2023-06-28 Thread Stefan Kooman
On 6/28/23 11:30, Shashi Dahal wrote: Hi, I added new OSDs on the ceph servers (orch is cephadm). They are recognized as osd.12 and osd.13. ceph pg dump shows no PGs on osd 12 and 13; they are all empty. ceph osd tree shows that they are up. ceph osd df shows them to be all 0 in rewe

[ceph-users] Re: 1 pg inconsistent and does not recover

2023-06-28 Thread Stefan Kooman
On 6/28/23 09:41, Frank Schilder wrote: Hi Niklas, please don't do any of the recovery steps yet! Your problem is almost certainly a non-issue. I had a failed disk with 3 scrub-errors, leading to the candidate read error messeges you have: ceph status/df/pool stats/health detail at 00:00:06:

[ceph-users] Re: Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent

2023-06-26 Thread Stefan Kooman
On 6/26/23 08:38, Jorge JP wrote: Hello, After deep-scrub my cluster shown this error: HEALTH_ERR 1/38578006 objects unfound (0.000%); 1 scrub errors; Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent; Degraded data redundancy: 2/77158878 objects degraded (0.000%), 1 pg degraded
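Commands typically used to inspect this state, for reference (the PG id is illustrative; marking unfound objects lost is destructive and a last resort, so it is deliberately not shown here):

ceph health detail
ceph pg 2.1a list_unfound
ceph pg 2.1a query
ceph pg repair 2.1a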

[ceph-users] Re: OSDs cannot join cluster anymore

2023-06-21 Thread Stefan Kooman
On 6/21/23 11:20, Malte Stroem wrote: Hello Eugen, recovery and rebalancing was finished however now all PGs show missing OSDs. Everything looks like the PGs are missing OSDs although it finished correctly. As if we shut down the servers immediately. But we removed the nodes the way it is

[ceph-users] Re: Encryption per user Howto

2023-06-07 Thread Stefan Kooman
On 6/7/23 14:22, Frank Schilder wrote: Hi Stefan, yes, ceph-volume OSDs. Requirements: Kernel version requirement and higher: 5.9 cryptsetup: 2.3.4 and higher. Preferably 2.4.x (automatic alignment of sector size based on physical disk properties). RAW device: cryptsetup luksFormat /dev/dev
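A minimal sketch of the raw-device variant mentioned above, assuming cryptsetup 2.4.x and a 4K-sector flash device (device and mapping name are illustrative):

cryptsetup luksFormat --type luks2 --sector-size 4096 /dev/nvme0n1
cryptsetup open /dev/nvme0n1 osd-crypt
# the resulting /dev/mapper/osd-crypt block device can then be handed to ceph-volume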

[ceph-users] Re: Encryption per user Howto

2023-06-07 Thread Stefan Kooman
On 6/7/23 12:57, Frank Schilder wrote: Hi Stefan, sorry, forgot. Block device is almost certainly LVM with dmcrypt - unless you have another way of using encryption with ceph OSDs. I can compare LVM with LVM+dmcrypt(default/new) and possibly also raw /dev/sd? performance. If LVM+dmcrypt shows

[ceph-users] Re: Encryption per user Howto

2023-06-06 Thread Stefan Kooman
On 6/6/23 15:33, Frank Schilder wrote: Yes, would be interesting. I understood that it mainly helps with buffered writes, but ceph is using direct IO for writes and that's where bypassing the queues helps. Yeah, that makes sense. Are there detailed instructions somewhere how to set up a ho

[ceph-users] Re: Encryption per user Howto

2023-06-06 Thread Stefan Kooman
On 6/6/23 14:26, Frank Schilder wrote: Hi Stefan, there are still users with large HDD installations and I think this will not change anytime soon. What is the impact of encryption with the new settings for HDD? Is it as bad as their continued omission from any statement suggests? We only te

[ceph-users] Re: Encryption per user Howto

2023-06-02 Thread Stefan Kooman
On 6/2/23 16:33, Anthony D'Atri wrote: Stefan, how do you have this implemented? Earlier this year I submitted https://tracker.ceph.com/issues/58569 asking to enable just this. Lol, I have never seen that tracker otherwise I would have informed you abou

[ceph-users] Re: Encryption per user Howto

2023-06-02 Thread Stefan Kooman
On 5/26/23 23:09, Alexander E. Patrakov wrote: Hello Frank, On Fri, May 26, 2023 at 6:27 PM Frank Schilder wrote: Hi all, jumping on this thread as we have requests for which per-client fs mount encryption makes a lot of sense: What kind of security to you want to achieve with encryption

[ceph-users] Re: BlueStore fragmentation woes

2023-05-31 Thread Stefan Kooman
On 5/31/23 16:15, Igor Fedotov wrote: On 31/05/2023 15:26, Stefan Kooman wrote: On 5/29/23 15:52, Igor Fedotov wrote: Hi Stefan, given that allocation probes include every allocation (including short 4K ones) your stats look pretty high indeed. Although you omitted historic probes so it

[ceph-users] Re: Newer linux kernel cephfs clients is more trouble?

2023-05-29 Thread Stefan Kooman
On 5/29/23 20:25, Dan van der Ster wrote: Hi, Sorry for poking this old thread, but does this issue still persist in the 6.3 kernels? We are running a mail cluster setup with 6.3.1 kernel and it's not giving us any performance issues. We have not upgraded our shared webhosting platform to th

[ceph-users] Re: BlueStore fragmentation woes

2023-05-26 Thread Stefan Kooman
On 5/25/23 22:12, Igor Fedotov wrote: On 25/05/2023 20:36, Stefan Kooman wrote: On 5/25/23 18:17, Igor Fedotov wrote: Perhaps... I don't like the idea to use fragmentation score as a real index. IMO it's mostly like a very imprecise first turn marker to alert that something migh

[ceph-users] Re: BlueStore fragmentation woes

2023-05-25 Thread Stefan Kooman
On 5/25/23 18:17, Igor Fedotov wrote: Perhaps... I don't like the idea to use fragmentation score as a real index. IMO it's mostly like a very imprecise first turn marker to alert that something might be wrong. But not a real quantitative high-quality estimate. Chiming in on the high fragme
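For reference, the fragmentation score discussed here can be read per OSD via the admin socket (OSD id illustrative):

ceph daemon osd.0 bluestore allocator score block
ceph daemon osd.0 bluestore allocator dump block    # full free-extent listing, can be large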

[ceph-users] Re: Training on ceph fs

2023-05-24 Thread Stefan Kooman
On 5/24/23 14:03, Emmanuel Jaep wrote: Hi, I inherited a ceph fs cluster. Even if I have years of experience in systems management, I fail to grasp the complete logic of it fully. From what I found on the web, the documentation is either too "high level" or too detailed. Is this a setup based

[ceph-users] Re: MDS crashes to damaged metadata

2023-05-24 Thread Stefan Kooman
On 5/22/23 20:24, Patrick Donnelly wrote: The original script is here: https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py "# Suggested recovery sequence (for single MDS cluster): # # 1) Unmount all clients." Is this a hard requirement? This might not be feasible for an M

[ceph-users] Re: Encryption per user Howto

2023-05-24 Thread Stefan Kooman
On 5/22/23 17:28, huxia...@horebdata.cn wrote: Hi, Stefan, Thanks a lot for the message. It seems that client-side encryption (or per user) is still on the way and not ready yet today. Are there practical methods to implement encryption for CephFS with today's techniques? e.g. using LUKS or

[ceph-users] Re: Encryption per user Howto

2023-05-22 Thread Stefan Kooman
On 5/21/23 15:44, Alexander E. Patrakov wrote: Hello Samuel, On Sun, May 21, 2023 at 3:48 PM huxia...@horebdata.cn wrote: Dear Ceph folks, Recently one of our clients approached us with a request on encrpytion per user, i.e. using individual encrytion key for each user and encryption files

[ceph-users] Re: Deleting a CephFS volume

2023-05-17 Thread Stefan Kooman
On 5/17/23 17:29, Conrad Hoffmann wrote: Hi all, I'm having difficulties removing a CephFS volume that I set up for testing. I've been through this with RBDs, so I do know about `mon_allow_pool_delete`. However, it doesn't help in this case. It is a cluster with 3 monitors. You can find a co
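For reference, removing a volume normally requires both of the following, because deleting the volume also removes its data and metadata pools (volume name illustrative):

ceph config set mon mon_allow_pool_delete true
ceph fs volume rm testfs --yes-i-really-mean-it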

[ceph-users] Re: rbd mirror snapshot trash

2023-05-16 Thread Stefan Kooman
On 5/16/23 09:47, Eugen Block wrote: I'm still looking into these things myself but I'd appreciate anyone chiming in here. IIRC the configuration of the trash purge schedule has changed in one of the Ceph releases (not sure which one). Have they recently upgraded to a new(er) release? Do t
