[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Nicola Mori
Hi Frank, I checked the first hypothesis, and I found something strange. This is the decompiled rule:

rule wizard_data {
        id 1
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step chooseleaf indep 0 type host

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Frank Schilder
Hi Nicola, you are hit hard by the problem, having so many mappings requiring 49 or more tries. The parameter you need to tune is not set_choose_tries inside the rule, but choose_total_tries at the beginning of the crush map file. You need to decompile, modify and compile again. The start of ou
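
For reference, a minimal sketch of the decompile/edit/recompile cycle described here; the file names are illustrative, and the value 250 is the one settled on later in this thread, not a general recommendation:

    # dump the current crush map and decompile it to text
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # edit the "tunable choose_total_tries ..." line near the top of crush.txt,
    # e.g. raising it from the default 50 to 250

    # recompile; test with crushtool/osdmaptool before injecting it into the cluster
    crushtool -c crush.txt -o crush.new.bin
    ceph osd setcrushmap -i crush.new.bin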

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Nicola Mori
Hi Frank, I set choose_total_tries 250 and set_choose_tries 1000: I get no bad mappings and up to 239 tries. I guess I might try this rule in production; what do you suggest? On 03/11/22 10:46, Frank Schilder wrote: Hi Nicola, you are hit hard by the problem, having so many mappings requiri

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Frank Schilder
These settings are safe to change. I think you can leave set_choose_tries at 100 though. If you are somewhat uncertain, you can pull the OSD map from your production cluster and inject the new crush map into this osd map first. Osdmaptool then allows you to compute the actual mappings of your pr
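
A sketch of the osdmaptool check suggested here, assuming the modified crush map was compiled to crush.new.bin and the EC pool has id 3 as elsewhere in this thread (both are placeholders for your own values):

    # grab the current osdmap from the production cluster
    ceph osd getmap -o osdmap.bin

    # write the new crush map into the local copy of the osdmap
    osdmaptool osdmap.bin --import-crush crush.new.bin

    # dump the resulting PG -> OSD mappings for the pool and check for incomplete sets
    osdmaptool osdmap.bin --test-map-pgs-dump --pool 3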

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Nicola Mori
If I use set_choose_tries 100 and choose_total_tries 250 I get a lot of bad mappings with crushtool:

# crushtool -i better-totaltries--crush.map --test --show-bad-mappings --rule 1 --num-rep 8 --min-x 1 --max-x 100 --show-choose-tries
bad mapping rule 1 x 319 num_rep 8 result [43,40,58,69

[ceph-users] Strange 50K slow ops incident

2022-11-03 Thread Frank Schilder
Hi all, I just had a very weird incident on our production cluster. An OSD was reporting >50K slow ops. Upon further investigation I observed exceptionally high network traffic on 3 out of the 12 hosts in this OSD's pools; one of them was the host with the slow-ops OSD (ceph-09); see the image
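
Not mentioned in the thread, but the usual first checks on the affected OSD would look something like this (the osd id is a placeholder):

    ceph health detail                            # lists the OSDs currently reporting slow ops
    ceph daemon osd.42 dump_ops_in_flight         # on the OSD's host: ops currently blocked
    ceph daemon osd.42 dump_historic_slow_ops     # recent slow ops with their event timelines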

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Frank Schilder
Ah, no. Just set it to 250 as well. I think choose_total_tries is the overall max; using set_choose_tries higher than choose_total_tries has no effect. In my case, the bad mapping was already resolved with both=51, but your case looks a bit more serious. Best regards, = Frank Sc

[ceph-users] Can't connect to MDS admin socket after updating to cephadm

2022-11-03 Thread Luis Calero Muñoz
Hello, I'm running a ceph 15.2.15 Octopus cluster, and in preparation to update it I've first transformed it to cephadm following the instructions on the website. All went well, but now I'm having a problem running "ceph daemon mds.* dump_ops_in_flight" because it gives me an error: root@ceph-mds

[ceph-users] Re: How to force PG merging in one step?

2022-11-03 Thread Eugen Block
Hi Frank, "Is this not checked per OSD? This would be really bad, because if it just uses the average (currently 143.3) this warning will never be triggered in critical situations." I believe you're right; I can only remember having warnings about the average pg count per OSD, not the absol
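
A quick way to look at the actual per-OSD PG counts rather than a pool-level average (not from the thread, just the standard commands):

    ceph osd df tree       # the PGS column shows the PG count on each individual OSD
    ceph osd utilization   # average plus the least- and most-loaded OSD by PG count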

[ceph-users] Re: Can't connect to MDS admin socket after updating to cephadm

2022-11-03 Thread Eugen Block
Hi, you can use cephadm for that now [1]. To attach to a running daemon you run (run 'cephadm ls' to see all cephadm daemons):

cephadm enter --name <daemon-name> [--fsid <fsid>]

There you can query the daemon as you used to:

storage01:~ # cephadm ls |grep mds "name": "mds.cephfs.storage01.ozpeev", st
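
Putting the two steps together for the MDS from the example above (the daemon name comes from the 'cephadm ls' output, so substitute your own):

    cephadm enter --name mds.cephfs.storage01.ozpeev
    # inside the container the admin socket is available as before:
    ceph daemon mds.cephfs.storage01.ozpeev dump_ops_in_flight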

[ceph-users] Re: Strange 50K slow ops incident

2022-11-03 Thread Szabo, Istvan (Agoda)
Are those connected to the same switches? Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- On 2022. Nov 3., at 17:34

[ceph-users] Re: RBD and Ceph FS for private cloud

2022-11-03 Thread Eugen Block
Hi, as always the answer is "it depends". Our company uses the ceph cluster for all three protocols. We have an openstack cluster (rbd) and use cephfs for work and home directories, and radosgw for k8s backups. And we don't face any performance issues. I'd recommend giving cephfs a try,

[ceph-users] Re: Strange 50K slow ops incident

2022-11-03 Thread Frank Schilder
Hi Szabo, it's a switch-local network shared with an HPC cluster with spine-leaf topology. The storage nodes sit on leafs and the leafs all connect to the same spine. Everything with duplicated hardware and LACP bonding. Best regards, = Frank Schilder AIT Risø Campus Bygning 109,

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Nicola Mori
Ok, I'd say I fixed it. I set both parameters to 250, recompiled the crush map and loaded it, and now the PG is in active+undersized+degraded+remapped+backfilling state and mapped as:

# ceph pg map 3.5e
osdmap e23741 pg 3.5e (3.5e) -> up [38,78,55,49,40,39,64,20] acting [38,78,55,49,40,39,64,2

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Frank Schilder
Yes, it will. The PG never had the last copy, which needs to be built for the first time. Just wait for it to finish. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nicola Mori Sent: 03 November 2022 13:37:30 To
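
To watch the backfill finish, something along these lines is usually enough (PG id taken from the messages above):

    ceph pg 3.5e query | grep '"state"'   # current PG state, e.g. active+...+backfilling
    watch ceph status                     # overall recovery/backfill progress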

[ceph-users] Ceph Virtual 2022 Begins Today!

2022-11-03 Thread Mike Perez
Hi everyone, Today is the first of our series in Ceph Virtual 2022! Our agenda will include a Ceph project update, community update, and telemetry talk by Yaarit Hatuka. Join us today at 10:00 AM EDT / 14:00 UTC Meeting link: https://bluejeans.com/908675367 Event: https://ceph.io/en/community/eve

[ceph-users] PG Ratio for EC overwrites Pool

2022-11-03 Thread mailing-lists
Dear Ceph'ers, I am wondering how to choose the number of PGs for an RBD-EC-Pool. To be able to use RBD-Images on an EC-Pool, it needs to have a regular RBD-replicated-pool as well as an EC-Pool with EC overwrites enabled, but how many PGs would you need for the RBD-replicated-pool? It does

[ceph-users] Re: PG Ratio for EC overwrites Pool

2022-11-03 Thread Anthony D'Atri
PG count isn’t just about storage size; it also affects performance, parallelism, and recovery. You want pgp_num for the RBD metadata pool to be at the VERY least the number of OSDs it lives on, rounded up to the next power of 2. I’d probably go for at least (2x#OSD) rounded up. If you have two f
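
As a worked example of this rule of thumb (numbers purely illustrative): with 60 OSDs backing the pools, 2 x 60 = 120, rounded up to the next power of two gives pg_num = 128 for the replicated metadata pool. A sketch of the corresponding setup for RBD on EC with overwrites, with made-up pool and profile names:

    # replicated pool for RBD metadata/omap, EC pool for the data
    ceph osd pool create rbd-meta 128 128 replicated
    ceph osd pool create rbd-ec-data 256 256 erasure my-ec-profile
    ceph osd pool set rbd-ec-data allow_ec_overwrites true
    ceph osd pool application enable rbd-meta rbd
    ceph osd pool application enable rbd-ec-data rbd

    # images live in the replicated pool but place their data in the EC pool
    rbd create --size 1T --data-pool rbd-ec-data rbd-meta/test-image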

[ceph-users] Upgrade/migrate host operating system for ceph nodes (CentOS/Rocky)

2022-11-03 Thread Prof. Dr. Christian Dietrich
Hi all, we're running a ceph cluster with v15.2.17 and cephadm on various CentOS hosts. Since CentOS 8.x is EOL, we'd like to upgrade/migrate/reinstall the OS, possibly migrating to Rocky or CentOS Stream:

host | CentOS   | Podman
-----|----------|-------
osd* | 7.9.2009 | 1.6.4 x5
osd* |
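
A rough per-host sequence that is often used for this kind of reinstall under cephadm; this is only a sketch, it assumes the OSD data devices are preserved across the reinstall, and host names/IPs are placeholders:

    ceph osd set noout                      # avoid rebalancing while the host is down

    # reinstall the OS (e.g. Rocky 8), install podman, lvm2, chrony and python3,
    # then let the orchestrator reach the host again:
    ceph cephadm get-pub-key > ceph.pub
    ssh-copy-id -f -i ceph.pub root@osd-host-01

    # if the host was removed from the orchestrator before the reinstall, add it back:
    ceph orch host add osd-host-01 192.168.1.21

    ceph osd unset noout                    # once the host's daemons are back up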

[ceph-users] Re: Upgrade/migrate host operating system for ceph nodes (CentOS/Rocky)

2022-11-03 Thread Sivy, Shawn
Chris, I recently had a proof-of-concept Ceph Quincy cluster up and running on bare metal. I used Rocky Linux 8.6 which has Podman 4.1.1. I was able to do a cephadm install of the cluster without issue and didn't run into any issue managing it while I tested Ceph.

[ceph-users] State of the Cephalopod

2022-11-03 Thread Josh Durgin
As mentioned at Ceph Virtual today, here are the slides from the project update. The recording will be posted to the Ceph youtube channel later. Thanks to everyone contributing to and using Ceph, you make this all possible! Josh

[ceph-users] Re: State of the Cephalopod

2022-11-03 Thread Josh Durgin
Here's a link since the attachment didn't come through: https://github.com/jdurgin/ceph.io/raw/wip-virtual-2022-slides/src/assets/pdfs/2022.11-state-of-the-cephalopod.pdf On Thu, Nov 3, 2022 at 8:44 AM Josh Durgin wrote: > > As mentioned at Ceph Virtual today, here are the slides from the > pr

[ceph-users] Re: RBD and Ceph FS for private cloud

2022-11-03 Thread Ramana Krisna Venkatesh Raja
Hi, If performance is critical you'd want CephFS kernel clients to access your CephFS volumes/subvolumes. On the other hand, if you can't trust the clients in your cloud, then it's recommended that you set up a gateway (NFS-Ganesha server) for CephFS. NFS-Ganesha server uses libcephfs (userspace
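
For completeness, a minimal kernel-client mount of a CephFS volume; monitor address, client name and paths are placeholders:

    # create a client key restricted to the file system, then mount with the kernel driver
    ceph fs authorize cephfs client.app / rw
    mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=app,secretfile=/etc/ceph/app.secret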

[ceph-users] Question about quorum

2022-11-03 Thread Murilo Morais
Good afternoon everyone! I have a lab with 4 mons. I was testing the behavior when a certain number of hosts go offline, and as soon as the second one went offline everything stopped. It would be interesting if there was a fifth node to ensure that, if two fall, everything will work, but why did

[ceph-users] Re: Question about quorum

2022-11-03 Thread Tyler Brekke
Hi Murilo, Since we need a majority to maintain quorum, when you lost 2 mons you only had 50% available and lost quorum. This is why all recommendations specify having an odd number of mons, as you do not get any added availability with 4 instead of 3. If you had 5 mons, you can lose two without
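
The arithmetic behind this, as an illustration: quorum needs strictly more than half of the monitors, i.e. floor(n/2) + 1 out of n.

    mons (n) | quorum size | failures tolerated
    ---------|-------------|-------------------
    3        | 2           | 1
    4        | 3           | 1
    5        | 3           | 2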

[ceph-users] Re: Question about quorum

2022-11-03 Thread Josh Baergen
Hi Murilo, This is briefly referred to by https://docs.ceph.com/en/octopus/rados/deployment/ceph-deploy-mon/, but in order to avoid split-brain issues it's common for distributed consensus algorithms to require a strict majority in order to maintain quorum. This is why production deployments of mons

[ceph-users] Re: Question about quorum

2022-11-03 Thread Can Özyurt
Hello Murilo, You should always go for odd numbers. Essentially you are trying to avoid split-brain issues. Note that "stopped/failed" is only your observation; the running mons must assume that supposedly failed mons may still be running but unreachable due to a network issue. So 2 out