[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Nicola Mori
Hi Frank, I checked the first hypothesis, and I found something strange. This is the decompiled rule:

rule wizard_data {
        id 1
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step chooseleaf indep 0 type host

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Frank Schilder
Hi Nicola, you are hit hard by the problem, having so many mappings requiring 49 or more tries. The parameter you need to tune is not set_choose_tries inside the rule, but choose_total_tries at the beginning of the crush map file. You need to decompile, modify and compile again. The start of ou
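
For reference, a minimal sketch of the decompile/edit/recompile cycle described here; the file names are illustrative, and the value 250 is the one settled on later in this thread, not a general recommendation:

    # dump the current crush map and decompile it to text
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt

    # edit the "tunable choose_total_tries ..." line near the top of crush.txt,
    # e.g. raising it from the default 50 to 250

    # recompile; test with crushtool/osdmaptool before injecting it into the cluster
    crushtool -c crush.txt -o crush.new.bin
    ceph osd setcrushmap -i crush.new.bin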

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Nicola Mori
Hi Frank, I set choose_total_tries 250 and set_choose_tries 1000: I get no bad mappings and up to 239 tries. I guess I might try this rule in production; what do you suggest? On 03/11/22 10:46, Frank Schilder wrote: Hi Nicola, you are hit hard by the problem, having so many mappings requiri

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Frank Schilder
These settings are safe to change. I think you can leave set_choose_tries at 100 though. If you are somewhat uncertain, you can pull the OSD map from your production cluster and inject the new crush map into this osd map first. Osdmaptool then allows you to compute the actual mappings of your pr
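
A sketch of the osdmaptool check suggested here, assuming the modified crush map was compiled to crush.new.bin and the EC pool has id 3 as elsewhere in this thread (both are placeholders for your own values):

    # grab the current osdmap from the production cluster
    ceph osd getmap -o osdmap.bin

    # write the new crush map into the local copy of the osdmap
    osdmaptool osdmap.bin --import-crush crush.new.bin

    # dump the resulting PG -> OSD mappings for the pool and check for incomplete sets
    osdmaptool osdmap.bin --test-map-pgs-dump --pool 3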

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Nicola Mori
If I use set_choose_tries 100 and choose_total_tries 250 I get a lot of bad mappings with crushtool:

# crushtool -i better-totaltries--crush.map --test --show-bad-mappings --rule 1 --num-rep 8 --min-x 1 --max-x 100 --show-choose-tries
bad mapping rule 1 x 319 num_rep 8 result [43,40,58,69

[ceph-users] Strange 50K slow ops incident

2022-11-03 Thread Frank Schilder
Hi all, I just had a very weird incident on our production cluster. An OSD was reporting >50K slow ops. Upon further investigation I observed exceptionally high network traffic on 3 out of the 12 hosts in this OSD's pools; one of them was the host with the slow-ops OSD (ceph-09); see the image
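
Not mentioned in the thread, but the usual first checks on the affected OSD would look something like this (the osd id is a placeholder):

    ceph health detail                            # lists the OSDs currently reporting slow ops
    ceph daemon osd.42 dump_ops_in_flight         # on the OSD's host: ops currently blocked
    ceph daemon osd.42 dump_historic_slow_ops     # recent slow ops with their event timelines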

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Frank Schilder
Ah, no. Just set it to 250 as well. I think choose_total_tries is the overall max; using set_choose_tries higher than choose_total_tries has no effect. In my case, the bad mapping was already resolved with both=51, but your case looks a bit more serious. Best regards, = Frank Sc

[ceph-users] Can't connect to MDS admin socket after updating to cephadm

2022-11-03 Thread Luis Calero Muñoz
Hello, I'm running a ceph 15.2.15 Octopus cluster, and in preparation to update it I've first transformed it to cephadm following the instructions on the website. All went well, but now I'm having a problem running "ceph daemon mds.* dump_ops_in_flight" because it gives me an error: root@ceph-mds

[ceph-users] Re: How to force PG merging in one step?

2022-11-03 Thread Eugen Block
Hi Frank, "Is this not checked per OSD? This would be really bad, because if it just uses the average (currently 143.3) this warning will never be triggered in critical situations." I believe you're right; I can only remember having warnings about the average pg count per OSD, not the absol
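
A quick way to look at the actual per-OSD PG counts rather than a pool-level average (not from the thread, just the standard commands):

    ceph osd df tree       # the PGS column shows the PG count on each individual OSD
    ceph osd utilization   # average plus the least- and most-loaded OSD by PG count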

[ceph-users] Re: Can't connect to MDS admin socket after updating to cephadm

2022-11-03 Thread Eugen Block
Hi, you can use cephadm for that now [1]. To attach to a running daemon you run (run 'cephadm ls' to see all cephadm daemons):

cephadm enter --name <daemon-name> [--fsid <fsid>]

There you can query the daemon as you used to:

storage01:~ # cephadm ls |grep mds "name": "mds.cephfs.storage01.ozpeev", st
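
Putting the two steps together for the MDS from the example above (the daemon name comes from the 'cephadm ls' output, so substitute your own):

    cephadm enter --name mds.cephfs.storage01.ozpeev
    # inside the container the admin socket is available as before:
    ceph daemon mds.cephfs.storage01.ozpeev dump_ops_in_flight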

[ceph-users] Re: Strange 50K slow ops incident

2022-11-03 Thread Szabo, Istvan (Agoda)
Are those connected to the same switches? Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- On 2022. Nov 3., at 17:34

[ceph-users] Re: RBD and Ceph FS for private cloud

2022-11-03 Thread Eugen Block
Hi, as always the answer is "it depends". Our company uses the ceph cluster for all three protocols. We have an openstack cluster (rbd) and use cephfs for work and home directories, and radosgw for k8s backups. And we don't face any performance issues. I'd recommend giving cephfs a try,

[ceph-users] Re: Strange 50K slow ops incident

2022-11-03 Thread Frank Schilder
Hi Szabo, it's a switch-local network shared with an HPC cluster with spine-leaf topology. The storage nodes sit on leafs and the leafs all connect to the same spine. Everything with duplicated hardware and LACP bonding. Best regards, = Frank Schilder AIT Risø Campus Bygning 109,

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Nicola Mori
Ok, I'd say I fixed it. I set both parameters to 250, recompiled the crush map and loaded it, and now the PG is in active+undersized+degraded+remapped+backfilling state and mapped as:

# ceph pg map 3.5e
osdmap e23741 pg 3.5e (3.5e) -> up [38,78,55,49,40,39,64,20] acting [38,78,55,49,40,39,64,2

[ceph-users] Re: Missing OSD in up set

2022-11-03 Thread Frank Schilder
Yes, it will. The PG never had the last copy, which needs to be built for the first time. Just wait for it to finish. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Nicola Mori Sent: 03 November 2022 13:37:30 To
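
To watch the backfill finish, something along these lines is usually enough (PG id taken from the messages above):

    ceph pg 3.5e query | grep '"state"'   # current PG state, e.g. active+...+backfilling
    watch ceph status                     # overall recovery/backfill progress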

[ceph-users] Ceph Virtual 2022 Begins Today!

2022-11-03 Thread Mike Perez
Hi everyone, Today is the first of our series in Ceph Virtual 2022! Our agenda will include a Ceph project update, community update, and telemetry talk by Yaarit Hatuka. Join us today at 10:00 AM EDT / 14:00 UTC Meeting link: https://bluejeans.com/908675367 Event: https://ceph.io/en/community/eve

[ceph-users] PG Ratio for EC overwrites Pool

2022-11-03 Thread mailing-lists
Dear Ceph'ers, I am wondering how to choose the number of PGs for an RBD-EC-Pool. To be able to use RBD-Images on an EC-Pool, it needs to have a regular RBD-replicated-pool as well as an EC-Pool with EC overwrites enabled, but how many PGs would you need for the RBD-replicated-pool? It does

[ceph-users] Re: PG Ratio for EC overwrites Pool

2022-11-03 Thread Anthony D'Atri
PG count isn’t just about storage size; it also affects performance, parallelism, and recovery. You want pgp_num for the RBD metadata pool to be at the VERY least the number of OSDs it lives on, rounded up to the next power of 2. I’d probably go for at least (2x#OSD) rounded up. If you have two f
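
As a worked example of this rule of thumb (numbers purely illustrative): with 60 OSDs backing the pools, 2 x 60 = 120, rounded up to the next power of two gives pg_num = 128 for the replicated metadata pool. A sketch of the corresponding setup for RBD on EC with overwrites, with made-up pool and profile names:

    # replicated pool for RBD metadata/omap, EC pool for the data
    ceph osd pool create rbd-meta 128 128 replicated
    ceph osd pool create rbd-ec-data 256 256 erasure my-ec-profile
    ceph osd pool set rbd-ec-data allow_ec_overwrites true
    ceph osd pool application enable rbd-meta rbd
    ceph osd pool application enable rbd-ec-data rbd

    # images live in the replicated pool but place their data in the EC pool
    rbd create --size 1T --data-pool rbd-ec-data rbd-meta/test-image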

[ceph-users] Upgrade/migrate host operating system for ceph nodes (CentOS/Rocky)

2022-11-03 Thread Prof. Dr. Christian Dietrich
Hi all, we're running a ceph cluster with v15.2.17 and cephadm on various CentOS hosts. Since CentOS 8.x is EOL, we'd like to upgrade/migrate/reinstall the OS, possibly migrating to Rocky or CentOS Stream:

host | CentOS   | Podman
-----|----------|-------
osd* | 7.9.2009 | 1.6.4 x5
osd* |
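
A rough per-host sequence that is often used for this kind of reinstall under cephadm; this is only a sketch, it assumes the OSD data devices are preserved across the reinstall, and host names/IPs are placeholders:

    ceph osd set noout                      # avoid rebalancing while the host is down

    # reinstall the OS (e.g. Rocky 8), install podman, lvm2, chrony and python3,
    # then let the orchestrator reach the host again:
    ceph cephadm get-pub-key > ceph.pub
    ssh-copy-id -f -i ceph.pub root@osd-host-01

    # if the host was removed from the orchestrator before the reinstall, add it back:
    ceph orch host add osd-host-01 192.168.1.21

    ceph osd unset noout                    # once the host's daemons are back up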

[ceph-users] Re: Upgrade/migrate host operating system for ceph nodes (CentOS/Rocky)

2022-11-03 Thread Sivy, Shawn
Chris, I recently had a proof-of-concept Ceph Quincy cluster up and running on bare metal. I used Rocky Linux 8.6 which has Podman 4.1.1. I was able to do a cephadm install of the cluster without issue and didn't run into any issue managing it while I tested Ceph.

[ceph-users] State of the Cephalopod

2022-11-03 Thread Josh Durgin
As mentioned at Ceph Virtual today, here are the slides from the project update. The recording will be posted to the Ceph youtube channel later. Thanks to everyone contributing to and using Ceph, you make this all possible! Josh

[ceph-users] Re: State of the Cephalopod

2022-11-03 Thread Josh Durgin
Here's a link since the attachment didn't come through: https://github.com/jdurgin/ceph.io/raw/wip-virtual-2022-slides/src/assets/pdfs/2022.11-state-of-the-cephalopod.pdf On Thu, Nov 3, 2022 at 8:44 AM Josh Durgin wrote: > > As mentioned at Ceph Virtual today, here are the slides from the > pr

[ceph-users] Re: RBD and Ceph FS for private cloud

2022-11-03 Thread Ramana Krisna Venkatesh Raja
Hi, If performance is critical you'd want CephFS kernel clients to access your CephFS volumes/subvolumes. On the other hand, if you can't trust the clients in your cloud, then it's recommended that you set up a gateway (NFS-Ganesha server) for CephFS. NFS-Ganesha server uses libcephfs (userspace
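
For completeness, a minimal kernel-client mount of a CephFS volume; monitor address, client name and paths are placeholders:

    # create a client key restricted to the file system, then mount with the kernel driver
    ceph fs authorize cephfs client.app / rw
    mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=app,secretfile=/etc/ceph/app.secret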

[ceph-users] Question about quorum

2022-11-03 Thread Murilo Morais
Good afternoon everyone! I have a lab with 4 mons. I was testing the behavior when a certain number of hosts go offline, and as soon as the second one went offline everything stopped. It would be interesting if there was a fifth node to ensure that, if two fall, everything will work, but why did

[ceph-users] Re: Question about quorum

2022-11-03 Thread Tyler Brekke
Hi Murilo, Since we need a majority to maintain quorum, when you lost 2 mons you only had 50% available and lost quorum. This is why all recommendations specify having an odd number of mons, as you do not get any added availability with 4 instead of 3. If you had 5 mons, you can lose two without
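
The arithmetic behind this, as an illustration: quorum needs strictly more than half of the monitors, i.e. floor(n/2) + 1 out of n.

    mons (n) | quorum size | failures tolerated
    ---------|-------------|-------------------
    3        | 2           | 1
    4        | 3           | 1
    5        | 3           | 2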

[ceph-users] Re: Question about quorum

2022-11-03 Thread Josh Baergen
Hi Murilo, This is briefly referred to by https://docs.ceph.com/en/octopus/rados/deployment/ceph-deploy-mon/, but in order to avoid split-brain issues it's common for distributed consensus algorithms to require a strict majority in order to maintain quorum. This is why production deployments of mons

[ceph-users] Re: Question about quorum

2022-11-03 Thread Can Özyurt
Hello Murilo, You should always go for odd numbers. Essentially you are trying to avoid split-brain issues. Note that "stopped/failed" is only your observation; the running mons must assume that supposedly failed mons may still be running but unreachable due to a network issue. So 2 out