[ceph-users] Re: Modify pgp number after pg_num increased

2021-09-22 Thread Eugen Block
Hi, IIRC in a different thread you pasted your max-backfill config and it was the lowest possible value (1), right? That's why your backfill is slow. Quoting "Szabo, Istvan (Agoda)": Hi, By default in the newer versions of ceph when you increase the pg_num the cluster will start to
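
A rough sketch of how that limit can be checked and raised at runtime (the value 4 below is only illustrative, not a recommendation from the thread):

    # show the value a specific OSD is currently using
    ceph config show osd.0 osd_max_backfills
    # raise it for all OSDs via the central config
    ceph config set osd osd_max_backfills 4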

[ceph-users] Re: Successful Upgrade from 14.2.22 to 15.2.14

2021-09-22 Thread Eugen Block
Thanks for the summary, Dan! I'm still hesitant to upgrade our production environment from N to O, though your experience sounds reassuring. I have one question: did you also switch to cephadm and containerize all daemons? We haven't made a decision yet, but I guess at some point we'll hav

[ceph-users] Balancer vs. Autoscaler

2021-09-22 Thread Jan-Philipp Litza
Hi everyone, I had the autoscale_mode set to "on" and the autoscaler went to work and started adjusting the number of PGs in that pool. Since this implies a huge shift in data, the reweights that the balancer had carefully adjusted (in crush-compat mode) are now rubbish, and more and more OSDs bec
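
For reference, the autoscaler's pending changes can be inspected and, if needed, paused per pool while the balancer catches up (the pool name is a placeholder):

    ceph osd pool autoscale-status
    ceph osd pool set <pool> pg_autoscale_mode off   # or "warn"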

[ceph-users] Re: Successful Upgrade from 14.2.22 to 15.2.14

2021-09-22 Thread Dan van der Ster
Hi Eugen, All of our prod clusters are still old school rpm packages managed by our private puppet manifests. Even our newest pacific pre-prod cluster is still managed like that. We have a side project to test and move to cephadm / containers but that is still a WIP. (Our situation is complicated

[ceph-users] Re: Successful Upgrade from 14.2.22 to 15.2.14

2021-09-22 Thread Dan van der Ster
Hi Andras, I'm not aware of any showstoppers to move directly to pacific. Indeed we already run pacific on a new cluster we built for our users to try cephfs snapshots at scale. That cluster was created with octopus a few months ago then upgraded to pacific at 16.2.4 to take advantage of the stray

[ceph-users] Why set osd flag to noout during upgrade ?

2021-09-22 Thread Francois Legrand
Hello everybody, I have a "stupid" question. Why is it recommended in the docs to set the osd flag to noout during an upgrade/maintenance (and especially during an osd upgrade/maintenance)? In my understanding, if an osd goes down, after a while (600s by default) it's marked out and the c
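
For context, the flag in question is set and cleared like this; the 600 s window mentioned above corresponds to mon_osd_down_out_interval:

    ceph osd set noout
    # ... perform the upgrade/maintenance ...
    ceph osd unset noout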

[ceph-users] High overwrite latency

2021-09-22 Thread Erwin Ceph
Hi, We do run several Ceph clusters, but one has a strange problem. It is running Octopus 15.2.14 on 9 (HP 360 Gen 8, 64 GB, 10 Gbps) servers, 48 OSDs (all 2 TB Samsung SSDs with Bluestore). Monitoring in Grafana shows these three latency values over 7 days: ceph_osd_op_r_latency_sum: avg 1.1
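
A possible first step when cluster-wide averages like these hide a few slow devices is to look at per-OSD latency (osd.12 below is just a placeholder):

    # commit/apply latency per OSD, in milliseconds
    ceph osd perf
    # dump the op latency counters of one suspect OSD
    ceph tell osd.12 perf dump | grep -A 3 op_w_latency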

[ceph-users] Re: Successful Upgrade from 14.2.22 to 15.2.14

2021-09-22 Thread Eugen Block
I understand, thanks for sharing! Quoting Dan van der Ster: Hi Eugen, All of our prod clusters are still old school rpm packages managed by our private puppet manifests. Even our newest pacific pre-prod cluster is still managed like that. We have a side project to test and move to cephadm

[ceph-users] Re: Why set osd flag to noout during upgrade ?

2021-09-22 Thread Etienne Menguy
Hello, From my experience, I see three reasons: - You don’t want to recover data you already have on a down OSD; rebalancing can have a big impact on performance. - If the upgrade/maintenance goes wrong you will want to focus on that issue and not have to deal with things done by Ceph meanw

[ceph-users] Re: Why set osd flag to noout during upgrade ?

2021-09-22 Thread Dan van der Ster
Yeah you don't want to deal with backfilling while the cluster is upgrading. At best it can delay the upgrade; at worst, mixed-version backfilling has (rarely) caused issues in the past. We additionally `set noin` and disable the balancer: `ceph balancer off`. The former prevents broken osds from r
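
Taken together, the pre-upgrade sequence being described is roughly the following (a sketch, not an official checklist):

    ceph osd set noout
    ceph osd set noin
    ceph balancer off
    # ... upgrade ...
    ceph balancer on
    ceph osd unset noin
    ceph osd unset noout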

[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2021-09-22 Thread Kai Stian Olstad
On 21.09.2021 09:11, Kobi Ginon wrote: > for sure the balancer affects the status Of course, but setting several PGs to degraded is something else. > i doubt that your customers will be writing so many objects at the same rate as the test I only need 2 hosts running rados bench to get several P
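
The load generator referred to is rados bench; run from two hosts against a test pool it looks roughly like this (pool name and duration are placeholders):

    rados bench -p testpool 60 write --no-cleanup
    # remove the benchmark objects afterwards
    rados -p testpool cleanup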

[ceph-users] Re: Balancer vs. Autoscaler

2021-09-22 Thread Dan van der Ster
To get an idea how much work is left, take a look at `ceph osd pool ls detail`. There should be pg_num_target... The osds will merge or split PGs until pg_num matches that value. .. Dan On Wed, 22 Sep 2021, 11:04 Jan-Philipp Litza, wrote: > Hi everyone, > > I had the autoscale_mode set to "on"
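
A quick way to watch the remaining split/merge work, per that suggestion (pool name is a placeholder):

    ceph osd pool ls detail | grep <pool>
    # compare pg_num against pg_num_target (and pgp_num against pgp_num_target)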

[ceph-users] Modify pgp number after pg_num increased

2021-09-22 Thread Szabo, Istvan (Agoda)
Hi, By default in the newer versions of Ceph, when you increase the pg_num the cluster will slowly increase the pgp_num up to the value of pg_num. I've increased the EC data pool's pg_num from 32 to 128, but 1 node has also been added to the cluster and it's very slow. pool 28 'hkg.rgw.bucket
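
If the gradual pgp_num adjustment needs to move faster, the knob that throttles it is the mgr option target_max_misplaced_ratio (default 0.05); a hedged example, with 0.10 purely illustrative and <pool> a placeholder:

    ceph config set mgr target_max_misplaced_ratio 0.10
    # or set pgp_num on the pool directly
    ceph osd pool set <pool> pgp_num 128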

[ceph-users] Re: Successful Upgrade from 14.2.22 to 15.2.14

2021-09-22 Thread Andras Pataki
Hi Dan, This is excellent to hear - we've also been a bit hesitant to upgrade from Nautilus (which has been working so well for us). One question: did you/would you consider upgrading straight to Pacific from Nautilus? Can you share your thoughts that led you to Octopus first? Thanks, An

[ceph-users] Change max backfills

2021-09-22 Thread Pascal Weißhaupt
Hi, I recently upgraded from Ceph 15 to Ceph 16 and when I want to change the max backfills via ceph tell 'osd.*' injectargs '--osd-max-backfills 1' I get no output: root@pve01:~# ceph tell 'osd.*' injectargs '--osd-max-backfills 1' osd.0: {} osd.1: {} osd.2: {} osd.3: {} osd.4: {} os
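
One way to confirm the value actually in effect, regardless of the empty injectargs output, and to set it via the central config instead (a sketch):

    ceph tell osd.0 config get osd_max_backfills
    ceph config set osd osd_max_backfills 1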

[ceph-users] Re: Change max backfills

2021-09-22 Thread Etienne Menguy
Hi, In the past you got this output if the value was not changing; try with another value. I don't know if things have changed with the latest Ceph version. - Etienne Menguy etienne.men...@croit.io > On 22 Sep 2021, at 15:34, Pascal Weißhaupt > wrote: > > Hi, > > > > I recently upgraded from Ceph 1

[ceph-users] Re: Change max backfills

2021-09-22 Thread Pascal Weißhaupt
God damn...you are absolutely right - my bad. Sorry and thanks for that... -Original Message- From: Etienne Menguy Sent: Wednesday, 22 September 2021 15:48 To: ceph-users@ceph.io Subject: [ceph-users] Re: Change max backfills Hi, In the past you had this output if value

[ceph-users] Re: Modify pgp number after pg_num increased

2021-09-22 Thread Szabo, Istvan (Agoda)
That's already been increased to 4. Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- -Original Message- From: Eugen Block Sent: Wednesday

[ceph-users] IO500 SC’21 Call for Submission

2021-09-22 Thread IO500 Committee
Stabilization period: Friday, 17th September - Friday, 1st October Submission deadline: Monday, 1st November 2021 AoE The IO500 [1] is now accepting and encouraging submissions for the upcoming 9th semi-annual IO500 list, in conjunction with SC'21. Once again, we are also accepting submissions

[ceph-users] "Remaining time" under-estimates by 100x....

2021-09-22 Thread Harry G. Coin
Is there a way to re-calibrate the various 'global recovery event' and related 'remaining time' estimators? For the last three days I've been assured that a 19h event will be over in under 3 hours... Previously I think Microsoft held the record for the most incorrect 'please wait' progress i

[ceph-users] Remoto 1.1.4 in Ceph 16.2.6 containers

2021-09-22 Thread David Orman
We'd worked on pushing a change to fix https://tracker.ceph.com/issues/50526 for a deadlock in remoto here: https://github.com/alfredodeza/remoto/pull/63 A new version, 1.2.1, was built to help with this. With the Ceph release 16.2.6 (at least), we see 1.1.4 is again part of the containers. Lookin
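
A way to check which remoto version ends up inside a given container image, assuming podman and the CentOS-based quay.io image (both the tool and the tag are placeholders here):

    podman run --rm --entrypoint bash quay.io/ceph/ceph:v16.2.6 \
        -c 'rpm -qa | grep -i remoto'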

[ceph-users] Re: Remoto 1.1.4 in Ceph 16.2.6 containers

2021-09-22 Thread David Orman
I'm wondering if this was installed using pip/pypi before, and now switched to using EPEL? That would explain it - 1.2.1 may never have been pushed to EPEL. David On Wed, Sep 22, 2021 at 11:26 AM David Orman wrote: > > We'd worked on pushing a change to fix > https://tracker.ceph.com/issues/5052

[ceph-users] One PG keeps going inconsistent (stat mismatch)

2021-09-22 Thread Simon Ironside
Hi All, I have a recurring single PG that keeps going inconsistent. A scrub is enough to pick up the problem. The primary OSD log shows something like: 2021-09-22 18:08:18.502 7f5bdcb11700 0 log_channel(cluster) log [DBG] : 1.3ff scrub starts 2021-09-22 18:08:18.880 7f5bdcb11700 -1 log_chann
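
For reference, the usual way to inspect and repair such a PG, using the PG id from the log above; for a pure stat mismatch the inconsistent-object listing is often empty and the repair just recalculates the stats:

    rados list-inconsistent-obj 1.3ff --format=json-pretty
    ceph pg repair 1.3ff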

[ceph-users] Re: Why set osd flag to noout during upgrade ?

2021-09-22 Thread Anthony D'Atri
Indeed. In a large enough cluster, even a few minutes of extra backfill/recovery per OSD adds up. Say you have 100 OSD nodes, and just 3 minutes of unnecessary backfill per node. That prolongs your upgrade by 5 hours. > Yeah you don't want to deal with backfilling while the cluster is > upgradi

[ceph-users] Re: Why set osd flag to noout during upgrade ?

2021-09-22 Thread Frank Schilder
In addition, from my experience: I often set noout, norebalance and nobackfill before doing maintenance. This greatly speeds up peering (when adding new OSDs) and reduces unnecessary load from all daemons. In particular, if there is heavy client IO going on at the same time, the ceph daemons ar

[ceph-users] Re: Remoto 1.1.4 in Ceph 16.2.6 containers

2021-09-22 Thread David Orman
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-4b2736a28c ^^ if people want to test and provide feedback for a potential merge to EPEL8 stable. David On Wed, Sep 22, 2021 at 11:43 AM David Orman wrote: > > I'm wondering if this was installed using pip/pypi before, and now > switched t

[ceph-users] Re: Balancer vs. Autoscaler

2021-09-22 Thread Richard Bade
If you look at the current pg_num in that pool ls detail command that Dan mentioned, you can set the pool's pg_num to that current value, which will effectively pause the PG changes. I did this recently when decreasing the number of PGs in a pool, which took several weeks to complete. This
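
In command form, the trick described is roughly the following, where <pool> is a placeholder and <current> is whatever pg_num the ls detail output reports at that moment:

    ceph osd pool ls detail | grep <pool>
    ceph osd pool set <pool> pg_num <current>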