[ceph-users] Re: Upmap balancer after node failure
Hi Andras,

Assuming that you've already tightened mgr/balancer/upmap_max_deviation to 1, I suspect that this cluster already has too many upmaps.

Last time I checked, the balancer implementation is not able to improve a pg-upmap-items entry if one already exists for a PG. (It can add an OSD mapping pair to a PG, but not change an existing pair from one OSD to another.) So I think that what happens in this case is that the balancer gets stuck in a sort of local minimum in the overall optimization.

It can therefore help to simply remove some upmaps, and then wait for the balancer to do a better job when it re-creates new entries for those PGs. There's usually some low hanging fruit -- you can start by removing pg-upmap-items entries which map PGs away from the least full OSDs. (Those upmap entries are making the least full OSDs even *less* full.)

We have a script for that:
https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/rm-upmaps-underfull.py
It's pretty hacky and I don't use it often, so please use it with caution -- you can run it first and review which upmaps it would remove. (A rough sketch of the equivalent manual commands is at the end of this mail.)

Hope this helps,

Dan

On Fri, Apr 2, 2021 at 10:18 AM Andras Pataki wrote:
>
> Dear ceph users,
>
> On one of our clusters I have some difficulties with the upmap balancer. We started with a reasonably well balanced cluster (using the balancer in upmap mode). After a node failure, we crush reweighted all the OSDs of the node to take it out of the cluster, and waited for the cluster to rebalance. Obviously, this significantly changes the crush map, so the nice balance created by the balancer was gone. The recovery mostly completed, but some of the OSDs became too full, so we ended up with a few PGs that were backfill_toofull. The cluster has plenty of space (overall perhaps 65% full); only a few OSDs are >90% (we have backfillfull_ratio at 92%). The balancer refuses to change anything since the cluster is not clean. Yet the cluster can't become clean without a few upmaps to help the top 3 or 4 most full OSDs.
>
> I would think this is a fairly common situation -- trying to recover after some failure. Are there any recommendations on how to proceed? Obviously I can manually find and insert upmaps, but for a large cluster with tens of thousands of PGs that isn't too practical. Is there a way to tell the balancer to still do something even though some PGs are undersized? (With a quick look at the python module, I didn't see any.)
>
> The cluster is on Nautilus 14.2.15.
>
> Thanks,
>
> Andras
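P.S. The sketch mentioned above: the rough manual equivalent of what the script does, in case you want to try a few entries by hand first. The PG and OSD ids below are made up, and which entries to remove depends on your own "ceph osd df" output, so treat this as a sketch rather than a recipe:

  # make sure the balancer aims for a tight distribution
  ceph config set mgr mgr/balancer/upmap_max_deviation 1

  # find the least full OSDs (look at the %USE column)
  ceph osd df

  # list the existing upmap exceptions; lines look like
  #   pg_upmap_items 7.1f [123,456]      (pairs are [from-osd,to-osd])
  ceph osd dump | grep pg_upmap_items

  # remove an entry that maps a PG away from an underfull OSD,
  # then let the balancer re-create something better
  ceph osd rm-pg-upmap-items 7.1f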
[ceph-users] ceph orch update fails - got new digests
Hello Ceph user list!

I tried to update Ceph 15.2.10 to 16.2.0 via ceph orch. In the beginning everything seemed to work fine and the new MGR and MONs were deployed. But now I have ended up in a pulling loop and I am unable to fix the issue by myself.

# ceph -W cephadm --watch-debug

2021-04-02T10:36:20.704960+0200 mgr.mon-a-02.tvcrfq [INF] Upgrade: Need to upgrade myself (mgr.mon-a-02.tvcrfq)
2021-04-02T10:36:21.837596+0200 mgr.mon-a-02.tvcrfq [INF] Upgrade: Pulling ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000 on mon-a-01
2021-04-02T10:36:24.591487+0200 mgr.mon-a-02.tvcrfq [INF] Upgrade: image ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000 pull on mon-a-01 got new digests ['docker.io/ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000', 'docker.io/ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a'] (not ['ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000']), restarting
2021-04-02T10:36:37.054786+0200 mgr.mon-a-02.tvcrfq [INF] Upgrade: Need to upgrade myself (mgr.mon-a-02.tvcrfq)
2021-04-02T10:36:38.419014+0200 mgr.mon-a-02.tvcrfq [INF] Upgrade: Pulling ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000 on mon-a-01
2021-04-02T10:36:41.172835+0200 mgr.mon-a-02.tvcrfq [INF] Upgrade: image ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000 pull on mon-a-01 got new digests ['docker.io/ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000', 'docker.io/ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a'] (not ['ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000']), restarting

After I stopped the update I got the following health error:

Module 'cephadm' has failed: 'NoneType' object has no attribute 'target_digests'

Thanks in advance!

Best,
Alex
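For completeness, I drove the upgrade with the standard orch upgrade commands, roughly as follows (a sketch -- the exact invocations may have differed slightly):

  # start the upgrade to 16.2.0
  ceph orch upgrade start --ceph-version 16.2.0

  # check progress and the target image cephadm is aiming for
  ceph orch upgrade status

  # stop the (looping) upgrade
  ceph orch upgrade stop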
[ceph-users] Re: Upmap balancer after node failure
Hi again,

Oops, I'd missed the part about some PGs being degraded, which prevents the balancer from continuing. So I assume that you have PGs which are simultaneously undersized+backfill_toofull? That case does indeed sound tricky.

To solve it you would either need to move PGs out of the toofull OSDs, to make room for the undersized PGs, or upmap those undersized PGs to some other less-full OSDs. For the former, you could use the rm-upmaps-underfull script and hope that it incidentally moves data out of those toofull OSDs. Or a similar script with some variables reversed could be used to remove any upmaps which are directing PGs *to* those toofull OSDs. Or maybe it will be enough to just reweight those OSDs to 0.9. (A rough sketch of the second option follows at the end of this mail.)

-- Dan

On Fri, Apr 2, 2021 at 10:47 AM Dan van der Ster wrote:
>
> Hi Andras,
>
> Assuming that you've already tightened mgr/balancer/upmap_max_deviation to 1, I suspect that this cluster already has too many upmaps.
>
> Last time I checked, the balancer implementation is not able to improve a pg-upmap-items entry if one already exists for a PG. (It can add an OSD mapping pair to a PG, but not change an existing pair from one OSD to another.) So I think that what happens in this case is that the balancer gets stuck in a sort of local minimum in the overall optimization.
>
> It can therefore help to simply remove some upmaps, and then wait for the balancer to do a better job when it re-creates new entries for those PGs. There's usually some low hanging fruit -- you can start by removing pg-upmap-items entries which map PGs away from the least full OSDs. (Those upmap entries are making the least full OSDs even *less* full.)
>
> We have a script for that:
> https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/rm-upmaps-underfull.py
> It's pretty hacky and I don't use it often, so please use it with caution -- you can run it first and review which upmaps it would remove.
>
> Hope this helps,
>
> Dan
>
> On Fri, Apr 2, 2021 at 10:18 AM Andras Pataki wrote:
> >
> > Dear ceph users,
> >
> > On one of our clusters I have some difficulties with the upmap balancer. We started with a reasonably well balanced cluster (using the balancer in upmap mode). After a node failure, we crush reweighted all the OSDs of the node to take it out of the cluster, and waited for the cluster to rebalance. Obviously, this significantly changes the crush map, so the nice balance created by the balancer was gone. The recovery mostly completed, but some of the OSDs became too full, so we ended up with a few PGs that were backfill_toofull. The cluster has plenty of space (overall perhaps 65% full); only a few OSDs are >90% (we have backfillfull_ratio at 92%). The balancer refuses to change anything since the cluster is not clean. Yet the cluster can't become clean without a few upmaps to help the top 3 or 4 most full OSDs.
> >
> > I would think this is a fairly common situation -- trying to recover after some failure. Are there any recommendations on how to proceed? Obviously I can manually find and insert upmaps, but for a large cluster with tens of thousands of PGs that isn't too practical. Is there a way to tell the balancer to still do something even though some PGs are undersized? (With a quick look at the python module, I didn't see any.)
> >
> > The cluster is on Nautilus 14.2.15.
> >
> > Thanks,
> >
> > Andras
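P.S. The sketch mentioned above: one rough way to eyeball and remove upmap entries that send PGs *to* a toofull OSD. The OSD and PG ids are made up, and note that pg_upmap_items pairs are [from-osd,to-osd], so only remove entries where the full OSD is on the *to* side:

  # entries involving osd.123 (hypothetical toofull OSD)
  ceph osd dump | grep pg_upmap_items | grep -w 123

  # drop one of them; backfill and the balancer will take it from there
  ceph osd rm-pg-upmap-items 7.2b4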
[ceph-users] cephfs-top: "cluster ceph does not exist"
Hi,

I just installed Pacific on our test cluster. It really is a minimal, but fully functional cluster, and everything works as expected, except for the new (and much anticipated) cephfs-top. When I run that tool, it says: "cluster ceph does not exist".

If I point it to the correct config file:

# cephfs-top --conffile /etc/ceph/ceph.conf

I still get the same error. It doesn't matter whether I run this as the ceph user or as root.

I added the "client.fstop" user as required by the documentation, and the "stats" module is enabled and functioning (tested with "ceph fs perf stats").

Does anyone have any suggestions as to what might be wrong?

Regards,
Erwin
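For reference, the setup steps I followed are roughly the ones from the cephfs-top documentation -- something like the following (a sketch from memory; the exact caps may differ slightly from the docs):

  # enable the stats module that cephfs-top reads from
  ceph mgr module enable stats

  # create the client that cephfs-top uses by default
  ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r'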
[ceph-users] Re: cephfs-top: "cluster ceph does not exist"
On Fri, Apr 2, 2021 at 2:59 PM Erwin Bogaard wrote:
>
> Hi,
>
> I just installed Pacific on our test cluster. It really is a minimal, but fully functional cluster, and everything works as expected, except for the new (and much anticipated) cephfs-top. When I run that tool, it says: "cluster ceph does not exist".
>
> If I point it to the correct config file:
>
> # cephfs-top --conffile /etc/ceph/ceph.conf

Does running plain "cephfs-top" work for you? ("ceph" is the default cluster name.)

> I still get the same error. It doesn't matter whether I run this as the ceph user or as root.
>
> I added the "client.fstop" user as required by the documentation, and the "stats" module is enabled and functioning (tested with "ceph fs perf stats").
>
> Does anyone have any suggestions as to what might be wrong?
>
> Regards,
> Erwin
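If that still fails, it may also be worth passing the cluster name and client id explicitly -- the values below are just the defaults, so this is only a sketch to rule out the obvious:

  cephfs-top --cluster ceph --id fstop --conffile /etc/ceph/ceph.conf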
[ceph-users] Re: Upmap balancer after node failure
On Fri, 2 Apr 2021 at 11:23, Dan van der Ster wrote:
>
> Hi again,
>
> Oops, I'd missed the part about some PGs being degraded, which prevents the balancer from continuing.

> any upmaps which are directing PGs *to* those toofull OSDs. Or maybe it will be enough to just reweight those OSDs to 0.9.

I was also thinking this; in that case, just lower the OSD weight on the toofull OSDs like us old pre-upmap admins do. ;) When all the dust has settled, move the weight up again.

-- 
May the most significant bit of your life be positive.
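In concrete terms that would be something like the following (the osd id is made up; check "ceph osd df" for which OSDs are actually too full):

  # temporarily shrink the reweight of a toofull OSD
  ceph osd reweight 123 0.90

  # ...and once backfill has finished and the cluster is clean again
  ceph osd reweight 123 1.0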
[ceph-users] Upmap balancer after node failure
Dear ceph users,

On one of our clusters I have some difficulties with the upmap balancer. We started with a reasonably well balanced cluster (using the balancer in upmap mode). After a node failure, we crush reweighted all the OSDs of the node to take it out of the cluster, and waited for the cluster to rebalance. Obviously, this significantly changes the crush map, so the nice balance created by the balancer was gone. The recovery mostly completed, but some of the OSDs became too full, so we ended up with a few PGs that were backfill_toofull. The cluster has plenty of space (overall perhaps 65% full); only a few OSDs are >90% (we have backfillfull_ratio at 92%). The balancer refuses to change anything since the cluster is not clean. Yet the cluster can't become clean without a few upmaps to help the top 3 or 4 most full OSDs.

I would think this is a fairly common situation -- trying to recover after some failure. Are there any recommendations on how to proceed? Obviously I can manually find and insert upmaps, but for a large cluster with tens of thousands of PGs that isn't too practical. Is there a way to tell the balancer to still do something even though some PGs are undersized? (With a quick look at the python module, I didn't see any.)

The cluster is on Nautilus 14.2.15.

Thanks,

Andras
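(For clarity, by manually inserting upmaps I mean entries of the following form -- the PG and OSD ids here are invented, just to show the shape of the command:)

  # remap one copy of PG 7.1a3 from osd.541 (too full) to osd.382 (emptier)
  ceph osd pg-upmap-items 7.1a3 541 382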
[ceph-users] Re: ceph orch update fails - got new digests
Hi Alex,

Thanks for the report! I've opened https://tracker.ceph.com/issues/50114. It looks like the target_digests check needs to check for overlap instead of equality.

sage

On Fri, Apr 2, 2021 at 4:04 AM Alexander Sporleder wrote:
>
> Hello Ceph user list!
>
> I tried to update Ceph 15.2.10 to 16.2.0 via ceph orch. In the beginning everything seemed to work fine and the new MGR and MONs were deployed. But now I have ended up in a pulling loop and I am unable to fix the issue by myself.
>
> # ceph -W cephadm --watch-debug
>
> 2021-04-02T10:36:20.704960+0200 mgr.mon-a-02.tvcrfq [INF] Upgrade: Need to upgrade myself (mgr.mon-a-02.tvcrfq)
> 2021-04-02T10:36:21.837596+0200 mgr.mon-a-02.tvcrfq [INF] Upgrade: Pulling ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000 on mon-a-01
> 2021-04-02T10:36:24.591487+0200 mgr.mon-a-02.tvcrfq [INF] Upgrade: image ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000 pull on mon-a-01 got new digests ['docker.io/ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000', 'docker.io/ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a'] (not ['ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000']), restarting
> 2021-04-02T10:36:37.054786+0200 mgr.mon-a-02.tvcrfq [INF] Upgrade: Need to upgrade myself (mgr.mon-a-02.tvcrfq)
> 2021-04-02T10:36:38.419014+0200 mgr.mon-a-02.tvcrfq [INF] Upgrade: Pulling ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000 on mon-a-01
> 2021-04-02T10:36:41.172835+0200 mgr.mon-a-02.tvcrfq [INF] Upgrade: image ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000 pull on mon-a-01 got new digests ['docker.io/ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000', 'docker.io/ceph/ceph@sha256:9b04c0f15704c49591640a37c7adfd40ffad0a4b42fecb950c3407687cb4f29a'] (not ['ceph/ceph@sha256:35b2786dc4cd535dd84f6a1a585503db4b43623ba6c43120b5e5e00951949000']), restarting
>
> After I stopped the update I got the following health error:
>
> Module 'cephadm' has failed: 'NoneType' object has no attribute 'target_digests'
>
> Thanks in advance!
>
> Best,
> Alex
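(For the curious: the two digests in the log are simply the different names the registry knows the pulled image by. A rough way to see them on the host, assuming podman and the v16.2.0 tag pulled locally:)

  # RepoDigests shows the digests this image is known by in the registry
  podman image inspect --format '{{.RepoDigests}}' docker.io/ceph/ceph:v16.2.0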
[ceph-users] Re: ceph orch update fails - got new digests
I'm a bit confused by the log messages -- I'm not sure why the target_digests aren't changing. Can you post the whole ceph-mgr.mon-a-02.tvcrfq.log?

(ceph-post-file /var/log/ceph/*/ceph-mgr.mon-a-02.tvcrfq.log)

Thanks!
s
[ceph-users] Re: Upmap balancer after node failure
Lowering the weight is what I ended up doing. But this isn't ideal, because afterwards the balancer will remove too many PGs from the OSD, since it now has a lower weight. So I'll have to put the weight back once the cluster recovers and the balancer goes back to its business.

But in any case, this is conceptually challenging: the upmap balancer won't help in the case where one would perhaps need it the most -- when recovering from some kind of a disaster. So one can be under the illusion that everything is fine: OSDs are all balanced, the cluster is running smoothly. Then some non-trivial failure happens, and we are back to a pre-upmap situation -- the balance is completely thrown off. Also, for larger clusters the pre-upmap imbalance is worse, and it is also harder to fix (due to the large number of OSDs). I've done some analysis of what the expected imbalance is given various factors of the cluster, but that's a longer story ...

Thanks for the input -- I was really wondering if I was missing something with upmap ...

Andras

On 4/2/21 8:12 AM, Janne Johansson wrote:
> On Fri, 2 Apr 2021 at 11:23, Dan van der Ster wrote:
> > Hi again,
> >
> > Oops, I'd missed the part about some PGs being degraded, which prevents the balancer from continuing.
> >
> > any upmaps which are directing PGs *to* those toofull OSDs. Or maybe it will be enough to just reweight those OSDs to 0.9.
>
> I was also thinking this; in that case, just lower the OSD weight on the toofull OSDs like us old pre-upmap admins do. ;) When all the dust has settled, move the weight up again.
[ceph-users] Re: ceph orch update fails - got new digests
On Fri, Apr 2, 2021 at 12:08 PM Alexander Sporleder wrote:
>
> Hello Sage, thank you for your response!
>
> I had some problems updating 15.2.8 -> 15.2.9, but after updating Podman to 3.0.1 and Ceph to 15.2.10 everything was fine again.
>
> Then I started the update 15.2.10 -> 16.2.0 and in the beginning everything worked well. But at some point the update got stuck and something broke the dashboard (port is in use). I stopped the update, but it was not possible to start the update process again without the loop.
>
> Now my mons, the mgr, a few OSDs and cephadm are v16.2.0.
>
> Unfortunately the mgr is not logging to a file after I converted the cluster to cephadm.

ceph config set global log_to_file true

You might also need to chown -R 167.167 /var/log/ceph/*/. if you're on debian/ubuntu (the packages installed on the host may have fiddled with the ownership; the uid for fedora/rhel/centos is different from the one for debian/ubuntu).

s