Hi, I have often observed that recovery/rebalance in Nautilus starts quite
fast but then becomes extremely slow (2-3 objects/s), even with around 20 OSDs
involved. Right now I am draining (reweighted to 0) 16x 8 TB disks; it has been
running for 4 days and for the last 12 hours it has been more or less stuck at:
cluster:
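For what it's worth, recovery speed is mostly bounded by the backfill and recovery throttles; a minimal sketch of checking and raising them on Nautilus (the values are only examples, and osd.0 is just an example id):

    # current settings, queried on the host running osd.0
    ceph daemon osd.0 config show | grep -E 'osd_max_backfills|osd_recovery_max_active'
    # raise them cluster-wide via the config database; higher values speed up
    # recovery at the cost of client I/O
    ceph config set osd osd_max_backfills 4
    ceph config set osd osd_recovery_max_active 4
    # watch progress
    ceph -s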
Hi, this is a fresh Nautilus cluster, but there is a second, older one that was
upgraded from Luminous to Nautilus; both show the same symptoms.
First of all, the data distribution across the OSDs is very bad. That could be
due to a low PG count, although I get no recommendation to raise the PG numbers
Hi, our mon is suddenly acting up and dying in a crash loop with the
following:
2019-10-04 14:00:24.339583 lease_expire=0.00 has v0 lc 4549352
-3> 2019-10-04 14:00:24.335 7f6e5d461700 5 mon.km-fsn-1-dc4-m1-797678@0(leader).paxos(paxos active c 4548623..4549352) is_readable = 1 - no
turn it off and rebuild from the remainder, or do they all exhibit this bug?
On Fri, Oct 4, 2019 at 5:44 AM Philippe D'Anjou
wrote:
>
> Hi,
> our mon is acting up all of a sudden and dying in crash loop with the
> following:
>
>
> 2019-10-04 14:00:24.339583 lease_
when it restarts it reports no issues and all commands run fine.
On Monday, October 7, 2019 at 21:59:20 EEST, Gregory Farnum wrote:
On Sun, Oct 6, 2019 at 1:08 AM Philippe D'Anjou
wrote:
>
> I had to use rocksdb repair tool before because the rocksdb files got
>
Gregory Farnum wrote:
On Mon, Oct 7, 2019 at 11:11 PM Philippe D'Anjou
wrote:
>
> Hi,
> unfortunately it's a single mon, because we had a major outage on this cluster
> and it's just being used to copy off data now. We weren't able to add more
> mons
How do I import an osdmap in Nautilus? I saw documentation for older versions,
but it seems one can now only export, not import?
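A sketch of what still appears to work in Nautilus, assuming the OSD in question is stopped before touching it with ceph-objectstore-tool; paths and IDs are only examples:

    # export the cluster's current osdmap and inspect it
    ceph osd getmap -o /tmp/osdmap
    osdmaptool --print /tmp/osdmap
    # export/import the osdmap copy stored on a (stopped) OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op get-osdmap --file /tmp/osdmap
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op set-osdmap --file /tmp/osdmap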
On Thursday, October 10, 2019 at 08:52:03 EEST, Philippe D'Anjou wrote:
I don't think this has anything to do with CephFS.
On Wednesday, October 9, 2019 at 20:19:42 EEST, Gregory Farnum wrote:
On Mon, Oct 7, 2019 at 11:11 PM Philippe D'Anjou
wrote:
>
> Hi,
> unfortunately it's a single mon, because we had a major outage on this cluster
> and it's just being used to
86.83 1.46 38 up
54 hdd 9.09470 1.0 9.1 TiB 5.0 TiB 5.0 TiB 136 KiB 13 GiB 4.1 TiB 54.80 0.92 24 up
...
Now I again have to manually reweight to prevent bigger issues.
How to fix this?
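One way to avoid doing the reweighting by hand, as a sketch (the 110% threshold is only an example):

    # dry run: show what would change for OSDs above 110% of mean utilization
    ceph osd test-reweight-by-utilization 110
    # apply it
    ceph osd reweight-by-utilization 110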
On Wednesday, October 2, 2019 at 08:49:50 EEST, Philippe D'Anjou wrote:
This is related to https://tracker.ceph.com/issues/42341 and to
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-October/037017.html
After closer inspection yesterday we found that PGs are not being removed from
OSDs, which then leads to nearfull errors and explains why reweights don't work
Apparently the PG balancer's crush-compat mode adds some crush bucket weights.
Those are causing major havoc in our cluster; our PG distribution is all over the
place.
Seeing things like this: ...
97 hdd 9.09470 1.0 9.1 TiB 6.3 TiB 6.3 TiB 32 KiB 17 GiB 2.8 TiB 69.03 1.08 28 up
98 hdd
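A sketch of how the crush-compat weight-set could be inspected and removed, assuming the balancer is switched off first (removing it reverts to plain CRUSH weights and will trigger data movement):

    ceph balancer off
    # show whether a compat weight-set exists and what it contains
    ceph osd crush weight-set ls
    ceph osd crush weight-set dump
    # drop the compat weight-set
    ceph osd crush weight-set rm-compat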
v14.2.4
So, this is not new; it happens every time there is a rebalance, this time because
of raising the PG count. The PG balancer is disabled because I thought it was the reason,
but apparently it's not, and it isn't helping either.
Ceph is totally borked; it's only moving data onto nearfull OSDs, causing issues.
Hi,
we are seeing quite high memory usage by OSDs since Nautilus, averaging
10 GB/OSD for 10 TB HDDs. But I had OOM issues on 128 GB systems because some
single OSD processes used up to 32% of memory.
Here is an example of how they look on average: https://i.imgur.com/kXCtxMe.png
Is that normal? I have never seen this
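A sketch of checking what the OSDs are actually targeting, assuming the Nautilus default of 4 GiB; osd.12 is just an example id:

    # effective memory target straight from the running daemon, on its host
    ceph daemon osd.12 config show | grep osd_memory_target
    # or via the config database (default is 4294967296, i.e. 4 GiB)
    ceph config get osd.12 osd_memory_target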
almost 90% full. PGs are not equally distributed, otherwise it'd
be a PG size issue.
Thanks
On Sunday, October 27, 2019 at 20:33:11 EET, Wido den Hollander wrote:
On 10/26/19 8:01 AM, Philippe D'Anjou wrote:
> V14.2.4
> So, this is not new, this hap
OK, looking at the mempool dump, what does it tell me? This affects multiple OSDs;
I'm getting crashes almost every hour.
{ "mempool": {
"by_pool": {
"bloom_filter": {
"items": 0,
"bytes": 0
},
"bluestore_alloc": {
"item
Yes, you were right: somehow an unusually high memory target was set, not
sure where it came from. I have set it back to normal now; that should fix it, I
guess.
Thanks
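A sketch of putting the target back, assuming the stock 4 GiB value is what's wanted:

    # either set it explicitly...
    ceph config set osd osd_memory_target 4294967296
    # ...or drop the override entirely and fall back to the default
    ceph config rm osd osd_memory_target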
Hi, I need to change set-require-min-compat-client in order to use upmap mode for the PG
balancer. Will this cause a disconnect of all clients? We're talking CephFS and
RBD images for VMs.
Or is it safe to switch that live?
Thanks
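For what it's worth, the monitor refuses this change while pre-Luminous clients are connected, so a sketch of the usual sequence (checking connected client features first):

    # see what releases/features the currently connected clients report
    ceph features
    # only succeeds if no pre-luminous clients are connected (can be forced, at your own risk)
    ceph osd set-require-min-compat-client luminous
    # then enable the balancer in upmap mode
    ceph balancer mode upmap
    ceph balancer on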
Hi,
we're on v14.2.4 and nothing but that. All clients and servers run Ubuntu 18.04 LTS with kernel 5.0.0-20.
We're seeing this error:
MountVolume.WaitForAttach failed for volume "pvc-45a86719-edb9-11e9-9f38-02000a030111" : fail to check rbd image status with: (exit status 110), rbd output: (2019
Hi, it is NOT safe. All clients fail to mount RBDs now :(
On Wednesday, October 30, 2019 at 09:33:16 EET, Konstantin Shalygin wrote:
Hi, I need to change set-require-min-compat-client to use upmap mode for the PG
balancer. Will this cause a disconnect of all clients? W
So it seems like for some reason librados is being used now instead of the kernel module,
and this produces the error. But we have the latest Nautilus repos installed on
the clients... so why would librados throw a compatibility issue? The client
compatibility level is set to Luminous.
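A quick sketch for checking both sides, in case it helps:

    # what the cluster currently requires
    ceph osd get-require-min-compat-client
    # what the connected clients (kernel and librados/librbd) actually advertise
    ceph features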
v14.2.4
Following issue:
PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
pg 1.285 is active+remapped+backfill_toofull, acting [118,94,84]
BUT:
118 hdd 9.09470 0.8 9.1 TiB 7.4 TiB 7.4 TiB 12 KiB 19 GiB 1.7 TiB 81.53 1.16 38 up
Even with adjusted
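A sketch of checking and temporarily relaxing the backfillfull threshold, assuming the Nautilus defaults (0.90 backfillfull / 0.95 full); raise it with care and only until the data has moved:

    ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'
    # temporarily allow backfill onto fuller OSDs
    ceph osd set-backfillfull-ratio 0.92
    # restore the default afterwards
    ceph osd set-backfillfull-ratio 0.90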
Zap had an issue back then and never worked properly; you have to dd manually.
We always played it safe and went 2-4 GB in, just to be sure. That should fix your
issue.
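A sketch of the manual wipe, with /dev/sdX as a placeholder for the disk to be destroyed (double-check the device name, this is irreversible):

    # overwrite the first 2 GB to kill old LVM/bluestore signatures
    dd if=/dev/zero of=/dev/sdX bs=1M count=2048 oflag=direct
    # newer ceph-volume releases are supposed to handle this too, including tearing
    # down the LVs, but the manual dd above is the safe fallback
    ceph-volume lvm zap --destroy /dev/sdX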
Does this only happen with this one specific node? Have you checked the system logs? Checked
SMART on all disks? I mean, technically it's expected to have slower writes when
the third node is there; that's by Ceph design.
Hi, the docs say the upmap mode tries to achieve a perfect distribution, i.e. an
equal number of PGs per OSD. This is what I got (v14.2.4):
0 ssd 3.49219 1.0 3.5 TiB 794 GiB 753 GiB 38 GiB 3.4 GiB 2.7 TiB 22.20 0.32 82 up
1 ssd 3.49219 1.0 3.5 TiB 800 GiB 751 GiB 45 GiB 3.7
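A sketch of how the balancer's own view of this could be checked, assuming it is running in upmap mode ("myplan" is just an example plan name):

    ceph balancer status
    # numeric score of the current distribution (lower is better)
    ceph balancer eval
    # compute a plan, inspect it, then apply it
    ceph balancer optimize myplan
    ceph balancer show myplan
    ceph balancer eval myplan
    ceph balancer execute myplan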
@Wido den Hollander
That doesn't explain why it's between 76 and 92 PGs; that is far from equal.
Raising PGs to 100 per OSD is an old recommendation anyway; anything 60+ should be fine. Not
an excuse for the distribution failure in this case. I am expecting a more or less
equal number of PGs per OSD
@Wido den Hollander
First of all, the docs say: "In most cases, this distribution is "perfect,"
which an equal number of PGs on each OSD (+/-1 PG, since they might not divide
evenly)." Either this is just false information or it is very badly stated.
I increased PGs and see no difference.
I pointed out
I never had these issues with Luminous, not once; since Nautilus this is a
constant headache. My issue is that I have OSDs that are over 85% full whilst others
are at 63%. My issue is that every time I do a rebalance or add new disks, Ceph
moves PGs onto nearfull OSDs and almost causes pool failures.
@Wido den Hollander
Still think this is acceptable?
51 hdd 9.09470 1.0 9.1 TiB 6.1 TiB 6.1 TiB 72 KiB 16 GiB 3.0 TiB 67.23 0.98 68 up
52 hdd 9.09470 1.0 9.1 TiB 6.7 TiB 6.7 TiB 3.5 MiB 18 GiB 2.4 TiB 73.99 1.08 75 up
53 hdd 9.09470 1.0 9.1 TiB 8.0 TiB 7.9 T
It's only getting worse after raising PGs now.
Anything between:
96 hdd 9.09470 1.0 9.1 TiB 4.9 TiB 4.9 TiB 97 KiB 13 GiB 4.2 TiB 53.62 0.76 54 up
and
89 hdd 9.09470 1.0 9.1 TiB 8.1 TiB 8.1 TiB 88 KiB 21 GiB 1001 GiB 89.25 1.27 87 up
How is that possible? I don't
My full OSD list (also here as pastebin https://paste.ubuntu.com/p/XJ4Pjm92B5/ )
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
14 hdd 9.09470 1.0 9.1 TiB 6.9 TiB 6.8 TiB 71 KiB 18 GiB 2.2 TiB 75.34 1.04 69 up
19 hdd 9.09470
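As an alternative to the online balancer, the same kind of optimization can be generated offline with osdmaptool; a sketch (file names are examples):

    ceph osd getmap -o om
    # emit 'ceph osd pg-upmap-items ...' commands that even out the PG counts
    osdmaptool om --upmap upmap.sh --upmap-max 100 --upmap-deviation 1
    # review upmap.sh, then apply it
    bash upmap.sh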
This has finally been addressed in 14.2.5; check the changelog of that release.
@Wido den Hollander
Regarding the amount of PGs, I quote from the docs:
"If you have more than 50 OSDs, we recommend approximately 50-100 placement
groups per OSD to balance out resource usage, data durability and distribution."
(https://docs.ceph.com/docs/master/rados/operations/placement-gro
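As a rough worked example of that rule of thumb (the numbers are hypothetical: 120 OSDs, replicated size 3, one dominant pool):

    # target PGs per pool ~= (OSDs * 100) / replica size, rounded to a power of two
    echo $(( 120 * 100 / 3 ))   # -> 4000, so pg_num 4096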