Hi, I have often observed that recovery/rebalance in Nautilus starts quite
fast but then becomes extremely slow (2-3 objects/s), even with around 20 OSDs
involved. Right now I am draining (reweighted to 0) 16x 8 TB disks; it has been
running for 4 days and for the last 12 hours it has been more or less stuck at:
cluster:
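For what it's worth, recovery speed is mostly bounded by the backfill and recovery throttles; a minimal sketch of checking and raising them on Nautilus (the values are only examples, and osd.0 is just an example id):

    # current settings, queried on the host running osd.0
    ceph daemon osd.0 config show | grep -E 'osd_max_backfills|osd_recovery_max_active'
    # raise them cluster-wide via the config database; higher values speed up
    # recovery at the cost of client I/O
    ceph config set osd osd_max_backfills 4
    ceph config set osd osd_recovery_max_active 4
    # watch progress
    ceph -s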
Hi, this is a fresh Nautilus cluster, but there is a second, older one that was
upgraded from Luminous to Nautilus; both show the same symptoms.
First of all, the data distribution across the OSDs is very bad. That could be
due to a low PG count, although I get no recommendation to raise the PG numbers
Hi, our mon is suddenly acting up and dying in a crash loop with the
following:
2019-10-04 14:00:24.339583 lease_expire=0.00 has v0 lc 4549352
-3> 2019-10-04 14:00:24.335 7f6e5d461700 5 mon.km-fsn-1-dc4-m1-797678@0(leader).paxos(paxos active c 4548623..4549352) is_readable = 1 - no
turn it off and rebuild from the remainder, or do they all exhibit this bug?
On Fri, Oct 4, 2019 at 5:44 AM Philippe D'Anjou
wrote:
>
> Hi,
> our mon is acting up all of a sudden and dying in crash loop with the
> following:
>
>
> 2019-10-04 14:00:24.339583 lease_
when it restarts it reports no issues and all commands run fine.
On Monday, October 7, 2019 at 21:59:20 EEST, Gregory Farnum wrote:
On Sun, Oct 6, 2019 at 1:08 AM Philippe D'Anjou
wrote:
>
> I had to use rocksdb repair tool before because the rocksdb files got
>
Gregory Farnum wrote:
On Mon, Oct 7, 2019 at 11:11 PM Philippe D'Anjou
wrote:
>
> Hi,
> unfortunately it's a single mon, because we had a major outage on this cluster
> and it's just being used to copy off data now. We weren't able to add more
> mons
How do I import an osdmap in Nautilus? I saw documentation for older versions,
but it seems one can now only export, not import?
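A sketch of what still appears to work in Nautilus, assuming the OSD in question is stopped before touching it with ceph-objectstore-tool; paths and IDs are only examples:

    # export the cluster's current osdmap and inspect it
    ceph osd getmap -o /tmp/osdmap
    osdmaptool --print /tmp/osdmap
    # export/import the osdmap copy stored on a (stopped) OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op get-osdmap --file /tmp/osdmap
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op set-osdmap --file /tmp/osdmap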
On Thursday, October 10, 2019 at 08:52:03 EEST, Philippe D'Anjou wrote:
I don't think this has anything to do with CephFS.
On Wednesday, October 9, 2019 at 20:19:42 EEST, Gregory Farnum wrote:
On Mon, Oct 7, 2019 at 11:11 PM Philippe D'Anjou
wrote:
>
> Hi,
> unfortunately it's a single mon, because we had a major outage on this cluster
> and it's just being used to
86.83 1.46 38 up
54 hdd 9.09470 1.0 9.1 TiB 5.0 TiB 5.0 TiB 136 KiB 13 GiB 4.1 TiB 54.80 0.92 24 up
...
Now I again have to manually reweight to prevent bigger issues.
How to fix this?
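One way to avoid doing the reweighting by hand, as a sketch (the 110% threshold is only an example):

    # dry run: show what would change for OSDs above 110% of mean utilization
    ceph osd test-reweight-by-utilization 110
    # apply it
    ceph osd reweight-by-utilization 110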
On Wednesday, October 2, 2019 at 08:49:50 EEST, Philippe D'Anjou wrote:
This is related to https://tracker.ceph.com/issues/42341 and to
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-October/037017.html
After closer inspection yesterday we found that PGs are not being removed from
OSDs, which then leads to nearfull errors and explains why reweights don't work
Apparently the PG balancer's crush-compat mode adds some crush bucket weights.
Those are causing major havoc in our cluster; our PG distribution is all over the
place.
Seeing things like this: ...
97 hdd 9.09470 1.0 9.1 TiB 6.3 TiB 6.3 TiB 32 KiB 17 GiB 2.8 TiB 69.03 1.08 28 up
98 hdd
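A sketch of how the crush-compat weight-set could be inspected and removed, assuming the balancer is switched off first (removing it reverts to plain CRUSH weights and will trigger data movement):

    ceph balancer off
    # show whether a compat weight-set exists and what it contains
    ceph osd crush weight-set ls
    ceph osd crush weight-set dump
    # drop the compat weight-set
    ceph osd crush weight-set rm-compat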
v14.2.4
So, this is not new; it happens every time there is a rebalance, this time because
of raising the PG count. The PG balancer is disabled because I thought it was the reason,
but apparently it's not, and it isn't helping either.
Ceph is totally borked; it's only moving data onto nearfull OSDs, causing issues.
Hi,
we are seeing quite high memory usage by OSDs since Nautilus, averaging
10 GB/OSD for 10 TB HDDs. But I had OOM issues on 128 GB systems because some
single OSD processes used up to 32% of memory.
Here is an example of how they look on average: https://i.imgur.com/kXCtxMe.png
Is that normal? I have never seen this
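A sketch of checking what the OSDs are actually targeting, assuming the Nautilus default of 4 GiB; osd.12 is just an example id:

    # effective memory target straight from the running daemon, on its host
    ceph daemon osd.12 config show | grep osd_memory_target
    # or via the config database (default is 4294967296, i.e. 4 GiB)
    ceph config get osd.12 osd_memory_target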
almost 90% full. PGs are not equally distributed, otherwise it'd
be a PG size issue.
Thanks
On Sunday, October 27, 2019 at 20:33:11 EET, Wido den Hollander wrote:
On 10/26/19 8:01 AM, Philippe D'Anjou wrote:
> V14.2.4
> So, this is not new, this hap
OK, looking at the mempool dump, what does it tell me? This affects multiple OSDs;
I'm getting crashes almost every hour.
{ "mempool": {
"by_pool": {
"bloom_filter": {
"items": 0,
"bytes": 0
},
"bluestore_alloc": {
"item
Yes, you were right: somehow an unusually high memory target was set, not
sure where it came from. I have set it back to normal now; that should fix it, I
guess.
Thanks
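A sketch of putting the target back, assuming the stock 4 GiB value is what's wanted:

    # either set it explicitly...
    ceph config set osd osd_memory_target 4294967296
    # ...or drop the override entirely and fall back to the default
    ceph config rm osd osd_memory_target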
Hi, I need to change set-require-min-compat-client in order to use upmap mode for the PG
balancer. Will this cause a disconnect of all clients? We're talking CephFS and
RBD images for VMs.
Or is it safe to switch that live?
Thanks
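For what it's worth, the monitor refuses this change while pre-Luminous clients are connected, so a sketch of the usual sequence (checking connected client features first):

    # see what releases/features the currently connected clients report
    ceph features
    # only succeeds if no pre-luminous clients are connected (can be forced, at your own risk)
    ceph osd set-require-min-compat-client luminous
    # then enable the balancer in upmap mode
    ceph balancer mode upmap
    ceph balancer on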
Hi,
we're on v14.2.4 and nothing but that. All clients and servers run Ubuntu 18.04 LTS with kernel 5.0.0-20.
We're seeing this error:
MountVolume.WaitForAttach failed for volume "pvc-45a86719-edb9-11e9-9f38-02000a030111" : fail to check rbd image status with: (exit status 110), rbd output: (2019
Hi, it is NOT safe. All clients fail to mount RBDs now :(
On Wednesday, October 30, 2019 at 09:33:16 EET, Konstantin Shalygin wrote:
Hi, I need to change set-require-min-compat-client to use upmap mode for the PG
balancer. Will this cause a disconnect of all clients? W
So it seems like for some reason librados is being used now instead of the kernel module,
and this produces the error. But we have the latest Nautilus repos installed on
the clients... so why would librados throw a compatibility issue? The client
compatibility level is set to Luminous.
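A quick sketch for checking both sides, in case it helps:

    # what the cluster currently requires
    ceph osd get-require-min-compat-client
    # what the connected clients (kernel and librados/librbd) actually advertise
    ceph features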
v14.2.4
Following issue:
PG_DEGRADED_FULL Degraded data redundancy (low space): 1 pg backfill_toofull
pg 1.285 is active+remapped+backfill_toofull, acting [118,94,84]
BUT:
118 hdd 9.09470 0.8 9.1 TiB 7.4 TiB 7.4 TiB 12 KiB 19 GiB 1.7 TiB 81.53 1.16 38 up
Even with adjusted
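A sketch of checking and temporarily relaxing the backfillfull threshold, assuming the Nautilus defaults (0.90 backfillfull / 0.95 full); raise it with care and only until the data has moved:

    ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'
    # temporarily allow backfill onto fuller OSDs
    ceph osd set-backfillfull-ratio 0.92
    # restore the default afterwards
    ceph osd set-backfillfull-ratio 0.90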
Zap had an issue back then and never worked properly; you have to dd manually.
We always played it safe and went 2-4 GB in, just to be sure. That should fix your
issue.
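A sketch of the manual wipe, with /dev/sdX as a placeholder for the disk to be destroyed (double-check the device name, this is irreversible):

    # overwrite the first 2 GB to kill old LVM/bluestore signatures
    dd if=/dev/zero of=/dev/sdX bs=1M count=2048 oflag=direct
    # newer ceph-volume releases are supposed to handle this too, including tearing
    # down the LVs, but the manual dd above is the safe fallback
    ceph-volume lvm zap --destroy /dev/sdX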
Does this only happen with this one specific node? Have you checked the system logs? Checked
SMART on all disks? I mean, technically it's expected to have slower writes when
the third node is there; that's by Ceph design.
Hi, the docs say the upmap mode tries to achieve a perfect distribution, i.e. an
equal number of PGs per OSD. This is what I got (v14.2.4):
0 ssd 3.49219 1.0 3.5 TiB 794 GiB 753 GiB 38 GiB 3.4 GiB 2.7 TiB 22.20 0.32 82 up
1 ssd 3.49219 1.0 3.5 TiB 800 GiB 751 GiB 45 GiB 3.7
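A sketch of how the balancer's own view of this could be checked, assuming it is running in upmap mode ("myplan" is just an example plan name):

    ceph balancer status
    # numeric score of the current distribution (lower is better)
    ceph balancer eval
    # compute a plan, inspect it, then apply it
    ceph balancer optimize myplan
    ceph balancer show myplan
    ceph balancer eval myplan
    ceph balancer execute myplan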
@Wido den Hollander
That doesn't explain why it's between 76 and 92 PGs; that is far from equal.
Raising PGs to 100 per OSD is an old recommendation anyway; anything 60+ should be fine. Not
an excuse for the distribution failure in this case. I am expecting a more or less
equal number of PGs per OSD
@Wido den Hollander
First of all, the docs say: "In most cases, this distribution is "perfect,"
which an equal number of PGs on each OSD (+/-1 PG, since they might not divide
evenly)." Either this is just false information or it is very badly stated.
I increased PGs and see no difference.
I pointed out
I never had these issues with Luminous, not once; since Nautilus this is a
constant headache. My issue is that I have OSDs that are over 85% full whilst others
are at 63%. My issue is that every time I do a rebalance or add new disks, Ceph
moves PGs onto nearfull OSDs and almost causes pool failures.
@Wido den Hollander
Still think this is acceptable?
51 hdd 9.09470 1.0 9.1 TiB 6.1 TiB 6.1 TiB 72 KiB 16 GiB 3.0 TiB 67.23 0.98 68 up
52 hdd 9.09470 1.0 9.1 TiB 6.7 TiB 6.7 TiB 3.5 MiB 18 GiB 2.4 TiB 73.99 1.08 75 up
53 hdd 9.09470 1.0 9.1 TiB 8.0 TiB 7.9 T
It's only getting worse after raising PGs now.
Anything between:
96 hdd 9.09470 1.0 9.1 TiB 4.9 TiB 4.9 TiB 97 KiB 13 GiB 4.2 TiB 53.62 0.76 54 up
and
89 hdd 9.09470 1.0 9.1 TiB 8.1 TiB 8.1 TiB 88 KiB 21 GiB 1001 GiB 89.25 1.27 87 up
How is that possible? I don't
My full OSD list (also here as pastebin https://paste.ubuntu.com/p/XJ4Pjm92B5/ )
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
14 hdd 9.09470 1.0 9.1 TiB 6.9 TiB 6.8 TiB 71 KiB 18 GiB 2.2 TiB 75.34 1.04 69 up
19 hdd 9.09470
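As an alternative to the online balancer, the same kind of optimization can be generated offline with osdmaptool; a sketch (file names are examples):

    ceph osd getmap -o om
    # emit 'ceph osd pg-upmap-items ...' commands that even out the PG counts
    osdmaptool om --upmap upmap.sh --upmap-max 100 --upmap-deviation 1
    # review upmap.sh, then apply it
    bash upmap.sh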
This has finally been addressed in 14.2.5; check the changelog of that release.
@Wido den Hollander
Regarding the amount of PGs, I quote from the docs:
"If you have more than 50 OSDs, we recommend approximately 50-100 placement
groups per OSD to balance out resource usage, data durability and distribution."
(https://docs.ceph.com/docs/master/rados/operations/placement-gro
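As a rough worked example of that rule of thumb (the numbers are hypothetical: 120 OSDs, replicated size 3, one dominant pool):

    # target PGs per pool ~= (OSDs * 100) / replica size, rounded to a power of two
    echo $(( 120 * 100 / 3 ))   # -> 4000, so pg_num 4096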