Re: [ceph-users] Slow requests from bluestore osds

2019-05-14 Thread Stefan Kooman
Quoting Marc Schöchlin (m...@256bit.org):
> Our new setup is now (12.2.10 on Ubuntu 16.04):
>
> [osd]
> osd deep scrub interval = 2592000
> osd scrub begin hour = 19
> osd scrub end hour = 6
> osd scrub load threshold = 6
> osd scrub sleep = 0.3
> osd snap trim sleep = 0.4
> pg max concurrent s
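For reference, a minimal sketch of applying the quoted scrub/trim throttles to running OSDs via injectargs (the option names and values are the ones quoted above, not recommendations; whether each option takes effect at runtime without a restart depends on the release):

    # push the scrub/snap-trim throttles into all running OSDs
    ceph tell osd.* injectargs '--osd_scrub_sleep 0.3 --osd_snap_trim_sleep 0.4'
    ceph tell osd.* injectargs '--osd_scrub_begin_hour 19 --osd_scrub_end_hour 6'
    # keep the same keys in the [osd] section of ceph.conf to make them persistent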

Re: [ceph-users] Ceph MGR CRASH : balancer module

2019-05-14 Thread xie.xingguo
Should be fixed by https://github.com/ceph/ceph/pull/27225. You can simply upgrade to v14.2.1 to get rid of it, or run 'ceph balancer off' to temporarily disable automatic balancing... Original message From: TarekZegar To: ceph-users@lists.ceph.com ; Date: 2019-05-14 01:53 Subject:
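For reference, a short sketch of the balancer commands involved (all standard 'ceph balancer' subcommands; only 'ceph balancer off' is taken from the message above):

    ceph balancer off       # temporarily disable automatic balancing
    ceph balancer status    # check whether the module is active and which mode it uses
    ceph balancer on        # re-enable once the mgr is running v14.2.1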

Re: [ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-14 Thread Stefan Kooman
Quoting Frank Schilder (fr...@dtu.dk): If at all possible I would: upgrade to 13.2.5 (there have been quite a few MDS fixes since 13.2.2) and use more recent kernels on the clients. The settings below for [mds] might help with trimming (you might already have changed mds_log_max_segments to 128 accordi
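As a minimal sketch of the kind of [mds] trimming tweak being referred to (only mds_log_max_segments = 128 is taken from the message above; setting it in ceph.conf versus the Mimic centralized config store is an assumption):

    # persistent form, in ceph.conf
    [mds]
    mds_log_max_segments = 128

    # or at runtime on a Mimic cluster via the config store
    ceph config set mds mds_log_max_segments 128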

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
On 13.05.19 10:51 PM, Lionel Bouton wrote: On 13.05.2019 at 16:20, Kevin Flöh wrote: Dear ceph experts, [...] We have 4 nodes with 24 OSDs each and use 3+1 erasure coding. [...] Here is what happened: one OSD daemon could not be started and therefore we decided to mark the osd as lost an

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
On 13.05.19 11:21 PM, Dan van der Ster wrote: Presumably the 2 OSDs you marked as lost were hosting those incomplete PGs? It would be useful to double-confirm that: check with `ceph pg query` and `ceph pg dump`. (If so, this is why the ignore history les thing isn't helping; you don't have
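For reference, a sketch of how one might double-check which OSDs an incomplete PG expects (the PG id 2.5a is purely a placeholder, not one from this cluster):

    # list PGs stuck in the incomplete state together with their up/acting OSD sets
    ceph pg dump pgs | grep incomplete
    # inspect the peering details of one of them (replace 2.5a with a real pgid)
    ceph pg 2.5a query | less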

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Dan van der Ster
On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote: > > On 13.05.19 10:51 PM, Lionel Bouton wrote: > > On 13.05.2019 at 16:20, Kevin Flöh wrote: > >> Dear ceph experts, > >> > >> [...] We have 4 nodes with 24 OSDs each and use 3+1 erasure coding. [...] > >> Here is what happened: One osd daem

Re: [ceph-users] Ceph MGR CRASH : balancer module

2019-05-14 Thread EDH - Manuel Rios Fernandez
We can confirm that the balancer module works smoothly in 14.2.1. We’re balancing by bytes and PGs; now all OSDs are 100% balanced. From: ceph-users On behalf of xie.xing...@zte.com.cn Sent: Tuesday, 14 May 2019 9:53 To: tze...@us.ibm.com CC: ceph-users@lists.ceph.com Subject:

Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO [EXT]

2019-05-14 Thread Matthew Vernon
On 14/05/2019 00:36, Tarek Zegar wrote: > It's not just mimic to nautilus; I confirmed it with luminous to mimic. > They are checking for clean PGs with the flags set; they should unset the flags, > then check, set the flags again, and move on to the next OSD. I'm inclined to agree that "norebalance" is likel

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
On 14.05.19 10:08 AM, Dan van der Ster wrote: On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote: On 13.05.19 10:51 PM, Lionel Bouton wrote: On 13.05.2019 at 16:20, Kevin Flöh wrote: Dear ceph experts, [...] We have 4 nodes with 24 OSDs each and use 3+1 erasure coding. [...] Here is

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Dan van der Ster
On Tue, May 14, 2019 at 10:59 AM Kevin Flöh wrote: > > > On 14.05.19 10:08 AM, Dan van der Ster wrote: > > On Tue, May 14, 2019 at 10:02 AM Kevin Flöh wrote: > > On 13.05.19 10:51 PM, Lionel Bouton wrote: > > On 13.05.2019 at 16:20, Kevin Flöh wrote: > > Dear ceph experts, > > [...] We h

Re: [ceph-users] ceph mimic and samba vfs_ceph

2019-05-14 Thread Ansgar Jazdzewski
Hi, I was able to compile Samba 4.10.2 using the mimic header files and it works fine so far. Now we are looking forward to doing some real load tests. Have a nice one, Ansgar. On Fri, 10 May 2019 at 13:33, Ansgar Jazdzewski wrote: > > thanks, > > i will try to "backport" this to ubuntu 16.04 >

[ceph-users] ceph nautilus deep-scrub health error

2019-05-14 Thread nokia ceph
Hi Team, After upgrading from Luminous to Nautilus, we see a "654 pgs not deep-scrubbed in time" error in ceph status. How can we disable this warning? In our setup we disable deep-scrubbing for performance reasons. Thanks, Muthu

Re: [ceph-users] ceph nautilus deep-scrub health error

2019-05-14 Thread EDH - Manuel Rios Fernandez
Hi Muthu, we hit the same issue with nearly 2000 pgs not deep-scrubbed in time. We’re manually forcing deep scrubs with:

    ceph health detail | grep -i not | awk '{print $2}' | while read i; do ceph pg deep-scrub ${i}; done

It launches roughly 20-30 PGs to be deep-scrubbed at a time. I think you can improve
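A slightly tightened sketch of the same idea, assuming the Nautilus health wording "pg <id> not deep-scrubbed since ..." (that matched pattern is the only assumption beyond the command above):

    # kick off a deep scrub for every PG that health detail flags as overdue
    ceph health detail | awk '/not deep-scrubbed since/ {print $2}' |
    while read pg; do
        ceph pg deep-scrub "$pg"
    done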

[ceph-users] Lost OSD from PCIe error, recovered, to restore OSD process

2019-05-14 Thread Tarek Zegar
Someone nuked an OSD that had 1-replica PGs. They accidentally did

    echo 1 > /sys/block/nvme0n1/device/device/remove

We got it back with

    echo 1 > /sys/bus/pci/rescan

However, it re-enumerated as a different drive number (guess we didn't have udev rules). They restored the LVM volume (vgcfgrestore

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
OK, so now we see at least a difference in the recovery state:

    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Incomplete",
            "enter_time": "2019-05-14 14:15:15.650517",
            "comment": "not enough complete instances of this PG"
        },
        {

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Dan van der Ster
On Tue, May 14, 2019 at 5:13 PM Kevin Flöh wrote: > > OK, so now we see at least a difference in the recovery state: > > "recovery_state": [ > { > "name": "Started/Primary/Peering/Incomplete", > "enter_time": "2019-05-14 14:15:15.650517", > "comm

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Konstantin Shalygin
peering does not seem to be blocked anymore, but there is still no recovery going on. Is there anything else we can try? Try to reduce min_size for the problem pool as 'health detail' suggested: `ceph osd pool set ec31 min_size 2`. k

Re: [ceph-users] ceph nautilus deep-scrub health error

2019-05-14 Thread Brett Chancellor
You can increase your scrub intervals:

    osd deep scrub interval
    osd scrub max interval

On Tue, May 14, 2019 at 7:00 AM EDH - Manuel Rios Fernandez <mrios...@easydatahost.com> wrote: > Hi Muthu > > We found the same issue near 2000 pgs not deep-scrubbed in time. > > We’re manually force sc
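As a minimal sketch of bumping those two intervals on a Nautilus cluster via the centralized config store (the four-week / two-week values are only examples, not recommendations):

    # stretch the deep-scrub interval to 4 weeks and the max scrub interval to 2 weeks
    ceph config set osd osd_deep_scrub_interval 2419200
    ceph config set osd osd_scrub_max_interval 1209600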

[ceph-users] Health Cron Script

2019-05-14 Thread Georgios Dimitrakakis
Hello, I am wondering if there are people out there who still use "old-fashioned" cron scripts to check Ceph's health, monitor it, and receive email alerts. If there are, do you mind sharing your implementation? Probably something similar to this: https://github.com/cernceph/ceph-scripts/blob/mas
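For what it's worth, a minimal sketch of such a cron check (the script path, recipient address, and 15-minute schedule are all just placeholders):

    #!/bin/bash
    # /usr/local/bin/ceph-health-mail.sh - mail the health detail whenever the cluster is not HEALTH_OK
    status=$(ceph health 2>&1)
    if [ "$status" != "HEALTH_OK" ]; then
        ceph health detail 2>&1 | mail -s "ceph health: $status" ceph-admin@example.com
    fi

    # crontab entry, e.g. every 15 minutes:
    # */15 * * * * /usr/local/bin/ceph-health-mail.sh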

[ceph-users] ceph -s finds 4 pools but ceph osd lspools says no pool which is the expected answer

2019-05-14 Thread Rainer Krienke
Hello, on a freshly set up ceph cluster I see a strange difference between the number of pools reported by ceph -s and what I know should be there: no pools at all. I set up a fresh Nautilus cluster with 144 OSDs on 9 hosts. Just to play around I created a pool named rbd with $ ceph

Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO [EXT]

2019-05-14 Thread Tarek Zegar
https://github.com/ceph/ceph-ansible/issues/3961 <--- created ticket. Thanks, Tarek. From: Matthew Vernon To: Tarek Zegar, solarflo...@gmail.com Cc: ceph-users@lists.ceph.com Date: 05/14/2019 04:41 AM Subject: [EXTERNAL] Re: [ceph-users] Rolling upgrade fails with flag

Re: [ceph-users] Lost OSD from PCIe error, recovered, to restore OSD process

2019-05-14 Thread Bob R
Does 'ceph-volume lvm list' show it? If so, you can try to activate it with 'ceph-volume lvm activate 122 74b01ec2--124d--427d--9812--e437f90261d4'. Bob On Tue, May 14, 2019 at 7:35 AM Tarek Zegar wrote: > Someone nuked an OSD that had 1-replica PGs. They accidentally did echo 1 > > /sys/block/n
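A sketch of that recovery path with placeholders (the OSD id and FSID must be taken from the 'ceph-volume lvm list' output; the names below are purely illustrative):

    # check whether ceph-volume still knows about the OSD's logical volume
    ceph-volume lvm list
    # re-activate it using the osd id and osd fsid reported by the listing
    ceph-volume lvm activate <osd-id> <osd-fsid>
    # confirm the daemon comes back up
    systemctl status ceph-osd@<osd-id>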

[ceph-users] Using centralized management configuration drops some unrecognized config option

2019-05-14 Thread EDH - Manuel Rios Fernandez
Hi, we're moving our configuration to centralized management with "ceph config set" and a minimal ceph.conf on all nodes. Several ceph options are not allowed. Why? ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable) ceph config set osd o
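For reference, a sketch of how one might check a single option against the config database (the key 'osd_memory_target' and the value are only examples, not the option from this report):

    # show the option's type, default, and level - useful when "ceph config set" rejects it
    ceph config help osd_memory_target
    # set it centrally for all OSDs and read back what one daemon will actually use
    ceph config set osd osd_memory_target 4294967296
    ceph config get osd.0 osd_memory_target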

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
Hi, since we have 3+1 EC I hadn't tried that before. But when I run the command you suggested I get the following error:

    ceph osd pool set ec31 min_size 2
    Error EINVAL: pool min_size must be between 3 and 4

On 14.05.19 6:18 PM, Konstantin Shalygin wrote: peering does not seem to be blocked
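For context, a sketch of how one could inspect why min_size 2 is rejected for this pool (the pool name 'ec31' comes from the thread; the profile name is whatever the first command returns):

    # see which erasure-code profile the pool uses and its current min_size
    ceph osd pool get ec31 erasure_code_profile
    ceph osd pool get ec31 min_size
    # show k and m for that profile; the EINVAL above shows min_size is restricted to 3..4 for 3+1
    ceph osd erasure-code-profile get <profile-name>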

Re: [ceph-users] ceph -s finds 4 pools but ceph osd lspools says no pool which is the expected answer

2019-05-14 Thread Rainer Krienke
Hello, since I had no idea what could be causing the wrong pool count in the ceph -s output, I simply rebooted all machines of this cluster (it does not yet contain any real data), which solved the problem. So it seems that some caching problem might have caused this issue. Thanks, Rainer Am 1

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
The HDDs of OSDs 4 and 23 are completely lost; we cannot access them in any way. Is it possible to use the shards which may still be stored on working OSDs, as shown in the all_participants list? On 14.05.19 5:24 PM, Dan van der Ster wrote: On Tue, May 14, 2019 at 5:13 PM Kevin Flöh wrote: o