[ceph-users] Re: ceph IO are interrupted when OSD goes down

2021-10-21 Thread Eugen Block
Hi, sorry for the delay. So no, the min_size is not the issue here. Is the 86% utilization an average, or does it spike to 100% during the interruptions? Does ceph report slow requests? Have you asked the OSD daemon which operations took so long, using ceph daemon osd.1 dump_historic_sl

[ceph-users] Re: ceph IO are interrupted when OSD goes down

2021-10-18 Thread Szabo, Istvan (Agoda)
Octopus 15.2.14? I have exactly the same issue, and it is causing problems for me in production. Istvan Szabo, Senior Infrastructure Engineer, Agoda Services Co., Ltd. e: istvan.sz...@agoda.com

[ceph-users] Re: ceph IO are interrupted when OSD goes down

2021-10-18 Thread Denis Polom
No, it's actually not. It was set that way by design by a colleague of mine. But anyway, it's not related to this issue. On 10/18/21 15:55, Eugen Block wrote: Well, the default is k + 1, so 11. Could it be that you reduced it during a recovery phase but didn't set it back to the default? Quoting denispo...@

[ceph-users] Re: ceph IO are interrupted when OSD goes down

2021-10-18 Thread Eugen Block
Well, the default is k + 1, so 11. Could it be that you reduced it during a recovery phase but didn't set it back to the default? Quoting denispo...@gmail.com: No, disk utilization is around 86%. What is a safe value for min_size in this case? 18. 10. 2021 15:46:44 Eugen Block : Hi, mi

[ceph-users] Re: ceph IO are interrupted when OSD goes down

2021-10-18 Thread denispolom
No, disk utilization is around 86%. What is a safe value for min_size in this case? 18. 10. 2021 15:46:44 Eugen Block : > Hi, > > min_size = k is not the safest option; it should only be used in case of > disaster recovery. But in this case it does not seem to be related to the IO interruption. > Ar

[ceph-users] Re: ceph IO are interrupted when OSD goes down

2021-10-18 Thread Eugen Block
Hi, min_size = k is not the safest option; it should only be used in case of disaster recovery. But in this case it does not seem to be related to the IO interruption. Are some disks utilized around 100% (iostat) when this happens? Quoting Denis Polom : Hi, it's min_size: 10 On 10/18/21

[ceph-users] Re: ceph IO are interrupted when OSD goes down

2021-10-18 Thread Eugen Block
Hi, with this EC setup your pool min_size would be 11 (k+1), so if one host goes down (or several OSDs fail on that host), your clients should not be affected. But as soon as a second host fails you’ll notice IO pausing until at least one host has recovered. Do you have more than 12 ho
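
The min_size reasoning in this thread can be checked with a few lines of arithmetic. The sketch below is only an illustration, not Ceph code: k = 10 follows from the thread (min_size 10 equals k, and the default k + 1 is 11), while m = 2 and a failure domain of one shard per host are assumptions made for the example. The function name pg_accepts_io is hypothetical.

```python
def pg_accepts_io(k: int, m: int, min_size: int, failed_shards: int) -> bool:
    """Return True if a placement group still serves client IO.

    A PG in an erasure-coded pool has k + m shards. It keeps serving IO
    as long as the number of available shards is at least min_size.
    """
    available = k + m - failed_shards
    return available >= min_size


# k = 10 is taken from the thread; m = 2 is an ASSUMPTION for illustration.
k, m = 10, 2

# Assumes one shard per host, so one failed host means one failed shard.
for min_size in (k + 1, k):
    for failed_hosts in range(3):
        state = "IO continues" if pg_accepts_io(k, m, min_size, failed_hosts) else "IO pauses"
        print(f"min_size={min_size} failed_hosts={failed_hosts}: {state}")
```

With min_size = 11, one failed host leaves 11 shards and IO continues, while a second failure leaves 10 and IO pauses. With min_size = 10, IO continues after two failures, but writes are then accepted with no redundancy left, which is why min_size = k is described above as an option for disaster recovery only.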