Hello Ceph community,
Last week I was increasing the number of PGs in a pool used for RBD, in an
attempt to reach 1024 PGs (from 128 PGs). I raised pg_num in increments of
32 each time, and after the new placement groups were created I triggered
the data rebalance by setting the pgp_num parameter to match.
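For context, each increment was essentially the following pair of commands
(the pool name "rbd" and the target value 160 are only illustrative; 160
would be the first step up from 128):

  # bump pg_num by 32 and wait for the new PGs to finish creating
  ceph osd pool set rbd pg_num 160
  ceph -s          # watch until no PGs remain in the "creating" state

  # then let data rebalance onto the new PGs
  ceph osd pool set rbd pgp_num 160
  ceph -s          # watch backfill/recovery until the cluster is HEALTH_OK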
Everything was fine until the pool reached ~400 PGs. Up to 414 PGs, the
cluster interrupted client I/O for approximately 10 seconds while creating
each batch of 32 new PGs, which was acceptable for the SLA we try to meet.
Beyond 414 PGs that interruption grew longer, reaching 40 seconds, and our
virtual machines saw roughly a minute of downtime, along with hundreds of
blocked ops in the Ceph log.
I would like to understand why the client I/O interruption got longer once
the pool had more PGs. I've been unable to figure that out from the
documentation or the mailing list archives.
Some info about the cluster:
* number of OSDs: 24 (the cluster started with 6 OSDs)
* 3 OSD nodes.
* 3 monitors.
* version: Jewel 10.2.10
* OSD backend disks: HDD
* OSD journal disks: SSD
Let me know if you need any further information, and thanks in advance.
Kind regards to you all.
--
Fernando Cid O.
Ingeniero de Operaciones
AltaVoz S.A.
http://www.altavoz.net
Viña del Mar, Valparaiso:
2 Poniente 355 of 53
+56 32 276 8060
Providencia, Santiago:
Antonio Bellet 292 of 701
+56 2 585 4264