Re: [ceph-users] Blocked requests after "osd in"

2015-12-11 Thread Christian Kauhaus
On 10.12.2015 at 06:38, Robert LeBlanc wrote:
> Since I'm very interested in reducing this problem, I'm willing to try and submit a fix after I'm done with the new OP queue I'm working on. I don't know the best course of action at the moment, but I hope I can get some input for when I do t

Re: [ceph-users] Blocked requests after "osd in"

2015-12-10 Thread Jan Schermer
Just try to give the booting OSD and all MONs the resources they ask for (CPU, memory). Yes, it causes disruption but only for a select group of clients, and only for a moment (<20s with my extremely high number of PGs). From a service provider perspective this might break SLAs, but until you get

Re: [ceph-users] Blocked requests after "osd in"

2015-12-10 Thread Christian Kauhaus
On 10.12.2015 at 06:38, Robert LeBlanc wrote:
> I noticed this a while back and did some tracing. As soon as the PGs are read in by the OSD (very limited amount of housekeeping done), the OSD is set to the "in" state so that peering with other OSDs can happen and the recovery process can beg

Re: [ceph-users] Blocked requests after "osd in"

2015-12-09 Thread Robert LeBlanc
I noticed this a while back and did some tracing. As soon as the PGs are read in by the OSD (very limited amount of housekeeping done), the OSD is set to the "in" state so that peering with other OSDs can happen and the recovery process can begin. Th

Re: [ceph-users] Blocked requests after "osd in"

2015-12-09 Thread Christian Kauhaus
On 09.12.2015 at 11:21, Jan Schermer wrote:
> Are you seeing "peering" PGs when the blocked requests are happening? That's what we see regularly when starting OSDs. Mostly "peering" and "activating". I'm not sure this can be solved completely (and whether there are major improvements in

Re: [ceph-users] Blocked requests after "osd in"

2015-12-09 Thread Jan Schermer
Are you seeing "peering" PGs when the blocked requests are happening? That's what we see regularly when starting OSDs. I'm not sure this can be solved completely (and whether there are major improvements in newer Ceph versions), but it can be sped up by 1) making sure you have free (and not dirt
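
One way to answer the question raised in this thread (whether PGs sit in "peering" or "activating" while requests are blocked) is to sample the PG state summary when an OSD is marked in. The following is only a rough sketch: the helper names are hypothetical, and the JSON layout it reads (`pgmap` → `pgs_by_state` with `state_name` and `count`) is an assumption about the output of `ceph status --format json` that may differ between Ceph releases.

```python
import json
import subprocess

# Sub-states mentioned in the thread; adjust as needed.
WATCHED = ("peering", "activating")


def count_watched_pgs(status, watched=WATCHED):
    """Sum PG counts per watched sub-state from a parsed status dict.

    Assumes status["pgmap"]["pgs_by_state"] is a list of
    {"state_name": "...", "count": N} entries (layout is an assumption).
    """
    totals = {name: 0 for name in watched}
    for entry in status.get("pgmap", {}).get("pgs_by_state", []):
        # A state name is a '+'-joined combination, e.g. "remapped+peering".
        parts = entry.get("state_name", "").split("+")
        for name in watched:
            if name in parts:
                totals[name] += entry.get("count", 0)
    return totals


def read_cluster_status():
    """Hypothetical helper: fetch the status as JSON from the ceph CLI."""
    out = subprocess.run(
        ["ceph", "status", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Calling `count_watched_pgs(read_cluster_status())` once a second while an OSD comes in would show whether the blocked-request window lines up with non-zero "peering" or "activating" counts.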