On Wed, Sep 18, 2013 at 11:43 PM, Dan Van Der Ster
<daniel.vanders...@cern.ch> wrote:
>
> On Sep 18, 2013, at 11:50 PM, Gregory Farnum <g...@inktank.com>
>  wrote:
>
>> On Wed, Sep 18, 2013 at 6:33 AM, Dan Van Der Ster
>> <daniel.vanders...@cern.ch> wrote:
>>> Hi,
>>> We just finished debugging a problem with RBD-backed Glance image creation
>>> failures, and thought our workaround would be useful for others. Basically,
>>> we found that during an image upload, librbd on the glance api server was
>>> consuming many processes, eventually hitting the 1024 nproc limit for
>>> non-root users in RHEL. The failure occurred when uploading to pools with
>>> 2048 PGs, but didn't fail when uploading to pools with 512 PGs (we're
>>> guessing that librbd is opening one thread per accessed PG, and not closing
>>> those threads until the whole process completes.)
>>>
>>> If you hit this same problem (and you run RHEL like us), you'll need to 
>>> modify at least /etc/security/limits.d/90-nproc.conf (adding your non-root 
>>> user that should be allowed > 1024 procs), and then also possibly run 
>>> ulimit -u in the init script of your client process. Ubuntu should have 
>>> some similar limits.
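
A minimal sketch of the RHEL workaround described above, assuming the client
runs as a user named "glance" (the user name is just an example) and reusing
the 4096 limit mentioned later in this thread:

    # /etc/security/limits.d/90-nproc.conf
    # stock RHEL entry that caps non-root users at 1024 processes/threads:
    *          soft    nproc     1024
    # added entry raising the cap for the user running the client:
    glance     soft    nproc     4096

    # and, if the client is started from an init script, raise the limit there too:
    ulimit -u 4096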
>>
>> Did your pools with 2048 PGs have a significantly larger number of
>> OSDs in them? Or are both pools on a cluster with a lot of OSDs relative
>> to the PG counts?
>
> 1056 OSDs at the moment.
>
> Uploading a 14GB image we observed up to ~1500 threads.
>
> We set the glance client to allow 4096 processes for now.
>
>
>> The PG count shouldn't matter for this directly, but RBD (and other
>> clients) will create a couple of messenger threads for each OSD it talks
>> to, and while they'll eventually shut down when idle, it doesn't
>> proactively close them. I'd expect this to be a problem at around 500
>> OSDs.
>
> A couple; is that the upper limit? Should we be safe with
> ulimit -u 2*nOSDs + 1?

The messenger currently generates 2 threads per daemon it communicates
with (although they will go away after a long enough idle period).
2*nOSD+1 won't quite be enough as there's the monitor connection and a
handful of internal threads (I don't remember the exact numbers
off-hand).
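
As a rough sanity check on those numbers (approximate, since the exact
per-connection and internal thread counts aren't pinned down here): with 1056
OSDs, 2 threads per OSD already comes to 2112, so the ~1500 threads observed
during a single upload fits a client talking to a large fraction of the
cluster, and a limit of roughly 2*nOSD plus a few hundred for slack (e.g. the
4096 chosen above) leaves headroom. Something along these lines could derive
the limit at deploy time; the 256 of slack is an arbitrary margin, not a
number taken from the librados source:

    # count the OSDs in the cluster and derive a process/thread limit with slack
    NOSD=$(ceph osd ls | wc -l)
    # 2 messenger threads per OSD, plus the monitor connection and a handful
    # of internal threads; 256 here is just a safety margin
    ulimit -u $(( 2 * NOSD + 256 ))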

So far this hasn't been a problem for anybody and I doubt you'll see
issues, but at some point we will need to switch the messenger to use
epoll instead of a thread per socket. :)

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com