Gregs are awesome, apparently. Thanks for the confirmation.

I know that threads are lightweight; it's just the first time I've ever
run into something that uses them... so liberally. ^_^ (I've put a couple of
quick sketches of the numbers below the quoted thread.)


On Mon, Aug 26, 2013 at 10:07 AM, Gregory Farnum <g...@inktank.com> wrote:

> On Mon, Aug 26, 2013 at 9:24 AM, Greg Poirier <greg.poir...@opower.com>
> wrote:
> > So, in doing some testing last week, I believe I managed to exhaust the
> > number of threads available to nova-compute. After some investigation, I
> > found the pthread_create failure and increased nproc for our Nova user to
> > what I considered a ridiculous 120,000 threads, after reading that
> > librados will require a thread per OSD, plus a few for overhead, per VM
> > on our compute nodes.
> >
> > This made me wonder: how many threads could Ceph possibly need on one of
> > our compute nodes?
> >
> > 32 cores * an overcommit ratio of 16, assuming each VM is booted from a
> > Ceph volume, * 300 (the approximate number of disks in our soon-to-go-live
> > Ceph cluster) = 153,600 threads.
> >
> > So this is where I started to put the truck in reverse. Am I right? What
> > about when we triple the size of our Ceph cluster? I could easily see a
> > future where we have 1,000 disks, if not many more, in our cluster. How
> > do people scale this? Do you RAID to increase the density of your Ceph
> > cluster? I can only imagine that this will also drastically increase the
> > amount of resources required on my data nodes.
> >
> > So... suggestions? Reading?
>
> Your math looks right to me. So far, though, it hasn't caused anybody
> any trouble: Linux threads are much cheaper than people imagine when
> they're inactive. At some point we will certainly need to reduce the
> thread counts of our messenger (using epoll across a bunch of sockets
> instead of two threads per socket), but that hasn't happened yet.
> In terms of things you can do if this does become a problem, the most
> prominent is probably to (sigh) partition your cluster into pods on a
> per-rack basis or something. This is actually not as bad as it sounds,
> since your network design probably would prefer not to send all writes
> through your core router: if you create a pool for each rack and place
> replicas something like this rack, next rack, next row, you get better
> network traffic patterns.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
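
For my own notes, here's a back-of-the-envelope sketch (Python) of the thread
math quoted above. The per-connection and per-VM overhead numbers are my
assumptions for illustration, not actual Ceph internals; with one thread per
OSD and no overhead it reproduces the 153,600 figure.

# Rough estimate of librados client threads on one compute node.
# Assumptions (not exact Ceph internals): each VM booted from an RBD volume
# holds a connection to every OSD it talks to, each connection costs some
# number of messenger threads, plus a small fixed per-VM overhead.

def estimated_threads(cores, overcommit, osds,
                      threads_per_osd_conn=2, overhead_per_vm=10):
    vms = cores * overcommit
    return vms * (osds * threads_per_osd_conn + overhead_per_vm)

# The numbers from the thread: 32 cores, 16x overcommit, ~300 OSDs.
# One thread per OSD, no overhead -> the 153,600 figure above.
print(estimated_threads(32, 16, 300, threads_per_osd_conn=1, overhead_per_vm=0))  # 153600

# Same node against a ~1,000-OSD cluster, with an assumed two threads per
# connection and a little per-VM overhead:
print(estimated_threads(32, 16, 1000))  # 1029120

Even at ~1,000 OSDs the raw count only matters if the threads are active,
which is Greg's point about idle Linux threads being cheap.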
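
And to illustrate the epoll direction Greg mentions (one thread multiplexing
many sockets instead of dedicating a couple of threads to each socket), a
generic sketch using Python's selectors module. This is just the pattern, not
anything from Ceph's messenger:

# Single-threaded socket multiplexing with epoll (via selectors), rather than
# one or two dedicated threads per socket. Illustration only.
import selectors
import socket

sel = selectors.DefaultSelector()  # uses epoll on Linux

def accept(server_sock):
    conn, _ = server_sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)

def read(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)  # trivial echo, stands in for real message handling
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

# One thread services every registered socket.
while True:
    for key, _ in sel.select():
        key.data(key.fileobj)

The point is just that the thread count stops scaling with the number of
sockets (and therefore with the number of OSDs).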