----- Message from Guang Yang <yguan...@yahoo.com> ---------
Date: Fri, 30 May 2014 08:56:37 +0800
From: Guang Yang <yguan...@yahoo.com>
Subject: Re: [ceph-users] Expanding pg's of an erasure coded pool
To: Gregory Farnum <g...@inktank.com>
Cc: Kenneth Waegeman <kenneth.waege...@ugent.be>, ceph-users
<ceph-users@lists.ceph.com>
On May 28, 2014, at 5:31 AM, Gregory Farnum <g...@inktank.com> wrote:
On Sun, May 25, 2014 at 6:24 PM, Guang Yang <yguan...@yahoo.com> wrote:
On May 21, 2014, at 1:33 AM, Gregory Farnum <g...@inktank.com> wrote:
This failure means the messenger subsystem is trying to create a
thread and is getting an error code back, probably due to a process
or system thread limit that you can turn up with ulimit.
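[Editor's note: the limits Greg mentions can also be inspected from Python; a minimal sketch, assuming Linux (the /proc path does not exist on other systems):]

```python
import resource

# Soft/hard cap on processes+threads for this user -- what `ulimit -u` reports
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("RLIMIT_NPROC soft/hard:", soft, hard)

# System-wide thread ceiling (Linux-specific path; absent elsewhere)
try:
    with open("/proc/sys/kernel/threads-max") as f:
        print("kernel.threads-max:", f.read().strip())
except FileNotFoundError:
    pass
```

Raising the limits themselves is done via `ulimit -u` (per shell), `/etc/security/limits.conf` (per user), or `sysctl kernel.threads-max` (system-wide).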
This is happening because a replicated PG primary needs a connection
to only its replicas (generally 1 or 2 connections), but with an
erasure-coded PG the primary requires a connection to m+n-1 replicas
(everybody who's in the erasure-coding set, including itself). Right
now our messenger requires a thread for each connection, so kerblam.
(And it actually requires a couple such connections because we have
separate heartbeat, cluster data, and client data systems.)
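[Editor's note: Greg's description can be turned into a rough back-of-envelope estimate. The multipliers below (two threads per TCP connection, three separate connection channels per peer) are illustrative assumptions, not exact Ceph internals:]

```python
def estimated_threads(peer_osds, osds_per_host=33,
                      threads_per_conn=2, channels=3):
    """Rough per-host messenger thread estimate.

    Assumptions (illustrative only): two threads per connection
    (a reader and a writer) and three connection channels per peer
    (heartbeat, cluster data, client data), per Greg's description.
    """
    per_daemon = peer_osds * threads_per_conn * channels
    return per_daemon * osds_per_host

# With ~250 peer OSDs per daemon on a 33-disk host, the estimate lands
# in the 40K-50K range reported later in this thread.
print(estimated_threads(250))  # 49500
```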
Hi Greg,
Is there any plan to refactor the messenger component to reduce
the number of threads? For example, by using an event-driven mode.
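[Editor's note: as a minimal illustration of the event-driven mode suggested here, the sketch below multiplexes 100 connections in a single thread using Python's selectors module. This is not Ceph code; it only shows the readiness-notification pattern:]

```python
import selectors
import socket

# One thread servicing many connections via readiness events,
# instead of one (or more) threads per connection.
sel = selectors.DefaultSelector()
pairs = [socket.socketpair() for _ in range(100)]
for client, server in pairs:
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ)

# Each "client" end sends one message.
for client, _server in pairs:
    client.sendall(b"ping")

served = 0
while served < len(pairs):
    # Block until at least one registered socket is readable.
    for key, _events in sel.select(timeout=1):
        key.fileobj.recv(16)  # drain the message; socket becomes idle
        served += 1
print(served)  # prints 100
```

The same pattern underlies epoll/kqueue-based network stacks: thread count stays constant as connection count grows, at the cost of a more complex state machine per connection.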
We've discussed it in very broad terms, but there are no concrete
designs and it's not on the schedule yet. If anybody has conclusive
evidence that it's causing them trouble they can't work around, that
would be good to know.
Well, we weren't able to find the source of the problem we had (see
the original message), but there was no more time to test it further.
Something (memory- or thread-related?) was causing serious trouble on
the large setup that didn't happen on our small setup, but at this
moment I have no more information than what is included in the mails
of this thread.
Thanks for the response!
We used to have a cluster in which each OSD host had 11 disks
(daemons); on each host there were around 15K threads. The system
was stable, but when there was a cluster-wide change (e.g. an OSD
going down/out, or recovery), we observed the system load increasing,
though there was no cascading failure.
Most recently we have been evaluating Ceph on high-density hardware,
with each OSD host having 33 disks (daemons); on each host there are
around 40K-50K threads. With some OSD hosts down/out, we started
seeing the load increase sharply, along with a large volume of thread
creation and joining.
We don't have strong evidence that the messenger threading model is
the problem, nor of how much an event-driven approach would help, but
I think that as we move to high-density hardware (for cost-saving
purposes), the issue could be amplified.
If there is any plan, it would be good to know, and we are very
interested in getting involved.
Thanks,
Guang
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
----- End message from Guang Yang <yguan...@yahoo.com> -----
--
Kind regards,
Kenneth Waegeman
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com