----- Message from Guang Yang <yguan...@yahoo.com> ---------
Date: Fri, 30 May 2014 08:56:37 +0800
From: Guang Yang <yguan...@yahoo.com>
Subject: Re: [ceph-users] Expanding pg's of an erasure coded pool
To: Gregory Farnum <g...@inktank.com>
Cc: Kenneth Waegeman <kenneth.waege...@ugent.be>, ceph-users
<ceph-users@lists.ceph.com>
On May 28, 2014, at 5:31 AM, Gregory Farnum <g...@inktank.com> wrote:
On Sun, May 25, 2014 at 6:24 PM, Guang Yang <yguan...@yahoo.com> wrote:
On May 21, 2014, at 1:33 AM, Gregory Farnum <g...@inktank.com> wrote:
This failure means the messenger subsystem is trying to create a
thread and is getting an error code back, probably due to a process
or system thread limit that you can turn up with ulimit.
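[Editor's note: the limits Greg mentions can also be inspected from Python; a minimal sketch, assuming Linux (the /proc path does not exist on other systems):]

```python
import resource

# Soft/hard cap on processes+threads for this user -- what `ulimit -u` reports
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("RLIMIT_NPROC soft/hard:", soft, hard)

# System-wide thread ceiling (Linux-specific path; absent elsewhere)
try:
    with open("/proc/sys/kernel/threads-max") as f:
        print("kernel.threads-max:", f.read().strip())
except FileNotFoundError:
    pass
```

Raising the limits themselves is done via `ulimit -u` (per shell), `/etc/security/limits.conf` (per user), or `sysctl kernel.threads-max` (system-wide).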
This is happening because a replicated PG primary needs a connection
to only its replicas (generally 1 or 2 connections), but with an
erasure-coded PG the primary requires a connection to m+n-1 replicas
(everybody who's in the erasure-coding set, including itself). Right
now our messenger requires a thread for each connection, so kerblam.
(And it actually requires a couple such connections because we have
separate heartbeat, cluster data, and client data systems.)
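[Editor's note: Greg's description can be turned into a rough back-of-envelope estimate. The multipliers below (two threads per TCP connection, three separate connection channels per peer) are illustrative assumptions, not exact Ceph internals:]

```python
def estimated_threads(peer_osds, osds_per_host=33,
                      threads_per_conn=2, channels=3):
    """Rough per-host messenger thread estimate.

    Assumptions (illustrative only): two threads per connection
    (a reader and a writer) and three connection channels per peer
    (heartbeat, cluster data, client data), per Greg's description.
    """
    per_daemon = peer_osds * threads_per_conn * channels
    return per_daemon * osds_per_host

# With ~250 peer OSDs per daemon on a 33-disk host, the estimate lands
# in the 40K-50K range reported later in this thread.
print(estimated_threads(250))  # 49500
```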
Hi Greg,
Is there any plan to refactor the messenger component to reduce
the number of threads? For example, by using an event-driven mode.
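[Editor's note: as a minimal illustration of the event-driven mode suggested here, the sketch below multiplexes 100 connections in a single thread using Python's selectors module. This is not Ceph code; it only shows the readiness-notification pattern:]

```python
import selectors
import socket

# One thread servicing many connections via readiness events,
# instead of one (or more) threads per connection.
sel = selectors.DefaultSelector()
pairs = [socket.socketpair() for _ in range(100)]
for client, server in pairs:
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ)

# Each "client" end sends one message.
for client, _server in pairs:
    client.sendall(b"ping")

served = 0
while served < len(pairs):
    # Block until at least one registered socket is readable.
    for key, _events in sel.select(timeout=1):
        key.fileobj.recv(16)  # drain the message; socket becomes idle
        served += 1
print(served)  # prints 100
```

The same pattern underlies epoll/kqueue-based network stacks: thread count stays constant as connection count grows, at the cost of a more complex state machine per connection.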
We've discussed it in very broad terms, but there are no concrete
designs and it's not on the schedule yet. If anybody has conclusive
evidence that it's causing them trouble they can't work around, that
would be good to know.
Well, we weren't able to find the source of the problem we had (see
the original message), but there was no more time to test it further.
Something (memory- or thread-related?) was causing serious trouble on
the large setup that didn't happen on our small setup, but at this
moment I have no more information than what is included in the mails
of this thread.
Thanks for the response!
We used to have a cluster in which each OSD host had 11 disks
(daemons); on each host there were around 15K threads. The system
was stable, but when there was a cluster-wide change (e.g. an OSD
going down/out, or recovery), we observed the system load increasing,
though there was no cascading failure.
Most recently we have been evaluating Ceph on high-density hardware,
with each OSD host having 33 disks (daemons); on each host there are
around 40K-50K threads. With some OSD hosts down/out, we started
seeing the load increase sharply, along with a large volume of thread
creation and joining.
We don't have strong evidence that the messenger threading model is
the problem, nor of how much an event-driven approach would help, but
I think that as we move to high-density hardware (for cost-saving
purposes), the issue could be amplified.
If there is any plan, it would be good to know, and we are very
interested in getting involved.
Thanks,
Guang
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
----- End message from Guang Yang <yguan...@yahoo.com> -----
--
Kind regards,
Kenneth Waegeman
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com