You absolutely cannot do this with your monitors -- as David says, every node would have to participate in every monitor decision; the long tails would be horrifying and I expect it would collapse in ignominious defeat very quickly.
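To put a rough number on the "long tails" point: if, as described above, every monitor has to be heard from for every decision, then the decision latency is the slowest of N per-monitor response times. Here is a toy simulation -- just an illustrative model with made-up lognormal response times, not Ceph code -- showing how the tail grows with the monitor count:

```python
import random

def decision_latency_stats(num_mons, trials=10_000):
    """Simulate 'wait for every mon' rounds and report average and p99 latency."""
    latencies = []
    for _ in range(trials):
        # One round: the decision finishes only when the slowest mon responds.
        slowest = max(random.lognormvariate(1.0, 0.8) for _ in range(num_mons))
        latencies.append(slowest)
    latencies.sort()
    return sum(latencies) / trials, latencies[int(trials * 0.99)]

for n in (3, 5, 7, 100):
    avg, p99 = decision_latency_stats(n)
    print(f"{n:3d} mons: avg {avg:6.2f} ms  p99 {p99:7.2f} ms")
```

Even with identically distributed response times per mon, the expected worst case keeps climbing as N grows -- and that is before counting the extra messaging and per-round disk syncs each additional monitor adds.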
Your MDSes should be fine since they are indeed just a bunch of standby daemons at that point. You'd want to consider how that fits with your RAM requirements, though; it's probably not a good deployment decision even though it would work at the daemon level. (Some quick quorum arithmetic illustrating David's even-vs-odd point is at the bottom of this message.)
-Greg

On Thu, May 25, 2017 at 8:30 AM David Turner <drakonst...@gmail.com> wrote:
> For the MDS, the primary doesn't hold state data that needs to be replayed
> to a standby. The information exists in the cluster. Your setup would be
> 1 Active, 100 Standby. If the active went down, one of the standbys would
> be promoted and would read the information from the cluster.
>
> With Mons, it's interesting because of the quorum mechanics. 4 mons is
> worse than 3 mons because of the chance for split brain, where 2 of them
> think something is right and the other 2 think it's wrong and you have no
> tie-breaking vote. Odd numbers are always best, and it seems like your
> proposal would regularly have an even number of Mons. I haven't heard of a
> deployment with more than 5 mons. I would imagine there are some with 7
> mons out there, but it's not worth the hardware expense in 99.999% of cases.
>
> I'm assuming your question comes from a place of wanting to have one
> configuration to rule them all and not have multiple types of nodes in your
> Ceph deployment scripts. Just put in the time and do it right: have MDS
> servers, have Mons, have OSD nodes, etc. Once you reach scale, your mons
> are going to need their resources, your OSDs are going to need theirs, your
> RGW will be using more bandwidth, ad infinitum. That isn't to mention all
> of the RAM that the services will need during any recovery (assume 3x
> memory requirements for most Ceph services when recovering).
>
> Hyperconverged clusters are not recommended for production deployments.
> Several people use them, but generally for smaller clusters. By the time
> you reach dozens and hundreds of servers, you will only cause yourself
> headaches by becoming the special snowflake in the community. Every time
> you have a problem, the first place to look will be resource contention
> between your Ceph daemons.
>
> Back to some of your direct questions. Not having tested this, but using
> educated guesses... A possible complication of having hundreds of Mons would
> be that they all have to agree on a new map, causing a LOT more
> communication between your mons, which could likely lead to a bottleneck for
> map updates (snapshot creation/deletion, OSDs going up/down, scrubs
> happening, anything that affects data in a map). When an MDS fails, I
> don't know how the voting would go for choosing a new Active MDS among 100
> standbys. That could either go very quickly or take quite a bit longer
> depending on the logic behind the choice. Hundreds of RGW servers behind an
> LB (I'm assuming) would negate any caching that is happening on the RGW
> servers, as multiple accesses to the same file will not likely reach the
> same RGW.
>
> On Thu, May 25, 2017 at 10:40 AM Wes Dillingham <wes_dilling...@harvard.edu> wrote:
>> How much testing has there been / what are the implications of having a
>> large number of Monitor and Metadata daemons running in a cluster?
>>
>> Thus far I have deployed all of our Ceph clusters as a single service
>> type per physical machine, but I am interested in a use case where we deploy
>> dozens/hundreds? of boxes, each of which would be a mon, mds, mgr, osd, and
>> rgw all in one, and all a single cluster.
>> I do realize it is somewhat trivial (with config mgmt and all) to dedicate
>> a couple of lean boxes as MDSs and MONs and only expand at the OSD level,
>> but I'm still curious.
>>
>> My use case in mind is for backup targets where pools span the entire
>> cluster, and I am looking to streamline the process for possible
>> rack-and-stack situations where boxes can just be added in place, booted
>> up, and they auto-join the cluster as a mon/mds/mgr/osd/rgw.
>>
>> So does anyone run clusters with dozens of MONs and/or MDSs, or is anyone
>> aware of any testing with very high numbers of each? At the MDS level I
>> would just be looking for 1 Active, 1 Standby-replay and X standby until
>> multiple active MDSs are production ready. Thanks!
>>
>> --
>> Respectfully,
>>
>> Wes Dillingham
>> wes_dilling...@harvard.edu
>> Research Computing | Infrastructure Engineer
>> Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 102
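For the quorum arithmetic referenced above: with N monitors a quorum needs a strict majority (floor(N/2) + 1), so an even count buys no extra failure tolerance over the odd count just below it. A quick sketch (plain arithmetic, not Ceph code):

```python
def quorum_size(num_mons: int) -> int:
    """Strict majority needed to form a monitor quorum."""
    return num_mons // 2 + 1

def failures_tolerated(num_mons: int) -> int:
    """Monitor failures survivable while still forming a quorum."""
    return num_mons - quorum_size(num_mons)

for n in range(1, 8):
    print(f"{n} mons: quorum of {quorum_size(n)}, "
          f"survives {failures_tolerated(n)} failure(s)")
```

Running this shows 3 and 4 mons both survive only a single failure, and 5 and 6 both survive two, which is the usual argument for sticking to odd monitor counts.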