You absolutely cannot do this with your monitors -- as David says, every node would have to participate in every monitor decision; the long tails would be horrifying and I expect it would collapse in ignominious defeat very quickly.
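To put a rough number on the "long tails" point: if, as described above, every monitor has to be heard from for every decision, then the decision latency is the slowest of N per-monitor response times. Here is a toy simulation -- just an illustrative model with made-up lognormal response times, not Ceph code -- showing how the tail grows with the monitor count:

```python
import random

def decision_latency_stats(num_mons, trials=10_000):
    """Simulate 'wait for every mon' rounds and report average and p99 latency."""
    latencies = []
    for _ in range(trials):
        # One round: the decision finishes only when the slowest mon responds.
        slowest = max(random.lognormvariate(1.0, 0.8) for _ in range(num_mons))
        latencies.append(slowest)
    latencies.sort()
    return sum(latencies) / trials, latencies[int(trials * 0.99)]

for n in (3, 5, 7, 100):
    avg, p99 = decision_latency_stats(n)
    print(f"{n:3d} mons: avg {avg:6.2f} ms  p99 {p99:7.2f} ms")
```

Even with identically distributed response times per mon, the expected worst case keeps climbing as N grows -- and that is before counting the extra messaging and per-round disk syncs each additional monitor adds.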
Your MDSes should be fine since they are indeed just a bunch of standby daemons at that point. You'd want to consider how that fits with your RAM requirements, though; it's probably not a good deployment decision even though it would work at the daemon level. (Some quick quorum arithmetic illustrating David's even-vs-odd point is at the bottom of this message.)
-Greg

On Thu, May 25, 2017 at 8:30 AM David Turner <drakonst...@gmail.com> wrote:
> For the MDS, the primary doesn't hold state data that needs to be replayed
> to a standby. The information exists in the cluster. Your setup would be
> 1 Active, 100 Standby. If the active went down, one of the standbys would
> be promoted and would read the information from the cluster.
>
> With Mons, it's interesting because of the quorum mechanics. 4 mons is
> worse than 3 mons because of the chance for split brain, where 2 of them
> think something is right and the other 2 think it's wrong and you have no
> tie-breaking vote. Odd numbers are always best, and it seems like your
> proposal would regularly have an even number of Mons. I haven't heard of a
> deployment with more than 5 mons. I would imagine there are some with 7
> mons out there, but it's not worth the hardware expense in 99.999% of cases.
>
> I'm assuming your question comes from a place of wanting to have one
> configuration to rule them all and not have multiple types of nodes in your
> Ceph deployment scripts. Just put in the time and do it right: have MDS
> servers, have Mons, have OSD nodes, etc. Once you reach scale, your mons
> are going to need their resources, your OSDs are going to need theirs, your
> RGW will be using more bandwidth, ad infinitum. That isn't to mention all
> of the RAM that the services will need during any recovery (assume 3x
> memory requirements for most Ceph services when recovering).
>
> Hyperconverged clusters are not recommended for production deployments.
> Several people use them, but generally for smaller clusters. By the time
> you reach dozens and hundreds of servers, you will only cause yourself
> headaches by becoming the special snowflake in the community. Every time
> you have a problem, the first place to look will be resource contention
> between your Ceph daemons.
>
> Back to some of your direct questions. Not having tested this, but using
> educated guesses... A possible complication of having hundreds of Mons would
> be that they all have to agree on a new map, causing a LOT more
> communication between your mons, which could likely lead to a bottleneck for
> map updates (snapshot creation/deletion, OSDs going up/down, scrubs
> happening, anything that affects data in a map). When an MDS fails, I
> don't know how the voting would go for choosing a new Active MDS among 100
> standbys. That could either go very quickly or take quite a bit longer
> depending on the logic behind the choice. Hundreds of RGW servers behind an
> LB (I'm assuming) would negate any caching that is happening on the RGW
> servers, as multiple accesses to the same file will not likely reach the
> same RGW.
>
> On Thu, May 25, 2017 at 10:40 AM Wes Dillingham <wes_dilling...@harvard.edu> wrote:
>> How much testing has there been / what are the implications of having a
>> large number of Monitor and Metadata daemons running in a cluster?
>>
>> Thus far I have deployed all of our Ceph clusters as a single service
>> type per physical machine, but I am interested in a use case where we deploy
>> dozens/hundreds? of boxes, each of which would be a mon, mds, mgr, osd, and
>> rgw all in one, and all a single cluster.
>> I do realize it is somewhat trivial (with config mgmt and all) to dedicate
>> a couple of lean boxes as MDSs and MONs and only expand at the OSD level,
>> but I'm still curious.
>>
>> My use case in mind is for backup targets where pools span the entire
>> cluster, and I am looking to streamline the process for possible
>> rack-and-stack situations where boxes can just be added in place, booted
>> up, and they auto-join the cluster as a mon/mds/mgr/osd/rgw.
>>
>> So does anyone run clusters with dozens of MONs and/or MDSs, or is anyone
>> aware of any testing with very high numbers of each? At the MDS level I
>> would just be looking for 1 Active, 1 Standby-replay and X standby until
>> multiple active MDSs are production ready. Thanks!
>>
>> --
>> Respectfully,
>>
>> Wes Dillingham
>> wes_dilling...@harvard.edu
>> Research Computing | Infrastructure Engineer
>> Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 102
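For the quorum arithmetic referenced above: with N monitors a quorum needs a strict majority (floor(N/2) + 1), so an even count buys no extra failure tolerance over the odd count just below it. A quick sketch (plain arithmetic, not Ceph code):

```python
def quorum_size(num_mons: int) -> int:
    """Strict majority needed to form a monitor quorum."""
    return num_mons // 2 + 1

def failures_tolerated(num_mons: int) -> int:
    """Monitor failures survivable while still forming a quorum."""
    return num_mons - quorum_size(num_mons)

for n in range(1, 8):
    print(f"{n} mons: quorum of {quorum_size(n)}, "
          f"survives {failures_tolerated(n)} failure(s)")
```

Running this shows 3 and 4 mons both survive only a single failure, and 5 and 6 both survive two, which is the usual argument for sticking to odd monitor counts.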