John,

Thanks for the really insightful responses!
It would be nice to know what the dominant deployment scenario is for the
native case (my question (c)). Do they usually end up with something like
OCFS2 on top of RBD, or do they go with CephFS?

Thanks,
Hari

On Fri, Jul 26, 2013 at 12:59 PM, John Wilkins <john.wilk...@inktank.com> wrote:
> (a) This is true when using ceph-deploy for a cluster: it's one Ceph
> monitor for the cluster on one node. You can have many Ceph monitors,
> but the typical high-availability cluster has 3-5 monitor nodes. With
> a manual install, you could conceivably install multiple monitors onto
> a single node for the same cluster, but this isn't a best practice,
> since the node is a failure domain. The monitor is part of the
> cluster, not the node. So you can have thousands of nodes running Ceph
> daemons that are members of the cluster "ceph." A node that has a
> monitor for cluster "ceph" will monitor all Ceph OSD and MDS daemons
> across those thousands of nodes. That same node could also have a
> monitor for cluster "deep-storage" or whatever cluster name you choose.
>
> (b) I'm actually working on a reference architecture for Calxeda that
> asks exactly that question. My personal feeling is that having a
> machine/host/chassis optimized for a particular purpose (e.g., running
> Ceph OSDs) is the ideal scenario, since you can just add hardware to
> the cluster to expand it; you don't need to add monitors or MDSs to
> add OSDs. The upcoming Calxeda offerings provide excellent value in the
> cost/performance tradeoff: you get a lot of storage density and good
> performance. High-performance clusters--e.g., using SSDs for journals,
> more RAM, and more CPU power--cost more, but you still have some of
> the same issues. I still don't have a firm opinion on this, but my gut
> tells me that OSDs should be separate from the other daemons--build
> OSD hosts with dense storage. The fsync issues with the kernel when
> running monitors and OSDs on the same host generally lead to
> performance problems. See
> http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#osds-are-slow-unresponsive
> for examples of why co-locating different types of processes on the
> same host can hurt performance. Processes like monitors shouldn't be
> co-resident with OSDs. So that you don't end up with hosts wasted on
> lightweight processes like Ceph monitors, it may be ideal to place
> your MDS daemons, Apache/RGW daemons, OpenStack/CloudStack, and/or VMs
> on those nodes. You need to consider the CPU, RAM, disk I/O, and
> network implications of co-resident applications.
>
> (d) If you have three monitors, Paxos will still work: 2 out of 3
> monitors is a majority. A failure of a monitor means it's down, but
> not out. If it were out of the cluster, then the cluster would assume
> only two monitors, which wouldn't work with Paxos. That's why 3
> monitors is the minimum for high availability. 4 works too, because 3
> out of 4 is also a majority. Some people like using an odd number of
> monitors, since you never have an equal number of monitors that are
> up and down; however, this isn't a requirement for Paxos. 3 out of 4
> and 3 out of 5 both constitute a majority.
>
> On Fri, Jul 26, 2013 at 11:29 AM, Hariharan Thantry <than...@gmail.com> wrote:
> > Hi John,
> >
> > Thanks for the responses.
> > For (a), I remember reading somewhere that one can only run a max of
> > 1 monitor/node. I assume that implies the single monitor process will
> > be responsible for ALL Ceph clusters on that node, correct?
> >
> > So (b) isn't really a Ceph issue; that's nice to know. Any
> > recommendations on the minimum kernel/glibc version and minimum RAM
> > requirements for running Ceph on a single client in native mode? The
> > reason I ask is that in a few deployment scenarios (especially
> > non-standard ones like telco platforms), hardware gets added
> > gradually, so it's more important to be able to scale the cluster out
> > gracefully. I actually see Ceph as an alternative to a SAN, using
> > JBODs from machines to create a larg(ish) storage cluster. Plus,
> > usually, the clients would probably be running on the same hardware
> > as the OSD/MON, because space on the chassis is at a premium.
> >
> > (d) I was thinking about single-node failure scenarios: with 3 nodes,
> > wouldn't a failure of 1 node cause Paxos to stop working?
> >
> > Thanks,
> > Hari
> >
> > On Fri, Jul 26, 2013 at 10:00 AM, John Wilkins <john.wilk...@inktank.com> wrote:
> >> (a) Yes. See
> >> http://ceph.com/docs/master/rados/configuration/ceph-conf/#running-multiple-clusters
> >> and
> >> http://ceph.com/docs/master/rados/deployment/ceph-deploy-new/#naming-a-cluster
> >> (b) Yes. See
> >> http://wiki.ceph.com/03FAQs/01General_FAQ#How_Can_I_Give_Ceph_a_Try.3F
> >> Mounting kernel modules on the same node as Ceph daemons can cause
> >> older kernels to deadlock.
> >> (c) Someone else can probably answer that better than me.
> >> (d) At least three. Paxos requires a simple majority, so 2 out of 3
> >> is sufficient. See
> >> http://ceph.com/docs/master/rados/configuration/mon-config-ref/#background
> >> particularly the monitor quorum section.
> >>
> >> On Wed, Jul 24, 2013 at 4:03 PM, Hariharan Thantry <than...@gmail.com> wrote:
> >> > Hi folks,
> >> >
> >> > Some very basic questions.
> >> >
> >> > (a) Can I run more than one Ceph cluster on the same node (assume
> >> > that I have no more than 1 monitor/node, but storage is contributed
> >> > by one node into more than one cluster)?
> >> > (b) Are there any issues with running Ceph clients on the same node
> >> > as the other Ceph storage cluster entities (OSD/MON)?
> >> > (c) Is the best way for multiple clients to access the Ceph storage
> >> > cluster in native mode to host a shared-disk filesystem (like OCFS2)
> >> > on top of RBD? What if these clients were running inside VMs? Could
> >> > one then create independent partitions on top of RBD and give a
> >> > partition to each of the VMs?
> >> > (d) Isn't the realistic minimum number of monitors in a cluster at
> >> > least 4 (to guard against one failure)?
> >> >
> >> > Thanks,
> >> > Hari
> >>
> >> --
> >> John Wilkins
> >> Senior Technical Writer
> >> Inktank
> >> john.wilk...@inktank.com
> >> (415) 425-9599
> >> http://inktank.com
>
> --
> John Wilkins
> Senior Technical Writer
> Inktank
> john.wilk...@inktank.com
> (415) 425-9599
> http://inktank.com
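
On question (c): rather than layering a shared-disk filesystem such as
OCFS2 over one big RBD, a common alternative is to give each VM its own
RBD image, so no two clients ever write the same block device and no
cluster filesystem is needed. Below is a minimal sketch of that idea
using the python-rados/python-rbd bindings; the pool name 'rbd', the
image names, and the size are illustrative assumptions, not anything
specified in this thread.

    import rados
    import rbd

    # Connect with the default config; assumes the python-rados and
    # python-rbd bindings and a client keyring are already in place.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        # Assumes a pool named 'rbd' already exists.
        ioctx = cluster.open_ioctx('rbd')
        try:
            rbd_inst = rbd.RBD()
            size_bytes = 20 * 1024 ** 3  # 20 GiB per guest (arbitrary)
            # One image per guest, instead of partitions on one shared image.
            for guest in ('vm01', 'vm02', 'vm03'):
                rbd_inst.create(ioctx, 'disk-%s' % guest, size_bytes)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

Each guest would then attach its own image (for example via the
QEMU/librbd driver or the rbd kernel client), which sidesteps the
shared-write problem that OCFS2 exists to solve.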
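The majority rule behind John's answer to (d) is just integer
arithmetic: a monitor quorum needs strictly more than half of the
monitors in the monitor map. A small illustrative calculation (plain
Python, not a Ceph API):

    # Smallest number of monitors that is strictly more than half of n.
    def quorum_size(num_monitors):
        return num_monitors // 2 + 1

    for n in (3, 4, 5):
        q = quorum_size(n)
        print("%d monitors: quorum of %d, tolerates %d down" % (n, q, n - q))

    # Output:
    # 3 monitors: quorum of 2, tolerates 1 down
    # 4 monitors: quorum of 3, tolerates 1 down
    # 5 monitors: quorum of 3, tolerates 2 down

This is why 3 monitors is the practical minimum for high availability,
and why 4 monitors tolerate no more failures than 3 do.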