Symmetric clustering works best when the nodes are comparable because all nodes have to work in sync. NFS may be more suitable for your needs.
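For reference, and as far as I know, the stock /etc/ocfs2/cluster.conf syntax takes one stanza per node and a single cluster name on each node's "cluster =" line (the comma-separated list in your snippet below is not standard syntax). A minimal single-cluster sketch, reusing the names and addresses from your config:

    cluster:
            node_count = 2
            name = ocfs2-back

    node:
            ip_port = 7777
            ip_address = 172.20.16.151
            number = 1
            name = txri-oprdracdb-1.tomkinsbp.com
            cluster = ocfs2-back

    node:
            ip_port = 7777
            ip_address = 172.20.16.152
            number = 2
            name = txri-oprdracdb-2.tomkinsbp.com
            cluster = ocfs2-back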
On 01/26/2012 05:51 PM, Jorge Adrian Salaices wrote:
> I have been working on trying to convince Mgmt at work that we want to
> move from NFS to OCFS2 for sharing the Application Layer of our Oracle
> EBS (Enterprise Business Suite), and for a general "Backup Share", but
> general instability in my setup has dissuaded me from recommending it.
>
> I have a mixture of 1.4.7 (EL 5.3) and 1.6.3 (EL 5.7 + UEK), and
> something as simple as an umount has triggered random node reboots, even
> on nodes that have other OCFS2 mounts not shared with the rebooting
> nodes. You see, the problem I have is that I have disparate hardware,
> and some of these servers are even VMs.
>
> Several documents state that nodes have to be somewhat equal in power
> and specs, and in my case that will never be true.
> Unfortunately for me, I have had several other events of random fencing
> that have been unexplained by common checks.
> I.e., my network has never been the problem, yet one server may see
> another go away while all of the other services on that node are
> running perfectly fine. I can only surmise that the reason may have
> been an elevated load on the server that starved the heartbeat process,
> preventing it from sending network packets to the other nodes.
>
> My config has about 40 nodes. I have 4 or 5 different shared LUNs out
> of our SAN, and not all servers share all mounts: only 10 or 12 share
> one LUN, 8 or 9 share another, and 2 or 3 share a third. Unfortunately,
> the complexity is such that a server may intersect with some of the
> servers but not all.
> Perhaps a change in my config to create separate clusters may be the
> solution, but only if a node can be part of multiple clusters:
>
> node:
>         ip_port = 7777
>         ip_address = 172.20.16.151
>         number = 1
>         name = txri-oprdracdb-1.tomkinsbp.com
>         cluster = ocfs2-back
>
> node:
>         ip_port = 7777
>         ip_address = 172.20.16.152
>         number = 2
>         name = txri-oprdracdb-2.tomkinsbp.com
>         cluster = ocfs2-back
>
> node:
>         ip_port = 7777
>         ip_address = 10.30.12.172
>         number = 4
>         name = txri-util01.tomkinsbp.com
>         cluster = ocfs2-util, ocfs2-back
>
> node:
>         ip_port = 7777
>         ip_address = 10.30.12.94
>         number = 5
>         name = txri-util02.tomkinsbp.com
>         cluster = ocfs2-util, ocfs2-back
>
> cluster:
>         node_count = 2
>         name = ocfs2-back
>
> cluster:
>         node_count = 2
>         name = ocfs2-util
>
> Is this even legal, or can it be done some other way? Or is this done
> based on the different domains that are created once a mount is done?
>
> How can I make the cluster more stable? And why does a node fence
> itself on the cluster even if it does not have any locks on the shared
> LUN? It seems that a node may be "fenceable" simply by having the OCFS2
> services turned on, without a mount. Is this correct?
>
> Another question I have: can the fencing method be anything other than
> panic or restart? Can a third-party or userland event be triggered to
> recover from what may be construed by the "heartbeat" or "network
> tests" as a downed node?
>
> Thanks for any of the help you can give me.
>
> --
> Jorge Adrian Salaices
> Sr. Linux Engineer
> Tomkins Building Products
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users