Got it. Thank you =) I was just thinking about the possibility of a NIC burning out.
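
For anyone who finds this thread later: since the ring addresses have to be present on the host before Corosync starts, a small pre-start check can catch a missing address. Below is a minimal sketch, not anything shipped with Corosync. The function names and the second address (169.254.2.3) are made up for illustration; 169.254.1.3 is the ring 0 address from the logs quoted below. It only tests that each address can be bound locally, which I assume is a reasonable proxy for "the IP is configured"; it does not check link state.

```python
import socket


def is_local_address(addr: str) -> bool:
    """Return True if a UDP socket can be bound to addr on this host.

    Assumption: a successful bind means the address is assigned locally.
    This says nothing about whether the interface link is up.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.bind((addr, 0))  # port 0: let the kernel pick a free port
        return True
    except OSError:
        # Address not assigned locally, or not a valid IPv4 address.
        return False


def missing_ring_addresses(ring_addrs, is_local=is_local_address):
    """Return the ring addresses that cannot be bound on this host."""
    return [addr for addr in ring_addrs if not is_local(addr)]


if __name__ == "__main__":
    # Hypothetical ring0/ring1 addresses; replace with the ones from
    # your own corosync.conf.
    rings = ["169.254.1.3", "169.254.2.3"]
    missing = missing_ring_addresses(rings)
    if missing:
        print("do not start corosync, missing:", ", ".join(missing))
    else:
        print("all ring addresses are present")
```

The predicate is passed in as a parameter so the list logic can be tried out without touching real interfaces.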

Thank you,
Kostya

On Tue, Jan 20, 2015 at 10:50 AM, Jan Friesse <jfrie...@redhat.com> wrote:

> Kostiantyn,
>
> > One more thing to clarify.
> > You said "rebind can be avoided" - what does it mean?
>
> By that I mean that as long as you don't shutdown interface everything
> will work as expected. Interface shutdown is administrator decision,
> system doesn't do it automagically :)
>
> Regards,
> Honza
>
> > Thank you,
> > Kostya
> >
> > On Wed, Jan 14, 2015 at 1:31 PM, Kostiantyn Ponomarenko <konstantin.ponomare...@gmail.com> wrote:
> >
> >> Thank you. Now I am aware of it.
> >>
> >> Thank you,
> >> Kostya
> >>
> >> On Wed, Jan 14, 2015 at 12:59 PM, Jan Friesse <jfrie...@redhat.com> wrote:
> >>
> >>> Kostiantyn,
> >>>
> >>>> Honza,
> >>>>
> >>>> Thank you for helping me.
> >>>> So, there is no defined behavior in case one of the interfaces is not
> >>>> in the system?
> >>>
> >>> You are right. There is no defined behavior.
> >>>
> >>> Regards,
> >>> Honza
> >>>
> >>>> Thank you,
> >>>> Kostya
> >>>>
> >>>> On Tue, Jan 13, 2015 at 12:01 PM, Jan Friesse <jfrie...@redhat.com> wrote:
> >>>>
> >>>>> Kostiantyn,
> >>>>>
> >>>>>> According to the https://access.redhat.com/solutions/638843 , the
> >>>>>> interface, that is defined in the corosync.conf, must be present in
> >>>>>> the system (see at the bottom of the article, section "ROOT CAUSE").
> >>>>>> To confirm that I made a couple of tests.
> >>>>>>
> >>>>>> Here is a part of the corosync.conf file (in a free-write form)
> >>>>>> (the original config file is also attached):
> >>>>>> ===============================
> >>>>>> rrp_mode: passive
> >>>>>> ring0_addr is defined in corosync.conf
> >>>>>> ring1_addr is defined in corosync.conf
> >>>>>> ===============================
> >>>>>>
> >>>>>> -------------------------------
> >>>>>> Two-node cluster
> >>>>>> -------------------------------
> >>>>>>
> >>>>>> Test #1:
> >>>>>> --------------------------------------------------
> >>>>>> IP for ring0 is not defined in the system:
> >>>>>> --------------------------------------------------
> >>>>>> Start Corosync simultaneously on both nodes.
> >>>>>> Corosync fails to start.
> >>>>>> From the logs:
> >>>>>> Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] parse error in config: No interfaces defined
> >>>>>> Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1343.
> >>>>>> Result: Corosync and Pacemaker are not running.
> >>>>>>
> >>>>>> Test #2:
> >>>>>> --------------------------------------------------
> >>>>>> IP for ring1 is not defined in the system:
> >>>>>> --------------------------------------------------
> >>>>>> Start Corosync simultaneously on both nodes.
> >>>>>> Corosync starts.
> >>>>>> Start Pacemaker simultaneously on both nodes.
> >>>>>> Pacemaker fails to start.
> >>>>>> From the logs, the last entries from "corosync":
> >>>>>> Jan 8 16:31:29 daemon.err<27> corosync[3728]: [TOTEM ] Marking ringid 0 interface 169.254.1.3 FAULTY
> >>>>>> Jan 8 16:31:30 daemon.notice<29> corosync[3728]: [TOTEM ] Automatically recovered ring 0
> >>>>>> Result: Corosync and Pacemaker are not running.
> >>>>>>
> >>>>>> Test #3:
> >>>>>> "rrp_mode: active" leads to the same result, except Corosync and
> >>>>>> Pacemaker init scripts return status "running".
> >>>>>> But still "vim /var/log/cluster/corosync.log" shows a lot of errors like:
> >>>>>> Jan 08 16:30:47 [4067] A6-402-1 cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
> >>>>>>
> >>>>>> Result: Corosync and Pacemaker show their statuses as "running", but
> >>>>>> "crm_mon" cannot connect to the cluster database. And half of
> >>>>>> Pacemaker's services are not running (including the Cluster
> >>>>>> Information Base (CIB)).
> >>>>>>
> >>>>>> -------------------------------
> >>>>>> For a single node mode
> >>>>>> -------------------------------
> >>>>>>
> >>>>>> IP for ring0 is not defined in the system:
> >>>>>> Corosync fails to start.
> >>>>>>
> >>>>>> IP for ring1 is not defined in the system:
> >>>>>> Corosync and Pacemaker are started.
> >>>>>> It is possible that configuration will be applied successfully (50%),
> >>>>>> and it is possible that the cluster is not running any resources,
> >>>>>> and it is possible that the node cannot be put in a standby mode
> >>>>>> (shows: communication error),
> >>>>>> and it is possible that the cluster is running all resources, but
> >>>>>> applied configuration is not guaranteed to be fully loaded (some
> >>>>>> rules can be missed).
> >>>>>>
> >>>>>> -------------------------------
> >>>>>> Conclusions:
> >>>>>> -------------------------------
> >>>>>>
> >>>>>> It is possible that in some rare cases (see comments to the bug) the
> >>>>>> cluster will work, but in that case its working state is unstable and
> >>>>>> the cluster can stop working at any moment.
> >>>>>>
> >>>>>> So, is it correct? Do my assumptions make any sense?
> >>>>>> I didn't find any other explanation on the Internet.
> >>>>>
> >>>>> Corosync needs all interfaces during start and runtime. This doesn't
> >>>>> mean they must be connected (this would make corosync unusable for
> >>>>> physical NIC/Switch or cable failure), but they must be up and have
> >>>>> correct ip.
> >>>>>
> >>>>> When this is not the case, corosync rebinds to localhost and weird
> >>>>> things happen. Removal of this rebinding is long time TODO, but there
> >>>>> are still more important bugs (especially because rebind can be
> >>>>> avoided).
> >>>>>
> >>>>> Regards,
> >>>>> Honza
> >>>>>
> >>>>>> Thank you,
> >>>>>> Kostya
> >>>>>>
> >>>>>> On Fri, Jan 9, 2015 at 11:10 AM, Kostiantyn Ponomarenko <konstantin.ponomare...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi guys,
> >>>>>>>
> >>>>>>> Corosync fails to start if there is no such network interface
> >>>>>>> configured in the system.
> >>>>>>> Even with "rrp_mode: passive" the problem is the same when at least
> >>>>>>> one network interface is not configured in the system.
> >>>>>>>
> >>>>>>> Is this the expected behavior?
> >>>>>>> I thought that when you use redundant rings, it is enough to have at
> >>>>>>> least one NIC configured in the system. Am I wrong?
> >>>>>>>
> >>>>>>> Thank you,
> >>>>>>> Kostya

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org