One more thing to clarify: you said "rebind can be avoided". What does that mean?

Thank you,
Kostya

On Wed, Jan 14, 2015 at 1:31 PM, Kostiantyn Ponomarenko <konstantin.ponomare...@gmail.com> wrote:

> Thank you. Now I am aware of it.
>
> Thank you,
> Kostya
>
> On Wed, Jan 14, 2015 at 12:59 PM, Jan Friesse <jfrie...@redhat.com> wrote:
>
>> Kostiantyn,
>>
>> > Honza,
>> >
>> > Thank you for helping me.
>> > So, there is no defined behavior in case one of the interfaces is not in
>> > the system?
>>
>> You are right. There is no defined behavior.
>>
>> Regards,
>> Honza
>>
>> >
>> > Thank you,
>> > Kostya
>> >
>> > On Tue, Jan 13, 2015 at 12:01 PM, Jan Friesse <jfrie...@redhat.com> wrote:
>> >
>> >> Kostiantyn,
>> >>
>> >>> According to the https://access.redhat.com/solutions/638843 , the
>> >>> interface, that is defined in the corosync.conf, must be present in the
>> >>> system (see at the bottom of the article, section "ROOT CAUSE").
>> >>> To confirm that I made a couple of tests.
>> >>>
>> >>> Here is a part of the corosync.conf file (in a free-write form) (also
>> >>> attached the original config file):
>> >>> ===============================
>> >>> rrp_mode: passive
>> >>> ring0_addr is defined in corosync.conf
>> >>> ring1_addr is defined in corosync.conf
>> >>> ===============================
>> >>>
>> >>> -------------------------------
>> >>> Two-node cluster
>> >>> -------------------------------
>> >>>
>> >>> Test #1:
>> >>> --------------------------------------------------
>> >>> IP for ring0 is not defined in the system:
>> >>> --------------------------------------------------
>> >>> Start Corosync simultaneously on both nodes.
>> >>> Corosync fails to start.
>> >>> From the logs:
>> >>> Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] parse error in config: No interfaces defined
>> >>> Jan 08 09:43:56 [2992] A6-402-2 corosync error [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1343.
>> >>> Result: Corosync and Pacemaker are not running.
>> >>>
>> >>> Test #2:
>> >>> --------------------------------------------------
>> >>> IP for ring1 is not defined in the system:
>> >>> --------------------------------------------------
>> >>> Start Corosync simultaneously on both nodes.
>> >>> Corosync starts.
>> >>> Start Pacemaker simultaneously on both nodes.
>> >>> Pacemaker fails to start.
>> >>> From the logs, the last writes from the "corosync":
>> >>> Jan 8 16:31:29 daemon.err<27> corosync[3728]: [TOTEM ] Marking ringid 0 interface 169.254.1.3 FAULTY
>> >>> Jan 8 16:31:30 daemon.notice<29> corosync[3728]: [TOTEM ] Automatically recovered ring 0
>> >>> Result: Corosync and Pacemaker are not running.
>> >>>
>> >>> Test #3:
>> >>> "rrp_mode: active" leads to the same result, except Corosync and Pacemaker
>> >>> init scripts return status "running".
>> >>> But still "vim /var/log/cluster/corosync.log" shows a lot of errors like:
>> >>> Jan 08 16:30:47 [4067] A6-402-1 cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
>> >>>
>> >>> Result: Corosync and Pacemaker show their statuses as "running", but
>> >>> "crm_mon" cannot connect to the cluster database. And half of the
>> >>> Pacemaker's services are not running (including Cluster Information Base
>> >>> (CIB)).
>> >>>
>> >>> -------------------------------
>> >>> For a single node mode
>> >>> -------------------------------
>> >>>
>> >>> IP for ring0 is not defined in the system:
>> >>> Corosync fails to start.
>> >>>
>> >>> IP for ring1 is not defined in the system:
>> >>> Corosync and Pacemaker are started.
>> >>>
>> >>> It is possible that configuration will be applied successfully (50%),
>> >>> and it is possible that the cluster is not running any resources,
>> >>> and it is possible that the node cannot be put in a standby mode (shows:
>> >>> communication error),
>> >>> and it is possible that the cluster is running all resources, but applied
>> >>> configuration is not guaranteed to be fully loaded (some rules can be
>> >>> missed).
>> >>>
>> >>> -------------------------------
>> >>> Conclusions:
>> >>> -------------------------------
>> >>>
>> >>> It is possible that in some rare cases (see comments to the bug) the
>> >>> cluster will work, but in that case its working state is unstable and the
>> >>> cluster can stop working at any moment.
>> >>>
>> >>> So, is it correct? Do my assumptions make any sense? I didn't find any
>> >>> other explanation on the net ... .
>> >>
>> >> Corosync needs all interfaces during start and runtime. This doesn't
>> >> mean they must be connected (this would make corosync unusable for
>> >> physical NIC/Switch or cable failure), but they must be up and have
>> >> correct ip.
>> >>
>> >> When this is not the case, corosync rebinds to localhost and weird
>> >> things happen. Removal of this rebinding is long time TODO, but there
>> >> are still more important bugs (especially because rebind can be avoided).
>> >>
>> >> Regards,
>> >> Honza
>> >>
>> >>>
>> >>> Thank you,
>> >>> Kostya
>> >>>
>> >>> On Fri, Jan 9, 2015 at 11:10 AM, Kostiantyn Ponomarenko <konstantin.ponomare...@gmail.com> wrote:
>> >>>
>> >>>> Hi guys,
>> >>>>
>> >>>> Corosync fails to start if there is no such network interface configured
>> >>>> in the system.
>> >>>> Even with "rrp_mode: passive" the problem is the same when at least one
>> >>>> network interface is not configured in the system.
>> >>>>
>> >>>> Is this the expected behavior?
>> >>>> I thought that when you use redundant rings, it is enough to have at least
>> >>>> one NIC configured in the system. Am I wrong?
>> >>>>
>> >>>> Thank you,
>> >>>> Kostya
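
Regarding Honza's point in the quoted thread that every configured ring interface must be up and have the correct IP: a pre-start check along the following lines might catch the situation before Corosync is launched. This is only a rough sketch under several assumptions. It assumes the addresses appear in corosync.conf as `ringN_addr: <address>` lines (the key names come from the thread, the nodelist layout is assumed), the function names are made up for illustration, and "local" is approximated by whether a UDP socket can be bound to the address, which does not prove the link is up.

```python
import re
import socket
from collections import defaultdict

# Matches lines such as "ring0_addr: 169.254.1.3" (layout assumed, see above).
RING_ADDR = re.compile(r"^\s*ring(\d+)_addr:\s*(\S+)", re.MULTILINE)


def ring_addresses(conf_text):
    """Group every ringN_addr value found in the config text by ring number."""
    rings = defaultdict(list)
    for ring, addr in RING_ADDR.findall(conf_text):
        rings[int(ring)].append(addr)
    return dict(rings)


def is_local_address(addr):
    """Approximate check: can a UDP socket be bound to addr on this host?

    Caveat: with net.ipv4.ip_nonlocal_bind enabled on Linux, binding also
    succeeds for addresses that are not assigned to any interface.
    """
    try:
        infos = socket.getaddrinfo(addr, 0, type=socket.SOCK_DGRAM)
    except socket.gaierror:
        return False
    for family, socktype, proto, _, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.bind(sockaddr)
            return True
        except OSError:
            continue
    return False


def rings_without_local_address(rings, is_local=is_local_address):
    """Return the ring numbers for which none of the listed addresses is local."""
    return sorted(
        ring for ring, addrs in rings.items()
        if not any(is_local(a) for a in addrs)
    )
```

An init script could read the config file, pass its text through `ring_addresses()`, and refuse to start Corosync when `rings_without_local_address()` returns a non-empty list. The check is per ring rather than per address because a nodelist also contains the other nodes' addresses, which are never local.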

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org