On Fri, Nov 2, 2012 at 6:22 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
> 02.11.2012 02:05, Andrew Beekhof wrote:
>> On Thu, Nov 1, 2012 at 5:09 PM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
>>> 01.11.2012 02:47, Andrew Beekhof wrote:
>>> ...
>>>>>
>>>>> One remark about that - it requires that gfs2 communicate with dlm in
>>>>> kernel space, so gfs_controld is no longer required. I think Fedora 17
>>>>> is the first version with that feature, and it is definitely not
>>>>> available for EL6 (CentOS 6, which I use).
>>>>>
>>>>> But I have had preliminary success running GFS2 with corosync 2 and
>>>>> pacemaker 1.1.8 on EL6. dlm4 runs just fine as is (although it misses
>>>>> some features on EL6 because of the kernel). And it still includes the
>>>>> (undocumented) option enable_fscontrol, so user-space communication
>>>>> with fs control daemons is supported. Even if that feature is removed
>>>>> upstream, it can easily be brought back - just several lines of code.
>>>>> And I ported gfs_controld from cman to corosync 2 (the patch is still
>>>>> very dirty, made with scissors and needle, just a proof of concept
>>>>> that it can work at all). Some features are unsupported (e.g. nodir)
>>>>> and will not be implemented by me.
>>>>
>>>> I'm impressed. What was the motivation though? You really really
>>>> don't like CMAN? :-)
>>>
>>> Why should I like software which is going to die? ;)
>>>
>>> I believe that how things are done currently (the third case from your
>>> list) fully reflects my "perfectionistic" needs. I had many problems
>>> with cman+pacemaker in the past. The most critical is that pacemaker
>>> and dlm_controld react differently when a node reappears very soon
>>> after it was lost (because pacemaker uses totem(?) directly for
>>> membership, but dlm uses CPG).
>>
>> We both get it from the CPG and quorum APIs for option 3.
>
> Yes, but not for 1 nor for 2.
Not quite. We used to ignore it for option 2, but not anymore. Option 2
uses CPG for messaging.

> I saw the described behavior with both of them, but not with 3.
> That's why I decided to go with 3, which I think is conceptually right.
>
>>
>>> Pacemaker accepts that, but controld freezes lockspaces, waiting for
>>> fencing. But fencing is never done because nobody handles the
>>> "node lost" CPG event.
>>
>> WTF. Pacemaker should absolutely do this. Bug report?
>
> Sorry for being unclear.
> I saw that with both 1 and 2 (where pacemaker did not use CPG), until I
> "fixed" fencing at the dlm layer for 1. I modified it to request fencing
> if a "node down" event occurs, and then did not see freezes anymore.
> From what I understand, the "node down" CPG event occurs when corosync
> forms a transitional membership (at least pacemaker logged lines about
> that at the same time as the dlm freeze). And if a stable membership
> occurs (milli-)seconds after the transitional one, pacemaker (as of
> probably 1.1.6) did not fence the re-appeared node. I can understand
> that - pacemaker can absolutely live with that. But dlm cannot.

Right. Any sort of membership hiccup is fatal as far as the dlm is
concerned. But even with options 1 and 2, it should still make a fencing
request. Without fence_pcmk in cluster.conf that request might have gotten
lost, but with 1.1.8 I would expect the node to be shot - regardless of
whether the rest of Pacemaker thought it was ok. That's why going direct
to stonithd was an important change (rough sketches of both pieces are at
the bottom of this mail).

> And it is dlm's task to do proper fencing in case it cannot work, not
> pacemaker's. But that piece was missing there. The same is (probably, I
> may be damn wrong here) true for cman - I did a quick search for a CPG
> "node down" handler in its sources but didn't find one. I suspect it was
> handled by some deprecated daemon (e.g. groupd) in the past, but as of
> 3.1.7 I did not observe handling for that.
>
> As I go with option 3, I should not see that anymore, even theoretically.
>
> So no bug report for what I won't use anymore :)
>
>>
>>> dlm does start fencing for "process lost", but not for "node lost".
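
To make the "node lost" vs "process lost" distinction concrete, here is a
minimal sketch of the kind of confchg handler being described, written
against corosync's libcpg API. It is not dlm_controld's actual code;
request_fencing() is a hypothetical stand-in for whatever mechanism the
daemon uses to ask for a fence.

/*
 * Minimal sketch (not dlm_controld's actual code): a CPG confchg handler
 * that treats a member leaving with reason CPG_REASON_NODEDOWN ("node
 * lost") as a fencing trigger, not only CPG_REASON_PROCDOWN ("process
 * lost").  request_fencing() is a hypothetical helper.
 */
#include <stdio.h>
#include <stdint.h>
#include <corosync/cpg.h>

static void request_fencing(uint32_t nodeid)
{
        /* hypothetical: hand the node over to the fencing path */
        fprintf(stderr, "requesting fencing of nodeid %u\n", nodeid);
}

static void confchg_cb(cpg_handle_t handle,
                       const struct cpg_name *group_name,
                       const struct cpg_address *member_list,
                       size_t member_list_entries,
                       const struct cpg_address *left_list,
                       size_t left_list_entries,
                       const struct cpg_address *joined_list,
                       size_t joined_list_entries)
{
        size_t i;

        for (i = 0; i < left_list_entries; i++) {
                switch (left_list[i].reason) {
                case CPG_REASON_NODEDOWN:
                        /* the whole node dropped out of the membership */
                case CPG_REASON_PROCDOWN:
                        /* only the daemon died; the node is still up */
                        request_fencing(left_list[i].nodeid);
                        break;
                default:
                        /* CPG_REASON_LEAVE etc.: clean exit, nothing to do */
                        break;
                }
        }
}

static cpg_callbacks_t callbacks = {
        .cpg_deliver_fn = NULL,
        .cpg_confchg_fn = confchg_cb,
};

Registering the callbacks is the usual cpg_initialize()/cpg_join()/
cpg_dispatch() dance, so I've left that out.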
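
And for the fencing request itself, this is roughly what "going direct to
stonithd" looks like from a daemon's point of view, using the kick helpers
pacemaker exports for exactly this kind of caller. The header path, exact
argument types and return-code semantics here are from memory, so treat
them as assumptions rather than gospel.

/*
 * Sketch only: a daemon asking stonithd directly to fence a node via
 * pacemaker's kick helpers.  Header path and argument details are
 * assumptions and may vary by pacemaker version and distro packaging.
 */
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#include <crm/stonith-ng.h>     /* stonith_api_kick(), stonith_api_time() */

/* Ask stonithd to fence 'nodeid'; returns 0 on success. */
static int fence_node(int nodeid)
{
        /* last argument: true = power off, false = reboot (as I read it) */
        int rc = stonith_api_kick(nodeid, NULL, 300 /* seconds */, true);

        if (rc != 0) {
                fprintf(stderr, "kick of nodeid %d failed: %d\n", nodeid, rc);
                return rc;
        }

        /*
         * stonith_api_time() reports when the node was last fenced, so the
         * caller can confirm the fence completed after the node was lost
         * before unfreezing lockspaces.
         */
        time_t when = stonith_api_time(nodeid, NULL, false);
        fprintf(stderr, "nodeid %d last fenced at %ld\n", nodeid, (long)when);
        return 0;
}

The important part is that the request no longer depends on the
cluster.conf/fence_pcmk redirection: the daemon that noticed the node loss
talks to stonithd itself, so the node still gets shot even if the rest of
Pacemaker has already welcomed it back.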