Hi Shencanquan / Srini,

Thanks for the comments.

If I am willing to give up the kernel I/O that is still pending, is there a
way to do that? What I need is to stop the heartbeat when the heartbeat
region is not reachable. In my case the host also carries other types of
filesystems that users depend on, and I cannot justify a host reboot to
those users.

Thanks,
Vineeth

On Sun, Jun 2, 2013 at 3:19 AM, shencanquan <shencanq...@huawei.com> wrote:

> On 2013/6/1 1:09, Srinivas Eeda wrote:
>
> The reason nodes are fenced during network failures is because we need to
> guarantee that no i/o's are going to happen from this fenced node. If you
> just change the fs to read-only we still cannot guarantee that there are no
> inflight-io's from this node from previous writes.
>
> I agree it.
> set the ocfs2 to read-only, it just prevent io from user space
> application. on the kernel cache for example page cache or currently write
> maybe write to io the SAN.
>
> the best way is use the SCSI-3 Persistent Group Reservation to fence the
> node.
>
> On 05/31/2013 08:33 AM, Vineeth Thampi wrote:
>
> Hi,
>
> I have been working around the issue of Node fence in case of a
> heartbeat failure / Network timeout. I modified o2quo_fence_self() in
> quorum.c to make all ocfs2 filesystems RO, when tested it worked like a
> charm, and the filesystems were made RO, but I am not able to umount the
> filesystem or stop O2CB service.
>
> Is there any way by which I could ask O2CB to abort heartbeat and treat
> the filesystem as LOCAL instead of GLOBAL?
>
> The following is the code change that I made.
>
> **************************************************
> static void make_fs_RO(struct super_block *sb, void *arg)
> {
>         struct ocfs2_super *osb = OCFS2_SB(sb);
>
>         sb->s_flags |= MS_RDONLY;
>         ocfs2_set_osb_flag(osb, OCFS2_OSB_ERROR_FS);
>         ocfs2_set_ro_flag(osb, *(int *)arg);
> }
>
> /* this is horribly heavy-handed. It should instead flip the file
>  * system RO and call some userspace script. */
> static void o2quo_fence_self(void)
> {
>         ...
>
>         case O2NM_FENCE_RESET:
>                 printk(KERN_ERR "*** Hard failure in O2CB, all ocfs2 "
>                        "filesystems made RO ***\n");
>
>                 /* Iterate through all ocfs2 super blocks and make each of
>                    them RO */
>                 fs_type = get_fs_type("ocfs2");
>                 if (fs_type)
>                         iterate_supers_type(fs_type, make_fs_RO,
>                                             &hard_reset);
>
>                 break;
>         ...
>
> }
> ***************************************************************
>
> The error from kern.log:
>
> =======================================
> May 31 16:08:18 localhost kernel: [ 5434.076126]
> (kworker/u:2,577,3):dlm_send_remote_convert_request:395 ERROR: Error -107
> when sending message 504 (key 0xcfe4a084) to node 0
> May 31 16:08:18 localhost kernel: [ 5434.076178] o2dlm: Waiting on the
> death of node 0 in domain A4E98618A3744717A65AF04E943D035A
> =======================================
>
> Any pointers would be much appreciated.
>
> Thanks,
>
> Vineeth
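
Srini's point about in-flight I/O can be illustrated with a toy userspace
model. This is only a sketch and not kernel code: struct toy_fs and the
toy_* functions are made-up names that stand in for the super block flag
and the block layer queue, and nothing here calls into ocfs2 or o2cb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a mounted filesystem; NOT the kernel's super_block. */
struct toy_fs {
    bool read_only;    /* models MS_RDONLY being set in sb->s_flags */
    int  inflight;     /* writes already handed to the block layer */
    int  reached_disk; /* writes that have landed on shared storage */
};

/* A new write arriving from user space: refused once the fs is RO. */
bool toy_submit_write(struct toy_fs *fs)
{
    assert(fs != NULL);
    if (fs->read_only)
        return false;
    fs->inflight++;
    return true;
}

/* Models what make_fs_RO() does in the patch: it only flips a flag. */
void toy_make_ro(struct toy_fs *fs)
{
    assert(fs != NULL);
    fs->read_only = true;
}

/* The queue drains whatever was submitted earlier; it never looks at
 * read_only, so earlier writes still reach the shared disk. */
int toy_complete_inflight(struct toy_fs *fs)
{
    assert(fs != NULL);
    int done = fs->inflight;
    fs->reached_disk += done;
    fs->inflight = 0;
    return done;
}
```

In this model, submitting two writes, calling toy_make_ro(), and then
calling toy_complete_inflight() still moves both writes to reached_disk,
while a third toy_submit_write() is refused. That gap between "no new
writes" and "no writes at all" is what fencing closes and a read-only flip
does not.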
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users