Hi,

I have been working around the issue of node fencing on a heartbeat failure / network timeout. I modified o2quo_fence_self() in quorum.c to make all ocfs2 filesystems RO. When tested, it worked like a charm and the filesystems were made RO, but I am not able to umount the filesystems or stop the O2CB service.

Is there any way by which I could ask O2CB to abort heartbeat and treat the filesystem as LOCAL instead of GLOBAL?

The following is the code change that I made.

**************************************************
static void make_fs_RO(struct super_block *sb, void *arg)
{
	struct ocfs2_super *osb = OCFS2_SB(sb);

	sb->s_flags |= MS_RDONLY;
	ocfs2_set_osb_flag(osb, OCFS2_OSB_ERROR_FS);
	ocfs2_set_ro_flag(osb, *(int *)arg);
}

/* this is horribly heavy-handed. It should instead flip the file
 * system RO and call some userspace script. */
static void o2quo_fence_self(void)
{
	*...*
	case O2NM_FENCE_RESET:
		printk(KERN_ERR "*** Hard failure in O2CB, all ocfs2 "
		       "filesystems made RO ***\n");
		/* Iterate through all ocfs2 super blocks and make each of them RO */
		fs_type = get_fs_type("ocfs2");
		if (fs_type)
			iterate_supers_type(fs_type, make_fs_RO, &hard_reset);
		break;
	*...*
}
***************************************************************

The error from kern.log:
=======================================
May 31 16:08:18 localhost kernel: [ 5434.076126] (kworker/u:2,577,3):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0xcfe4a084) to node 0
May 31 16:08:18 localhost kernel: [ 5434.076178] o2dlm: Waiting on the death of node 0 in domain A4E98618A3744717A65AF04E943D035A
=======================================

Any pointers would be much appreciated.

Thanks,
Vineeth
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users