On 2012-08-22 18:23, srinivas eeda wrote:
The crash looks similar to what the patch https://oss.oracle.com/pipermail/ocfs2-devel/2012-January/008469.html is trying to address. The fix has not been accepted yet because, as explained in the patch description, we need to fix the master node to skip sending a BAST after receiving an unlock message.

Regarding ERROR: status = -17: what storage do you use? It could be due to stale data.
The storage size is 400G.
OCFS2 runs over AoE.



On 8/22/2012 2:25 AM, Pawel wrote:
That was done multiple times.
What is more, the filesystem was recreated with mkfs.
Still the same behavior...


Pawel

On 2012-08-22 04:21, Sunil Mushran wrote:
You may want to run a full fsck on the fs.

fsck.ocfs2 -fy /dev/xxxx

On Tue, Aug 21, 2012 at 12:49 AM, Pawel <pzl...@mp.pl> wrote:

    Hi,
    After upgrading ocfs2, my cluster is unstable.

    At least once per week I see:
    kernel panic: Null pointer dereference  at 00048
    o2dlm_blocking_ast_wrapper + 0x8/0x20 [ocfs2_stack_o2cb]
    stack:
    dlm_do_local_bast [ocfs2_dlm]
    dlm_lookup_lockers [ocfs2_dlm]
    dlm_proxy_ast_handler
    add_timer
    ..

    After that, a deadlock sometimes happens on other nodes. Restarting
    the entire cluster solves the issue.
    I see in the log:
    (dlm_thread,7227,3):dlm_send_proxy_ast_msg:484 ERROR: ECB9442E19A94EAC896641BFADD55E4B: res M0000000000000001f411c900000000, error -107 send AST to node 4
    (dlm_thread,7227,3):dlm_flush_asts:605 ERROR: status = -107
    o2net: No connection established with node 4 after 10.0 seconds, giving up.
    o2net: No connection established with node 4 after 10.0 seconds, giving up.
    o2net: No connection established with node 4 after 10.0 seconds, giving up.
    (dlm_thread,7227,4):dlm_send_proxy_ast_msg:484 ERROR: ECB9442E19A94EAC896641BFADD55E4B: res M0000000000000001f411c900000000, error -107 send AST to node 4
    (dlm_thread,7227,4):dlm_flush_asts:605 ERROR: status = -107
    o2cb: o2dlm has evicted node 4 from domain ECB9442E19A94EAC896641BFADD55E4B
    o2cb: o2dlm has evicted node 4 from domain ECB9442E19A94EAC896641BFADD55E4B
    o2dlm: Begin recovery on domain ECB9442E19A94EAC896641BFADD55E4B for node 4
    o2dlm: Node 5 (he) is the Recovery Master for the dead node 4 in domain ECB9442E19A94EAC896641BFADD55E4B
    o2dlm: End recovery on domain ECB9442E19A94EAC896641BFADD55E4B
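
When a log like the one above gets long, it can help to count how often each function reports each error. The following is only a minimal sketch, assuming the ERROR lines have the shape "(comm,pid,N):function:line ERROR: message" seen in this excerpt; treating the third field as a CPU number is an assumption, and the names `LOG_RE`, `SAMPLE` and `count_errors` are hypothetical helpers, not part of any ocfs2 tool.

```python
import re
from collections import Counter

# Matches lines such as "(dlm_thread,7227,3):dlm_flush_asts:605 ERROR: status = -107".
# The group name "cpu" is an assumption about what the third field means.
LOG_RE = re.compile(
    r"\((?P<comm>[^,]+),(?P<pid>\d+),(?P<cpu>\d+)\):"
    r"(?P<func>\w+):(?P<line>\d+) ERROR: (?P<msg>.*)"
)

# Sample lines copied from the log excerpt in this thread.
SAMPLE = """\
(dlm_thread,7227,3):dlm_flush_asts:605 ERROR: status = -107
o2net: No connection established with node 4 after 10.0 seconds, giving up.
(dlm_thread,7227,4):dlm_flush_asts:605 ERROR: status = -107
"""


def count_errors(text):
    """Count ERROR lines per (function, message) pair; other lines are ignored."""
    counts = Counter()
    for raw in text.splitlines():
        match = LOG_RE.search(raw.strip())
        if match:
            counts[(match.group("func"), match.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    for (func, msg), n in count_errors(SAMPLE).items():
        print(f"{n}x {func}: {msg}")
```

Lines that do not match the pattern, such as the o2net messages, are skipped rather than counted.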


    Additionally, ~4 times per day I see:

    ocfs2_check_dir_for_entry:2119 ERROR: status = -17
    ocfs2_mknod:459 ERROR: status = -17
    ocfs2_create:629 ERROR: status = -17
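
The negative status values in these messages are Linux errno codes with the sign flipped: -17 is EEXIST and -107 is ENOTCONN. The sketch below is only an illustration of that mapping; the table is hard-coded with the two Linux values that appear in this thread so the result does not depend on the platform running it, and `LINUX_ERRNO` and `decode_status` are hypothetical names.

```python
# Linux errno values for the two codes seen in this thread (hard-coded on purpose,
# because errno numbering differs between operating systems).
LINUX_ERRNO = {
    17: ("EEXIST", "File exists"),
    107: ("ENOTCONN", "Transport endpoint is not connected"),
}


def decode_status(status):
    """Turn a negative kernel status such as -17 into a readable string."""
    name, text = LINUX_ERRNO.get(-status, ("UNKNOWN", "not in this table"))
    return f"status = {status}: {name} ({text})"


if __name__ == "__main__":
    for status in (-107, -17):
        print(decode_status(status))
```

Read this way, the -17 lines say that a create call found the name already present in the directory, and the -107 lines say that the connection to node 4 was not established.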


    I currently use kernel 3.4.2.
    My filesystem was created with:
    -N 8 -b 4096 -C 32768 --fs-features backup-super,strict-journal-super,sparse,extended-slotmap,inline-data,metaecc,xattr,indexed-dirs,refcount,discontig-bg,unwritten,usrquota,grpquota

    Could you tell me what could be making my system unstable? Which
    feature?

    Thanks for any help

    Pawel


    _______________________________________________
    Ocfs2-users mailing list
    Ocfs2-users@oss.oracle.com
    https://oss.oracle.com/mailman/listinfo/ocfs2-users




