>Number:         160198
>Category:       kern
>Synopsis:       amd + NFS reconnect = ICMP storm + unkillable process + hung amd mount.
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Aug 26 06:10:08 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Artem Belevich
>Release:        FreeBSD 8.2-STABLE i386
>Organization:
FreeBSD
>Environment:
FreeBSD stable/8, head

>Description:

        When a process is interrupted during an NFS reconnect over
        UDP, it gets stuck in an unkillable state.

        In my particular case the NFS connection is to the amd process
        on the localhost. The continuous reconnects amount to a
        self-inflicted DoS attack on amd, which becomes unresponsive
        and hangs all other processes that access amd-mounted
        filesystems. As a side effect we also generate a rather high
        rate of ICMP port unreachable replies. All in all, the system
        ends up virtually unavailable, and in many cases it takes a
        reboot to get it out of this state.

        The stuck process always has clnt_reconnect_call() in its backtrace:

        18779 100511 collect2         -                
        mi_switch+0x176
        turnstile_wait+0x1cb 
        _mtx_lock_sleep+0xe1 
        sleepq_catch_signals+0x386
        sleepq_timedwait_sig+0x19 
        _sleep+0x1b1 
        clnt_dg_call+0x7e6
        clnt_reconnect_call+0x12e 
        nfs_request+0x212 
        nfs_getattr+0x2e4
        VOP_GETATTR_APV+0x44 
        nfs_bioread+0x42a 
        VOP_READLINK_APV+0x4a
        namei+0x4f9 
        kern_statat_vnhook+0x92 
        kern_statat+0x15
        freebsd32_stat+0x2e 
        syscallenter+0x23d
        

>How-To-Repeat:
        In my case the problem most frequently occurs when a parallel
        build that touches an amd-mounted filesystem is interrupted.

>Fix:
        
        clnt_dg_call() uses msleep(), which may return ERESTART when
        the current process is interrupted. If that happens, we return
        to clnt_reconnect_call() with RPC_CANTRECV.
        clnt_reconnect_call() handles RPC_CANTRECV by trying to
        reconnect again, and the story repeats. Because the current
        code never returns to userland, the pending signal is never
        delivered, so the process never exits and stays stuck, in most
        cases forever.

        The fix is to convert ERESTART to RPC_INTR, which is what
        other places in the RPC code that handle ERESTART already do.
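
        The retry loop described above can be sketched in userland C.
        This is a simulation under stated assumptions, not the kernel
        source or the actual patch: every sim_* / SIM_* name is a
        hypothetical stand-in, the interrupted msleep() is modelled as
        always returning ERESTART, and the max_attempts cap exists
        only so that the simulation terminates.

```c
/* Hypothetical stand-ins for the kernel's ERESTART and enum clnt_stat. */
enum sim_errno { SIM_OK = 0, SIM_ERESTART };
enum sim_rpc_stat { SIM_RPC_SUCCESS, SIM_RPC_CANTRECV, SIM_RPC_INTR };

/*
 * Models clnt_dg_call() after a signal has arrived: the sleep is assumed
 * to return ERESTART on every attempt because the signal stays pending.
 * With fixed == 0, ERESTART falls through to RPC_CANTRECV (the reported
 * behaviour). With fixed == 1, it is converted to RPC_INTR (the proposal).
 */
static enum sim_rpc_stat
sim_dg_call(int fixed)
{
	enum sim_errno error = SIM_ERESTART;	/* simulated msleep() result */

	if (error == SIM_OK)
		return (SIM_RPC_SUCCESS);
	if (fixed && error == SIM_ERESTART)
		return (SIM_RPC_INTR);
	return (SIM_RPC_CANTRECV);
}

/*
 * Models clnt_reconnect_call(): RPC_CANTRECV triggers another reconnect
 * attempt. Returns the number of attempts made and stores the final
 * status in *out. The max_attempts cap is an artifact of this sketch;
 * per the report, the real loop keeps retrying.
 */
static int
sim_reconnect_call(int fixed, int max_attempts, enum sim_rpc_stat *out)
{
	enum sim_rpc_stat stat;
	int attempts = 0;

	do {
		attempts++;
		stat = sim_dg_call(fixed);
	} while (stat == SIM_RPC_CANTRECV && attempts < max_attempts);

	*out = stat;
	return (attempts);
}
```

        With fixed = 0 the simulated call keeps retrying until the cap
        is hit and still reports SIM_RPC_CANTRECV. With fixed = 1 it
        returns SIM_RPC_INTR after a single attempt, so the caller can
        unwind instead of reconnecting again.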

>Release-Note:
>Audit-Trail:
>Unformatted: