Re: git: b19740f4ce7a - main - swap_pager: lock vnode in swapdev_strategy()

Yoshihiro Ota Sun, 28 Nov 2021 17:46:28 -0800

Hi,

I wonder creating /etc/rc.d/nfsswap that depends on nfsclient can give a 
smoother shutdown like swaplate depends on mountlate.


Hiro

On Sun, 28 Nov 2021 04:00:56 +0200
Konstantin Belousov <kostik...@gmail.com> wrote:

> On Sun, Nov 28, 2021 at 12:22:46PM +1100, Peter Jeremy wrote:
> > On 2021-Nov-27 01:26:17 +0200, Konstantin Belousov <kostik...@gmail.com> 
> > wrote:
> > >commit 9c62295373f728459c19138f5aa03d9cb8422554
> > >Author: Konstantin Belousov <k...@freebsd.org>
> > >Date:   Sat Nov 27 01:22:27 2021 +0200
> > >
> > >    swapoff_one(): only check free pages count manually turning swap off
> > 
> > That didn't work but I don't think the underlying bug is related to
> > your recent work on swap_pager - digging back through my logs, I've
> > found another similar panic in August last year.
> It is definitely not caused by, it is just yet another issue.
> 
> > 
> > Nov 28 09:40:17 rock64 syslogd: exiting on signal 15
> > Waiting (max 60 seconds) for system process `vnlru' to stop... done
> > Waiting (max 60 seconds) for system process `syncer' to stop... 
> > Syncing disks, vnodes remaining... 0 0 done
> > Waiting (max 60 seconds) for system thread `bufdaemon' to stop... done
> > Waiting (max 60 seconds) for system thread `bufspacedaemon-0' to stop... 
> > done
> > All buffers synced.
> > No strategy for buffer at 0xffff0000bf8dc000
> > vnode 0xffffa00009024a80: type VBAD
> >     usecount 2, writecount 0, refcount 33263 seqc users 1
> >     hold count flags ()
> >     flags (VIRF_DOOMED|VV_VMSIZEVNLOCK)
> >     lock type nfs: SHARED (count 1)
> > swap_pager: I/O error - pagein failed; blkno 241400,size 4096, error 45
> > panic: VOP_STRATEGY failed bp=0xffff0000bf8dc000 vp=0
> > cpuid = 0
> > time = 1638052821
> > KDB: stack backtrace:
> > db_trace_self() at db_trace_self
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> > vpanic() at vpanic+0x178
> > panic() at panic+0x44
> > bufstrategy() at bufstrategy+0x80
> > swapdev_strategy() at swapdev_strategy+0xcc
> > swap_pager_getpages_locked() at swap_pager_getpages_locked+0x460
> > swapoff_one() at swapoff_one+0x3e4
> > swapoff_all() at swapoff_all+0x9c
> > bufshutdown() at bufshutdown+0x2ac
> > kern_reboot() at kern_reboot+0x240
> > sys_reboot() at sys_reboot+0x358
> > do_el0_sync() at do_el0_sync+0x4a4
> > handle_el0_sync() at handle_el0_sync+0x9c
> > --- exception, esr 0x56000000
> > KDB: enter: panic
> > [ thread pid 1 tid 100002 ]
> > Stopped at      kdb_enter+0x48: undefined       f900c11f
> > db> 
> > 
> > This is the same traceback as my previous mail.  Looking at the code
> > path, the test whether there's enough RAM to swap in all the data
> > passes in both cases: If swapoff_one() returned ENOMEM then
> > swapoff_all() would report a "Cannot remove swap device" error and
> > keep going (not bother to actually remove the swap device) - and
> > that's not happening.
> > 
> > I think the important message is "No strategy for buffer at 0x..."
> > which comes from vop_nostrategy() and causes bufstrategy() to panic:
> > 
> > swapdev_strategy()
> >  => bstrategy()
> >  => BO_STRATEGY()
> >  => bufstrategy()
> >  => VOP_STRATEGY()
> >  => VOP_STRATEGY_APV()
> >  => vop_nostrategy()
> >     => bufdone() => swp_pager_async_iodone()
> > 
> > Presumably, stopping the network means there's no longer any way for
> > swap operations to complete so the swap device has become associated
> > with default_vnodeops, (though I haven't dug into the actual code
> > path that does that).
> > 
> > Moving up a level, does it really matter if swapoff_one() is skipped?
> > If it actually returned an error (eg if the free memory test failed),
> > then that's what would happen.  By this point in the shutdown, there's
> > no userland left (which makes me wonder why there's anything left in
> > swap in any case) and only the final cleanups remain before the kernel
> > shuts down.  What's really needed is a way to detect that the relevant
> > swap I/O provider has gone away and return to swapoff_all() without
> > panicing.
> 
> I think the intent there is to ensure that the system is in as much steady
> state as possible.  I can easily imagine some HBA doing weird things if
> some io is in flight when the reset comes in.  So we want to ensure that
> no io occurs.
> 
> The cause for your panic is not the network interface down state (in fact,
> I think that interface state up), but as you correctly analyzed, the
> call to vop_nostrategy().  It is there in stack because underlying filesystem
> was unmounted, and the swap vnode was reclaimed, replacing NFS vop vector
> with deadfs vop vector, which inherits from the default vop.
> 
> The bit that I do not understand from your report, is why swapoff_all() did
> not occured earlier.  Your earlier report, where spurious ENOMEM came out
> from swapoff(2) syscall, and which I fixed by the previous patch, means 
> that there was an attempt to swapoff.
> 
> Still, the solution is, IMO, to swapoff before unmount. We do not need
> swapin for flushing buffers.
> 
> Please try this combined patch.
> 
> diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c
> index 4b746a269171..201afeec9311 100644
> --- a/sys/kern/vfs_bio.c
> +++ b/sys/kern/vfs_bio.c
> @@ -1452,16 +1452,21 @@ bufshutdown(int show_busybufs)
>                */
>               printf("Giving up on %d buffers\n", nbusy);
>               DELAY(5000000); /* 5 seconds */
> +             swapoff_all();
>       } else {
>               if (!first_buf_printf)
>                       printf("Final sync complete\n");
> +
>               /*
> -              * Unmount filesystems
> +              * Unmount filesystems.  Swapoff before unmount,
> +              * because swap on file is unoperational after unmount
> +              * of underlying filesystem.
>                */
> -             if (!KERNEL_PANICKED())
> +             if (!KERNEL_PANICKED()) {
> +                     swapoff_all();
>                       vfs_unmountall();
> +             }
>       }
> -     swapoff_all();
>       DELAY(100000);          /* wait for console output to finish */
>  }
>  
> diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
> index 4cfdb3fd2cc8..981a71b2c4b1 100644
> --- a/sys/vm/swap_pager.c
> +++ b/sys/vm/swap_pager.c
> @@ -469,7 +469,8 @@ static bool       swp_pager_swblk_empty(struct swblk *sb, 
> int start, int
> limit); static void   swp_pager_free_empty_swblk(vm_object_t, struct swblk 
> *sb);
>  static int   swapongeom(struct vnode *);
>  static int   swaponvp(struct thread *, struct vnode *, u_long);
> -static int   swapoff_one(struct swdevt *sp, struct ucred *cred);
> +static int   swapoff_one(struct swdevt *sp, struct ucred *cred,
> +                 bool swapoff_syscall);
>  
>  /*
>   * Swap bitmap functions
> @@ -2523,14 +2524,14 @@ sys_swapoff(struct thread *td, struct swapoff_args 
> *uap)
>               error = EINVAL;
>               goto done;
>       }
> -     error = swapoff_one(sp, td->td_ucred);
> +     error = swapoff_one(sp, td->td_ucred, true);
>  done:
>       sx_xunlock(&swdev_syscall_lock);
>       return (error);
>  }
>  
>  static int
> -swapoff_one(struct swdevt *sp, struct ucred *cred)
> +swapoff_one(struct swdevt *sp, struct ucred *cred, bool swapoff_syscall)
>  {
>       u_long nblks;
>  #ifdef MAC
> @@ -2552,8 +2553,16 @@ swapoff_one(struct swdevt *sp, struct ucred *cred)
>        * available virtual memory in the system will fit the amount
>        * of data we will have to page back in, plus an epsilon so
>        * the system doesn't become critically low on swap space.
> +      * The vm_free_count() part does not account e.g. for clean
> +      * pages that can be immediately reclaimed without paging, so
> +      * this is very rough estimation.
> +      *
> +      * On the other hand, not turning swap off on swapoff_all()
> +      * means that we loose swap data when filesystems go away,
> +      * which is arguably worse.
>        */
> -     if (vm_free_count() + swap_pager_avail < nblks + nswap_lowat)
> +     if (swapoff_syscall &&
> +         vm_free_count() + swap_pager_avail < nblks + nswap_lowat)
>               return (ENOMEM);
>  
>       /*
> @@ -2603,7 +2612,7 @@ swapoff_all(void)
>                       devname = devtoname(sp->sw_vp->v_rdev);
>               else
>                       devname = "[file]";
> -             error = swapoff_one(sp, thread0.td_ucred);
> +             error = swapoff_one(sp, thread0.td_ucred, false);
>               if (error != 0) {
>                       printf("Cannot remove swap device %s (error=%d), "
>                           "skipping.\n", devname, error);
>

Re: git: b19740f4ce7a - main - swap_pager: lock vnode in swapdev_strategy()

Reply via email to