Andriy,

On Sun, Feb 18, 2018 at 12:54:21AM +0200, Andriy Gapon wrote:
A> > Today's rebuild has given me uptimes of below an hour, usually.  The box 
will stay up in single user mode long enough to rebuild world/kernel, but 
multi-user it is panicking at 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
A> > 
A> > The backtrace shows that it gets to this panic from a sendfile() syscall.  
The line above is in the middle of a big edit that's part of svn revision 
329363.  The tripping assertion seems to suggest that m->valid != 0, for 
whatever that's worth.
A> 
A> I am doing a bit of an offline investigation with Andrew and it seems that 
the
A> actual panic message is this:
A> 
A> panic: vm_page_assert_xbusied: page 0xfffff807ebbd8f98 not exclusive busy @
A> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:1592
A> 
A> The stack is this:
A> vpanic() at vpanic/frame 0xfffffe00b3c36390
A> dmu_read_pages() at dmu_read_pages+0x535/frame 0xfffffe00b3c36460
A> zfs_freebsd_getpages() at zfs_freebsd_getpages+0x24c/frame 0xfffffe00b3c36510
A> VOP_GETPAGES_APV() at VOP_GETPAGES_APV+0xd9/frame 0xfffffe00b3c36540
A> vop_stdgetpages_async() at vop_stdgetpages_async+0x49/frame 
0xfffffe00b3c36590
A> VOP_GETPAGES_ASYNC_APV() at VOP_GETPAGES_ASYNC_APV+0xd9/frame 
0xfffffe00b3c365c0
A> vnode_pager_getpages_async() at vnode_pager_getpages_async+0x81/frame
A> 0xfffffe00b3c36650
A> vn_sendfile() at vn_sendfile+0xe70/frame 0xfffffe00b3c368e0
A> sendfile() at sendfile+0x149/frame 0xfffffe00b3c36980
A> amd64_syscall() at amd64_syscall+0x79b/frame 0xfffffe00b3c36ab0
A> fast_syscall_common() at fast_syscall_common+0x101/frame 0x7fffffffdb00
A> 
A> I looked at sendfile_swapin() code and it seems that it uses the pager API 
in an
A> undocumented way.  Specifically, it inserts bogus_page into the array of
A> requested pages.  For starters, bogus_page is not busied and VOP_GETPAGES is
A> documented to have all requested pages exclusively busied.  Second, I always 
had
A> an impression that bogus_page is an implementation detail of the unified 
buffer
A> / page cache and that other code need not be aware of it.
A> 
A> So, my opinion is that the sendfile code uses a "clever hack" that happens to
A> work with the buffer cache based filesystems, but that that hack is a bug.
A> So, I'd prefer that the problem is fixed in that code.
A> But I am open to being convinced that all VOP_GETPAGES implementations,
A> including that in ZFS, must be made aware of bogus_page.  Or, at least, that
A> they should not verify that the requested pages are busied.

This is optimization that improves throughput when file memory cache is
fragmented. Why don't you like adding the code to zfs_freebsd_getpages()?

-- 
Gleb Smirnoff
_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Reply via email to