> On Jul 8, 2016, at 08:55, Trond Myklebust <tron...@primarydata.com> wrote:
> 
> 
>> On Jul 8, 2016, at 08:48, Seth Forshee <seth.fors...@canonical.com> wrote:
>> 
>> On Fri, Jul 08, 2016 at 09:53:30AM +1000, Dave Chinner wrote:
>>> On Wed, Jul 06, 2016 at 06:07:18PM -0400, Jeff Layton wrote:
>>>> On Wed, 2016-07-06 at 12:46 -0500, Seth Forshee wrote:
>>>>> We're seeing a hang when freezing a container with an nfs bind mount while
>>>>> running iozone. Two iozone processes were hung with this stack trace.
>>>>> 
>>>>> [] schedule+0x35/0x80
>>>>> [] schedule_preempt_disabled+0xe/0x10
>>>>> [] __mutex_lock_slowpath+0xb9/0x130
>>>>> [] mutex_lock+0x1f/0x30
>>>>> [] do_unlinkat+0x12b/0x2d0
>>>>> [] SyS_unlink+0x16/0x20
>>>>> [] entry_SYSCALL_64_fastpath+0x16/0x71
>>>>> 
>>>>> This seems to be due to another iozone thread frozen during unlink with
>>>>> this stack trace:
>>>>> 
>>>>> [] __refrigerator+0x7a/0x140
>>>>> [] nfs4_handle_exception+0x118/0x130 [nfsv4]
>>>>> [] nfs4_proc_remove+0x7d/0xf0 [nfsv4]
>>>>> [] nfs_unlink+0x149/0x350 [nfs]
>>>>> [] vfs_unlink+0xf1/0x1a0
>>>>> [] do_unlinkat+0x279/0x2d0
>>>>> [] SyS_unlink+0x16/0x20
>>>>> [] entry_SYSCALL_64_fastpath+0x16/0x71
>>>>> 
>>>>> Since nfs is allowing the thread to be frozen with the inode locked it's
>>>>> preventing other threads trying to lock the same inode from freezing. It
>>>>> seems like a bad idea for nfs to be doing this.
>>>>> 
>>>> 
>>>> Yeah, known problem. Not a simple one to fix though.
>>> 
>>> Actually, it is simple to fix.
>>> 
>>> <insert broken record about suspend should be using freeze_super(),
>>> not sys_sync(), to suspend filesystem operations>
>>> 
>>> i.e. the VFS blocks new operations from starting, and then the
>>> NFS client simply needs to implement ->freeze_fs to drain all its
>>> active operations before returning. Problem solved.
>> 
>> No, this won't solve my problem. We're not doing a full suspend, rather
>> using a freezer cgroup to freeze a subset of processes. We don't want to
>> fully freeze the filesystem.
> 
> …and therein lies the rub. The whole cgroup freezer stuff assumes that you
> can safely deactivate a bunch of processes that may or may not hold state in
> the filesystem. That’s definitely not OK when you hold locks etc. that can
> affect processes that lie outside the cgroup (and/or outside the NFS client
> itself).
> 

In case it wasn’t clear, I’m not just talking about VFS mutexes here. I’m also 
talking about all the other stuff, a lot of which the kernel has no control 
over, including POSIX file locking, share locks, leases/delegations, etc.
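
As a rough illustration of the POSIX locking case only, here is a minimal userspace sketch. It is not the cgroup freezer and does not touch NFS: it assumes SIGSTOP is an acceptable stand-in for "frozen", uses a local temporary file, and the function name lock_blocked_by_stopped_holder is made up for this example. A child takes an exclusive POSIX lock and is then stopped; the parent, standing in for a process outside the frozen group, cannot get the lock.

```python
import fcntl
import os
import signal
import tempfile


def lock_blocked_by_stopped_holder() -> bool:
    """Return True if a stopped child's POSIX lock still blocks the parent.

    Hypothetical helper: SIGSTOP stands in for the freezer, and the file is a
    local temporary file rather than one on an NFS mount.
    """
    with tempfile.NamedTemporaryFile() as tmp:
        ready_r, ready_w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: take an exclusive POSIX lock, report it, then sleep.
            os.close(ready_r)
            fd = os.open(tmp.name, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX)
            os.write(ready_w, b"1")
            signal.pause()
            os._exit(0)

        # Parent: wait until the child holds the lock, then "freeze" it.
        os.close(ready_w)
        os.read(ready_r, 1)
        os.close(ready_r)
        os.kill(pid, signal.SIGSTOP)
        os.waitpid(pid, os.WUNTRACED)  # returns once the child is stopped

        try:
            fd = os.open(tmp.name, os.O_RDWR)
            try:
                # Non-blocking attempt, so the demo itself cannot hang.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                blocked = False
            except OSError:
                blocked = True
            finally:
                os.close(fd)
        finally:
            # Clean up the stopped child; SIGKILL works on stopped processes.
            os.kill(pid, signal.SIGKILL)
            os.waitpid(pid, 0)
        return blocked


if __name__ == "__main__":
    print("lock still held by stopped process:", lock_blocked_by_stopped_holder())
```

With a blocking lock request instead of LOCK_NB, the parent would simply wait until the stopped holder is resumed or killed.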

Trond
