On Jun 8, 2016, at 1:22 PM, Jeff Layton wrote:

> On Wed, 2016-06-08 at 12:10 -0400, Oleg Drokin wrote:
>> On Jun 8, 2016, at 6:58 AM, Jeff Layton wrote:
>> 
>>> A simple way to confirm that might be to convert all of the read locks
>>> on the st_rwsem to write locks. That will serialize all of the open
>>> operations and should prevent that particular race from occurring.
>>> 
>>> If that works, we'd probably want to fix it in a less heavy-handed way,
>>> but I'd have to think about how best to do that.
>> 
>> So I looked at the call sites for nfs4_get_vfs_file(). How about something
>> like this:
>> 
>> After we grab fp->fi_lock, we can do
>> test_access(open->op_share_access, stp).
>> 
>> If that returns true, just drop the spinlock and return EAGAIN.
>> 
>> The call site in nfs4_upgrade_open() would handle that by retesting the
>> access map and either coming back in or, more likely, reusing the now-updated
>> stateid (synchronised by the fi_lock again).
>> We probably need to convert the whole access map testing there to be under
>> fi_lock.
>> Something like:
>> nfs4_upgrade_open(struct svc_rqst *rqstp, struct nfs4_file *fp,
>>                   struct svc_fh *cur_fh, struct nfs4_ol_stateid *stp,
>>                   struct nfsd4_open *open)
>> {
>>         __be32 status;
>>         unsigned char old_deny_bmap = stp->st_deny_bmap;
>> 
>> again:
>> +        spin_lock(&fp->fi_lock);
>>         if (!test_access(open->op_share_access, stp)) {
>> +                spin_unlock(&fp->fi_lock);
>> +                status = nfs4_get_vfs_file(rqstp, fp, cur_fh, stp, open);
>> +                if (status == -EAGAIN)
>> +                        goto again;
>> +                return status;
>> +        }
>> 
>>         /* test and set deny mode */
>> -        spin_lock(&fp->fi_lock);
>>         status = nfs4_file_check_deny(fp, open->op_share_deny);
>> 
>> 
>> The call in nfsd4_process_open2() cannot hit this condition, I think, right?
>> We could probably add a WARN_ON there? BUG_ON? Or is there a more sensible
>> approach?
>> 
>> Alternatively, we could probably always call nfs4_get_vfs_file() under this
>> spinlock and just have it drop the lock for the open and then reobtain it
>> (already done). Not as transparent, I guess.
>> 
> 
> Yeah, I think that might be best. It looks like things could change
> after you drop the spinlock with the patch above. Since we have to
> retake it anyway in nfs4_get_vfs_file, we can just do it there.
> 
>> Or the fi_lock might be converted to, say, a mutex, so we can sleep with it
>> held; then we can hold it across the whole invocation of
>> nfs4_get_vfs_file(), the access testing and so on.
> 
> I think we'd be better off taking the st_rwsem for write (maybe just
> turning it into a mutex). That would at least be per-stateid instead of
> per-inode. That's a fine fix for now.
> 
> It might slow down a client slightly that is sending two stateid
> morphing operations in parallel, but they shouldn't affect each other.
> I'm liking that solution more and more here.
> Longer term, I think we need to further simplify OPEN handling. It has
> gotten better, but it's still really hard to follow currently (and is
> obviously error-prone).

The conversion to always taking the rwsem for write holds up nicely so far
(and no other WARNs have triggered yet).

I guess I'll do a patch converting to a mutex, but also, separately, a patch
that just holds fi_lock more. I'll test that other one and, if all is well,
submit it too, and let you choose which one you like the most ;)
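
To make the "hold fi_lock more" variant concrete, below is a rough userspace
sketch of the EAGAIN retry idea from earlier in the thread. It is not the
patch itself: fake_file, fake_get_vfs_file, fake_upgrade_open, race_hook and
demo_race are hypothetical names, a pthread mutex stands in for fi_lock, and
the hook simulates another opener winning the race while the lock is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the per-file state; not the kernel struct. */
struct fake_file {
	pthread_mutex_t fi_lock;  /* plays the role of fp->fi_lock */
	unsigned access_bmap;
	int vfs_opens;
};

/* Optional hook run while the lock is dropped, to simulate a racing open. */
static void (*race_hook)(struct fake_file *);

static int fake_get_vfs_file(struct fake_file *fp, unsigned access)
{
	if (race_hook)
		race_hook(fp);          /* window in which the lock is not held */

	pthread_mutex_lock(&fp->fi_lock);
	if (fp->access_bmap & access) {
		/* Someone else already set the access bit: ask caller to retry. */
		pthread_mutex_unlock(&fp->fi_lock);
		return -EAGAIN;
	}
	fp->vfs_opens++;
	fp->access_bmap |= access;
	pthread_mutex_unlock(&fp->fi_lock);
	return 0;
}

int fake_upgrade_open(struct fake_file *fp, unsigned access)
{
	int status;

again:
	pthread_mutex_lock(&fp->fi_lock);
	if (!(fp->access_bmap & access)) {
		pthread_mutex_unlock(&fp->fi_lock);
		status = fake_get_vfs_file(fp, access);
		if (status == -EAGAIN)
			goto again;         /* retest the access map under the lock */
		return status;
	}
	pthread_mutex_unlock(&fp->fi_lock);
	return 0;                   /* access already granted; reuse it */
}

/* Simulated competitor: fires once, sets the bit under the lock. */
static void racing_open(struct fake_file *fp)
{
	race_hook = NULL;
	pthread_mutex_lock(&fp->fi_lock);
	fp->access_bmap |= 1u;
	fp->vfs_opens++;
	pthread_mutex_unlock(&fp->fi_lock);
}

/* Returns the number of opens performed after one simulated race. */
int demo_race(void)
{
	struct fake_file f = { .access_bmap = 0, .vfs_opens = 0 };
	int status;

	pthread_mutex_init(&f.fi_lock, NULL);
	race_hook = racing_open;
	status = fake_upgrade_open(&f, 1u);
	assert(status == 0);
	(void)status;
	pthread_mutex_destroy(&f.fi_lock);
	return f.vfs_opens;
}
```

In this sketch the loser of the race gets -EAGAIN, retests the map under the
lock and reuses the access that is already there, so only one open happens.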

