On Jan 16, 2014, at 10:49, Peng Tao <bergw...@gmail.com> wrote:
> On Tue, Jan 14, 2014 at 2:45 AM, Trond Myklebust
> <trond.mykleb...@primarydata.com> wrote:
>> void pnfs_set_lo_fail(struct pnfs_layout_segment *lseg)
>> @@ -1881,43 +1887,37 @@ pnfs_layoutcommit_inode(struct inode *inode, bool 
>> sync)
>>        struct nfs4_layoutcommit_data *data;
>>        struct nfs_inode *nfsi = NFS_I(inode);
>>        loff_t end_pos;
>> -       int status = 0;
>> +       int status;
>> 
>> -       dprintk("--> %s inode %lu\n", __func__, inode->i_ino);
>> -
>> -       if (!test_bit(NFS_INO_LAYOUTCOMMIT, &nfsi->flags))
>> +       if (!pnfs_layoutcommit_outstanding(inode))
> This might be a problem. If nfsi->flags has !NFS_INO_LAYOUTCOMMIT and
> NFS_INO_LAYOUTCOMMITTING, client cannot issue a new layoutcommit after
> the inflight one finishes. It might not be an issue for file layout as
> long as we only use layoutcommit to update time, but it can cause data
> corruption for block layout.

I don’t understand.

With the new patch, if _either_ NFS_INO_LAYOUTCOMMIT or 
NFS_INO_LAYOUTCOMMITTING are set, then the client will wait until 
NFS_INO_LAYOUTCOMMITTING can be locked, it will test for NFS_INO_LAYOUTCOMMIT, 
and then either issue a new layout commit or exit. How can that cause new 
breakage for blocks?

The only issues that I’m aware of with the blocks layout and LAYOUTCOMMIT today 
are:
1. encode_pnfs_block_layoutupdate() runs out of XDR buffer space after 4-5 
iterations in the list_for_each_entry_safe() loop. That is because nobody has 
yet added support for preallocating a page buffer to store the (potentially 
very large) array of extents. BTW: that array looks like a perfect candidate 
for xdr_encode_array2() if we could teach the latter about xdr_stream...
2. the blocks layout also needs to be able handle the case where the list of 
extents is so large that a single LAYOUTCOMMIT is not sufficient. There is no 
reason why it should not be able to send multiple LAYOUTCOMMIT rpc calls when 
the size exceeds the session forward channel's negotiated max_rqst_sz.

--
Trond Myklebust
Linux NFS client maintainer

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to