Hi kernel team,

Brian Kroth wrote:

> 3.2.28
[...]
> It does reproduce the bug.
>
> Jonathan Nieder <jrnie...@gmail.com> 2012-08-20 17:33:

>> 4. try the patches:
>>
>>      cd linux
>>      git am -3sc $(ls -1 /path/to/patches/0*)
[...]
> It does *not* reproduce the bug.
>
> Looks to have worked.

Please consider the attached patch for the sid branch of the packaging
repo.  It applies the five aforementioned patches from upstream:

  6a8a13e03861 fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash
  d1f5273e9adb ext4: return 32/64-bit dir name hash according to usage type
  999448a8c020 nfsd: rename 'int access' to 'int may_flags' in nfsd_open
  06effdbb49af nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
  d7dab39b6e16 ext3: return 32/64-bit dir name hash according to usage
               type

which make NFSv3/4 use 64-bit hashes as readdir cookies instead of
crippling itself for the sake of NFSv2 which only supports 32-bit
cookies.  The most interesting of these (patches #2 and #5) are
unfortunately a bit too big for the letter of the upstream stable
rules, but the patches are straightforward, make sense, and are well
tested.

Thoughts welcome, as usual.

Sincerely,
Jonathan
Index: 
debian/patches/bugfix/all/nfsd-vfs_llseek-with-32-or-64-bit-offsets-hashes.patch
===================================================================
--- 
debian/patches/bugfix/all/nfsd-vfs_llseek-with-32-or-64-bit-offsets-hashes.patch
    (revision 0)
+++ 
debian/patches/bugfix/all/nfsd-vfs_llseek-with-32-or-64-bit-offsets-hashes.patch
    (revision 0)
@@ -0,0 +1,77 @@
+From: Bernd Schubert <bernd.schub...@itwm.fraunhofer.de>
+Date: Sun, 18 Mar 2012 22:44:50 -0400
+Subject: nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
+
+commit 06effdbb49af5f6c7d20affaec74603914acc768 upstream.
+
+Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
+the NFS version. NFSv2 gets 32-bit hashes only.
+
+NOTE: This patch got rather complex as Christoph asked to set the
+filp->f_mode flag in the open call or immediatly after dentry_open()
+in nfsd_open() to avoid races.
+Personally I still do not see a reason for that and in my opinion
+FMODE_32BITHASH/FMODE_64BITHASH flags could be set nfsd_readdir(), as it
+follows directly after nfsd_open() without a chance of races.
+
+Signed-off-by: Bernd Schubert <bernd.schub...@itwm.fraunhofer.de>
+Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>
+Acked-by: J. Bruce Fields <bfie...@redhat.com>
+Signed-off-by: Jonathan Nieder <jrnie...@gmail.com>
+---
+ fs/nfsd/vfs.c |   15 +++++++++++++--
+ fs/nfsd/vfs.h |    2 ++
+ 2 files changed, 15 insertions(+), 2 deletions(-)
+
+diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
+index b395c61..959039e 100644
+--- a/fs/nfsd/vfs.c
++++ b/fs/nfsd/vfs.c
+@@ -785,9 +785,15 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int 
type,
+                           flags, current_cred());
+       if (IS_ERR(*filp))
+               host_err = PTR_ERR(*filp);
+-      else
++      else {
+               host_err = ima_file_check(*filp, may_flags);
+ 
++              if (may_flags & NFSD_MAY_64BIT_COOKIE)
++                      (*filp)->f_mode |= FMODE_64BITHASH;
++              else
++                      (*filp)->f_mode |= FMODE_32BITHASH;
++      }
++
+ out_nfserr:
+       err = nfserrno(host_err);
+ out:
+@@ -2011,8 +2017,13 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh 
*fhp, loff_t *offsetp,
+       __be32          err;
+       struct file     *file;
+       loff_t          offset = *offsetp;
++      int             may_flags = NFSD_MAY_READ;
+ 
+-      err = nfsd_open(rqstp, fhp, S_IFDIR, NFSD_MAY_READ, &file);
++      /* NFSv2 only supports 32 bit cookies */
++      if (rqstp->rq_vers > 2)
++              may_flags |= NFSD_MAY_64BIT_COOKIE;
++
++      err = nfsd_open(rqstp, fhp, S_IFDIR, may_flags, &file);
+       if (err)
+               goto out;
+ 
+diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
+index 3f54ad0..85d4d42 100644
+--- a/fs/nfsd/vfs.h
++++ b/fs/nfsd/vfs.h
+@@ -27,6 +27,8 @@
+ #define NFSD_MAY_BYPASS_GSS           0x400
+ #define NFSD_MAY_READ_IF_EXEC         0x800
+ 
++#define NFSD_MAY_64BIT_COOKIE         0x1000 /* 64 bit readdir cookies for >= 
NFSv3 */
++
+ #define NFSD_MAY_CREATE               (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
+ #define NFSD_MAY_REMOVE               
(NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
+ 
+-- 
+1.7.10.4
+
Index: 
debian/patches/bugfix/all/ext3-return-32-64-bit-dir-name-hash-according-to-usa.patch
===================================================================
--- 
debian/patches/bugfix/all/ext3-return-32-64-bit-dir-name-hash-according-to-usa.patch
        (revision 0)
+++ 
debian/patches/bugfix/all/ext3-return-32-64-bit-dir-name-hash-according-to-usa.patch
        (revision 0)
@@ -0,0 +1,348 @@
+From: Eric Sandeen <sand...@redhat.com>
+Date: Thu, 26 Apr 2012 13:10:39 -0500
+Subject: ext3: return 32/64-bit dir name hash according to usage type
+
+commit d7dab39b6e16d5eea78ed3c705d2a2d0772b4f06 upstream.
+
+This is based on commit d1f5273e9adb40724a85272f248f210dc4ce919a
+ext4: return 32/64-bit dir name hash according to usage type
+by Fan Yong <yong....@whamcloud.com>
+
+Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
+to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
+and telldir().  However, this causes problems if there are 32-bit hash
+collisions, since the NFSv2 server can get stuck resending the same
+entries from the directory repeatedly.
+
+Allow ext3 to return a full 64-bit hash (both major and minor) for
+telldir to decrease the chance of hash collisions.
+
+This patch does implement a new ext3_dir_llseek op, because with 64-bit
+hashes, nfs will attempt to seek to a hash "offset" which is much
+larger than ext3's s_maxbytes.  So for dx dirs, we call
+generic_file_llseek_size() with the appropriate max hash value as the
+maximum seekable size.  Otherwise we just pass through to
+generic_file_llseek().
+
+Patch-updated-by: Bernd Schubert <bernd.schub...@itwm.fraunhofer.de>
+Patch-updated-by: Eric Sandeen <sand...@redhat.com>
+(blame us if something is not correct)
+
+Signed-off-by: Eric Sandeen <sand...@redhat.com>
+Signed-off-by: Jan Kara <j...@suse.cz>
+Signed-off-by: Jonathan Nieder <jrnie...@gmail.com>
+---
+ fs/ext3/dir.c           |  167 ++++++++++++++++++++++++++++++++++-------------
+ fs/ext3/hash.c          |    4 +-
+ include/linux/ext3_fs.h |    6 +-
+ 3 files changed, 129 insertions(+), 48 deletions(-)
+
+diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
+index 34f0a07..3268697 100644
+--- a/fs/ext3/dir.c
++++ b/fs/ext3/dir.c
+@@ -25,6 +25,7 @@
+ #include <linux/jbd.h>
+ #include <linux/ext3_fs.h>
+ #include <linux/buffer_head.h>
++#include <linux/compat.h>
+ #include <linux/slab.h>
+ #include <linux/rbtree.h>
+ 
+@@ -32,24 +33,8 @@ static unsigned char ext3_filetype_table[] = {
+       DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
+ };
+ 
+-static int ext3_readdir(struct file *, void *, filldir_t);
+ static int ext3_dx_readdir(struct file * filp,
+                          void * dirent, filldir_t filldir);
+-static int ext3_release_dir (struct inode * inode,
+-                              struct file * filp);
+-
+-const struct file_operations ext3_dir_operations = {
+-      .llseek         = generic_file_llseek,
+-      .read           = generic_read_dir,
+-      .readdir        = ext3_readdir,         /* we take BKL. needed?*/
+-      .unlocked_ioctl = ext3_ioctl,
+-#ifdef CONFIG_COMPAT
+-      .compat_ioctl   = ext3_compat_ioctl,
+-#endif
+-      .fsync          = ext3_sync_file,       /* BKL held */
+-      .release        = ext3_release_dir,
+-};
+-
+ 
+ static unsigned char get_dtype(struct super_block *sb, int filetype)
+ {
+@@ -60,6 +45,25 @@ static unsigned char get_dtype(struct super_block *sb, int 
filetype)
+       return (ext3_filetype_table[filetype]);
+ }
+ 
++/**
++ * Check if the given dir-inode refers to an htree-indexed directory
++ * (or a directory which chould potentially get coverted to use htree
++ * indexing).
++ *
++ * Return 1 if it is a dx dir, 0 if not
++ */
++static int is_dx_dir(struct inode *inode)
++{
++      struct super_block *sb = inode->i_sb;
++
++      if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
++                   EXT3_FEATURE_COMPAT_DIR_INDEX) &&
++          ((EXT3_I(inode)->i_flags & EXT3_INDEX_FL) ||
++           ((inode->i_size >> sb->s_blocksize_bits) == 1)))
++              return 1;
++
++      return 0;
++}
+ 
+ int ext3_check_dir_entry (const char * function, struct inode * dir,
+                         struct ext3_dir_entry_2 * de,
+@@ -99,18 +103,13 @@ static int ext3_readdir(struct file * filp,
+       unsigned long offset;
+       int i, stored;
+       struct ext3_dir_entry_2 *de;
+-      struct super_block *sb;
+       int err;
+       struct inode *inode = filp->f_path.dentry->d_inode;
++      struct super_block *sb = inode->i_sb;
+       int ret = 0;
+       int dir_has_error = 0;
+ 
+-      sb = inode->i_sb;
+-
+-      if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
+-                                  EXT3_FEATURE_COMPAT_DIR_INDEX) &&
+-          ((EXT3_I(inode)->i_flags & EXT3_INDEX_FL) ||
+-           ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
++      if (is_dx_dir(inode)) {
+               err = ext3_dx_readdir(filp, dirent, filldir);
+               if (err != ERR_BAD_DX_DIR) {
+                       ret = err;
+@@ -232,22 +231,87 @@ out:
+       return ret;
+ }
+ 
++static inline int is_32bit_api(void)
++{
++#ifdef CONFIG_COMPAT
++      return is_compat_task();
++#else
++      return (BITS_PER_LONG == 32);
++#endif
++}
++
+ /*
+  * These functions convert from the major/minor hash to an f_pos
+- * value.
++ * value for dx directories
+  *
+- * Currently we only use major hash numer.  This is unfortunate, but
+- * on 32-bit machines, the same VFS interface is used for lseek and
+- * llseek, so if we use the 64 bit offset, then the 32-bit versions of
+- * lseek/telldir/seekdir will blow out spectacularly, and from within
+- * the ext2 low-level routine, we don't know if we're being called by
+- * a 64-bit version of the system call or the 32-bit version of the
+- * system call.  Worse yet, NFSv2 only allows for a 32-bit readdir
+- * cookie.  Sigh.
++ * Upper layer (for example NFS) should specify FMODE_32BITHASH or
++ * FMODE_64BITHASH explicitly. On the other hand, we allow ext3 to be mounted
++ * directly on both 32-bit and 64-bit nodes, under such case, neither
++ * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
+  */
+-#define hash2pos(major, minor)        (major >> 1)
+-#define pos2maj_hash(pos)     ((pos << 1) & 0xffffffff)
+-#define pos2min_hash(pos)     (0)
++static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
++{
++      if ((filp->f_mode & FMODE_32BITHASH) ||
++          (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++              return major >> 1;
++      else
++              return ((__u64)(major >> 1) << 32) | (__u64)minor;
++}
++
++static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
++{
++      if ((filp->f_mode & FMODE_32BITHASH) ||
++          (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++              return (pos << 1) & 0xffffffff;
++      else
++              return ((pos >> 32) << 1) & 0xffffffff;
++}
++
++static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
++{
++      if ((filp->f_mode & FMODE_32BITHASH) ||
++          (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++              return 0;
++      else
++              return pos & 0xffffffff;
++}
++
++/*
++ * Return 32- or 64-bit end-of-file for dx directories
++ */
++static inline loff_t ext3_get_htree_eof(struct file *filp)
++{
++      if ((filp->f_mode & FMODE_32BITHASH) ||
++          (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++              return EXT3_HTREE_EOF_32BIT;
++      else
++              return EXT3_HTREE_EOF_64BIT;
++}
++
++
++/*
++ * ext3_dir_llseek() calls generic_file_llseek[_size]() to handle both
++ * non-htree and htree directories, where the "offset" is in terms
++ * of the filename hash value instead of the byte offset.
++ *
++ * Because we may return a 64-bit hash that is well beyond s_maxbytes,
++ * we need to pass the max hash as the maximum allowable offset in
++ * the htree directory case.
++ *
++ * NOTE: offsets obtained *before* ext3_set_inode_flag(dir, EXT3_INODE_INDEX)
++ *       will be invalid once the directory was converted into a dx directory
++ */
++loff_t ext3_dir_llseek(struct file *file, loff_t offset, int origin)
++{
++      struct inode *inode = file->f_mapping->host;
++      int dx_dir = is_dx_dir(inode);
++
++      if (likely(dx_dir))
++              return generic_file_llseek_size(file, offset, origin,
++                                              ext3_get_htree_eof(file));
++      else
++              return generic_file_llseek(file, offset, origin);
++}
+ 
+ /*
+  * This structure holds the nodes of the red-black tree used to store
+@@ -308,15 +372,16 @@ static void free_rb_tree_fname(struct rb_root *root)
+ }
+ 
+ 
+-static struct dir_private_info *ext3_htree_create_dir_info(loff_t pos)
++static struct dir_private_info *ext3_htree_create_dir_info(struct file *filp,
++                                                         loff_t pos)
+ {
+       struct dir_private_info *p;
+ 
+       p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
+       if (!p)
+               return NULL;
+-      p->curr_hash = pos2maj_hash(pos);
+-      p->curr_minor_hash = pos2min_hash(pos);
++      p->curr_hash = pos2maj_hash(filp, pos);
++      p->curr_minor_hash = pos2min_hash(filp, pos);
+       return p;
+ }
+ 
+@@ -406,7 +471,7 @@ static int call_filldir(struct file * filp, void * dirent,
+               printk("call_filldir: called with null fname?!?\n");
+               return 0;
+       }
+-      curr_pos = hash2pos(fname->hash, fname->minor_hash);
++      curr_pos = hash2pos(filp, fname->hash, fname->minor_hash);
+       while (fname) {
+               error = filldir(dirent, fname->name,
+                               fname->name_len, curr_pos,
+@@ -431,13 +496,13 @@ static int ext3_dx_readdir(struct file * filp,
+       int     ret;
+ 
+       if (!info) {
+-              info = ext3_htree_create_dir_info(filp->f_pos);
++              info = ext3_htree_create_dir_info(filp, filp->f_pos);
+               if (!info)
+                       return -ENOMEM;
+               filp->private_data = info;
+       }
+ 
+-      if (filp->f_pos == EXT3_HTREE_EOF)
++      if (filp->f_pos == ext3_get_htree_eof(filp))
+               return 0;       /* EOF */
+ 
+       /* Some one has messed with f_pos; reset the world */
+@@ -445,8 +510,8 @@ static int ext3_dx_readdir(struct file * filp,
+               free_rb_tree_fname(&info->root);
+               info->curr_node = NULL;
+               info->extra_fname = NULL;
+-              info->curr_hash = pos2maj_hash(filp->f_pos);
+-              info->curr_minor_hash = pos2min_hash(filp->f_pos);
++              info->curr_hash = pos2maj_hash(filp, filp->f_pos);
++              info->curr_minor_hash = pos2min_hash(filp, filp->f_pos);
+       }
+ 
+       /*
+@@ -478,7 +543,7 @@ static int ext3_dx_readdir(struct file * filp,
+                       if (ret < 0)
+                               return ret;
+                       if (ret == 0) {
+-                              filp->f_pos = EXT3_HTREE_EOF;
++                              filp->f_pos = ext3_get_htree_eof(filp);
+                               break;
+                       }
+                       info->curr_node = rb_first(&info->root);
+@@ -498,7 +563,7 @@ static int ext3_dx_readdir(struct file * filp,
+                       info->curr_minor_hash = fname->minor_hash;
+               } else {
+                       if (info->next_hash == ~0) {
+-                              filp->f_pos = EXT3_HTREE_EOF;
++                              filp->f_pos = ext3_get_htree_eof(filp);
+                               break;
+                       }
+                       info->curr_hash = info->next_hash;
+@@ -517,3 +582,15 @@ static int ext3_release_dir (struct inode * inode, struct 
file * filp)
+ 
+       return 0;
+ }
++
++const struct file_operations ext3_dir_operations = {
++      .llseek         = ext3_dir_llseek,
++      .read           = generic_read_dir,
++      .readdir        = ext3_readdir,
++      .unlocked_ioctl = ext3_ioctl,
++#ifdef CONFIG_COMPAT
++      .compat_ioctl   = ext3_compat_ioctl,
++#endif
++      .fsync          = ext3_sync_file,
++      .release        = ext3_release_dir,
++};
+diff --git a/fs/ext3/hash.c b/fs/ext3/hash.c
+index 7d215b4..d4d3ade 100644
+--- a/fs/ext3/hash.c
++++ b/fs/ext3/hash.c
+@@ -200,8 +200,8 @@ int ext3fs_dirhash(const char *name, int len, struct 
dx_hash_info *hinfo)
+               return -1;
+       }
+       hash = hash & ~1;
+-      if (hash == (EXT3_HTREE_EOF << 1))
+-              hash = (EXT3_HTREE_EOF-1) << 1;
++      if (hash == (EXT3_HTREE_EOF_32BIT << 1))
++              hash = (EXT3_HTREE_EOF_32BIT - 1) << 1;
+       hinfo->hash = hash;
+       hinfo->minor_hash = minor_hash;
+       return 0;
+diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
+index dec9911..d59ab12 100644
+--- a/include/linux/ext3_fs.h
++++ b/include/linux/ext3_fs.h
+@@ -781,7 +781,11 @@ struct dx_hash_info
+       u32             *seed;
+ };
+ 
+-#define EXT3_HTREE_EOF        0x7fffffff
++
++/* 32 and 64 bit signed EOF for dx directories */
++#define EXT3_HTREE_EOF_32BIT   ((1UL  << (32 - 1)) - 1)
++#define EXT3_HTREE_EOF_64BIT   ((1ULL << (64 - 1)) - 1)
++
+ 
+ /*
+  * Control parameters used by ext3_htree_next_block
+-- 
+1.7.10.4
+
Index: 
debian/patches/bugfix/all/ext4-return-32-64-bit-dir-name-hash-according-to-usa.patch
===================================================================
--- 
debian/patches/bugfix/all/ext4-return-32-64-bit-dir-name-hash-according-to-usa.patch
        (revision 0)
+++ 
debian/patches/bugfix/all/ext4-return-32-64-bit-dir-name-hash-according-to-usa.patch
        (revision 0)
@@ -0,0 +1,379 @@
+From: Fan Yong <yong....@whamcloud.com>
+Date: Sun, 18 Mar 2012 22:44:40 -0400
+Subject: ext4: return 32/64-bit dir name hash according to usage type
+
+commit d1f5273e9adb40724a85272f248f210dc4ce919a upstream.
+
+Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
+to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
+and telldir().  However, this causes problems if there are 32-bit hash
+collisions, since the NFSv2 server can get stuck resending the same
+entries from the directory repeatedly.
+
+Allow ext4 to return a full 64-bit hash (both major and minor) for
+telldir to decrease the chance of hash collisions.  This still needs
+integration on the NFS side.
+
+Patch-updated-by: Bernd Schubert <bernd.schub...@itwm.fraunhofer.de>
+(blame me if something is not correct)
+
+Signed-off-by: Fan Yong <yong....@whamcloud.com>
+Signed-off-by: Andreas Dilger <adil...@whamcloud.com>
+Signed-off-by: Bernd Schubert <bernd.schub...@itwm.fraunhofer.de>
+Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>
+Signed-off-by: Jonathan Nieder <jrnie...@gmail.com>
+---
+ fs/ext4/dir.c  |  214 ++++++++++++++++++++++++++++++++++++++++++++------------
+ fs/ext4/ext4.h |    6 +-
+ fs/ext4/hash.c |    4 +-
+ 3 files changed, 176 insertions(+), 48 deletions(-)
+
+diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
+index 164c560..689d1b1 100644
+--- a/fs/ext4/dir.c
++++ b/fs/ext4/dir.c
+@@ -32,24 +32,8 @@ static unsigned char ext4_filetype_table[] = {
+       DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
+ };
+ 
+-static int ext4_readdir(struct file *, void *, filldir_t);
+ static int ext4_dx_readdir(struct file *filp,
+                          void *dirent, filldir_t filldir);
+-static int ext4_release_dir(struct inode *inode,
+-                              struct file *filp);
+-
+-const struct file_operations ext4_dir_operations = {
+-      .llseek         = ext4_llseek,
+-      .read           = generic_read_dir,
+-      .readdir        = ext4_readdir,         /* we take BKL. needed?*/
+-      .unlocked_ioctl = ext4_ioctl,
+-#ifdef CONFIG_COMPAT
+-      .compat_ioctl   = ext4_compat_ioctl,
+-#endif
+-      .fsync          = ext4_sync_file,
+-      .release        = ext4_release_dir,
+-};
+-
+ 
+ static unsigned char get_dtype(struct super_block *sb, int filetype)
+ {
+@@ -60,6 +44,26 @@ static unsigned char get_dtype(struct super_block *sb, int 
filetype)
+       return (ext4_filetype_table[filetype]);
+ }
+ 
++/**
++ * Check if the given dir-inode refers to an htree-indexed directory
++ * (or a directory which chould potentially get coverted to use htree
++ * indexing).
++ *
++ * Return 1 if it is a dx dir, 0 if not
++ */
++static int is_dx_dir(struct inode *inode)
++{
++      struct super_block *sb = inode->i_sb;
++
++      if (EXT4_HAS_COMPAT_FEATURE(inode->i_sb,
++                   EXT4_FEATURE_COMPAT_DIR_INDEX) &&
++          ((ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) ||
++           ((inode->i_size >> sb->s_blocksize_bits) == 1)))
++              return 1;
++
++      return 0;
++}
++
+ /*
+  * Return 0 if the directory entry is OK, and 1 if there is a problem
+  *
+@@ -115,18 +119,13 @@ static int ext4_readdir(struct file *filp,
+       unsigned int offset;
+       int i, stored;
+       struct ext4_dir_entry_2 *de;
+-      struct super_block *sb;
+       int err;
+       struct inode *inode = filp->f_path.dentry->d_inode;
++      struct super_block *sb = inode->i_sb;
+       int ret = 0;
+       int dir_has_error = 0;
+ 
+-      sb = inode->i_sb;
+-
+-      if (EXT4_HAS_COMPAT_FEATURE(inode->i_sb,
+-                                  EXT4_FEATURE_COMPAT_DIR_INDEX) &&
+-          ((ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) ||
+-           ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
++      if (is_dx_dir(inode)) {
+               err = ext4_dx_readdir(filp, dirent, filldir);
+               if (err != ERR_BAD_DX_DIR) {
+                       ret = err;
+@@ -254,22 +253,134 @@ out:
+       return ret;
+ }
+ 
++static inline int is_32bit_api(void)
++{
++#ifdef CONFIG_COMPAT
++      return is_compat_task();
++#else
++      return (BITS_PER_LONG == 32);
++#endif
++}
++
+ /*
+  * These functions convert from the major/minor hash to an f_pos
+- * value.
++ * value for dx directories
+  *
+- * Currently we only use major hash numer.  This is unfortunate, but
+- * on 32-bit machines, the same VFS interface is used for lseek and
+- * llseek, so if we use the 64 bit offset, then the 32-bit versions of
+- * lseek/telldir/seekdir will blow out spectacularly, and from within
+- * the ext2 low-level routine, we don't know if we're being called by
+- * a 64-bit version of the system call or the 32-bit version of the
+- * system call.  Worse yet, NFSv2 only allows for a 32-bit readdir
+- * cookie.  Sigh.
++ * Upper layer (for example NFS) should specify FMODE_32BITHASH or
++ * FMODE_64BITHASH explicitly. On the other hand, we allow ext4 to be mounted
++ * directly on both 32-bit and 64-bit nodes, under such case, neither
++ * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
+  */
+-#define hash2pos(major, minor)        (major >> 1)
+-#define pos2maj_hash(pos)     ((pos << 1) & 0xffffffff)
+-#define pos2min_hash(pos)     (0)
++static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
++{
++      if ((filp->f_mode & FMODE_32BITHASH) ||
++          (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++              return major >> 1;
++      else
++              return ((__u64)(major >> 1) << 32) | (__u64)minor;
++}
++
++static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
++{
++      if ((filp->f_mode & FMODE_32BITHASH) ||
++          (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++              return (pos << 1) & 0xffffffff;
++      else
++              return ((pos >> 32) << 1) & 0xffffffff;
++}
++
++static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
++{
++      if ((filp->f_mode & FMODE_32BITHASH) ||
++          (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++              return 0;
++      else
++              return pos & 0xffffffff;
++}
++
++/*
++ * Return 32- or 64-bit end-of-file for dx directories
++ */
++static inline loff_t ext4_get_htree_eof(struct file *filp)
++{
++      if ((filp->f_mode & FMODE_32BITHASH) ||
++          (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++              return EXT4_HTREE_EOF_32BIT;
++      else
++              return EXT4_HTREE_EOF_64BIT;
++}
++
++
++/*
++ * ext4_dir_llseek() based on generic_file_llseek() to handle both
++ * non-htree and htree directories, where the "offset" is in terms
++ * of the filename hash value instead of the byte offset.
++ *
++ * NOTE: offsets obtained *before* ext4_set_inode_flag(dir, EXT4_INODE_INDEX)
++ *       will be invalid once the directory was converted into a dx directory
++ */
++loff_t ext4_dir_llseek(struct file *file, loff_t offset, int origin)
++{
++      struct inode *inode = file->f_mapping->host;
++      loff_t ret = -EINVAL;
++      int dx_dir = is_dx_dir(inode);
++
++      mutex_lock(&inode->i_mutex);
++
++      /* NOTE: relative offsets with dx directories might not work
++       *       as expected, as it is difficult to figure out the
++       *       correct offset between dx hashes */
++
++      switch (origin) {
++      case SEEK_END:
++              if (unlikely(offset > 0))
++                      goto out_err; /* not supported for directories */
++
++              /* so only negative offsets are left, does that have a
++               * meaning for directories at all? */
++              if (dx_dir)
++                      offset += ext4_get_htree_eof(file);
++              else
++                      offset += inode->i_size;
++              break;
++      case SEEK_CUR:
++              /*
++               * Here we special-case the lseek(fd, 0, SEEK_CUR)
++               * position-querying operation.  Avoid rewriting the "same"
++               * f_pos value back to the file because a concurrent read(),
++               * write() or lseek() might have altered it
++               */
++              if (offset == 0) {
++                      offset = file->f_pos;
++                      goto out_ok;
++              }
++
++              offset += file->f_pos;
++              break;
++      }
++
++      if (unlikely(offset < 0))
++              goto out_err;
++
++      if (!dx_dir) {
++              if (offset > inode->i_sb->s_maxbytes)
++                      goto out_err;
++      } else if (offset > ext4_get_htree_eof(file))
++              goto out_err;
++
++      /* Special lock needed here? */
++      if (offset != file->f_pos) {
++              file->f_pos = offset;
++              file->f_version = 0;
++      }
++
++out_ok:
++      ret = offset;
++out_err:
++      mutex_unlock(&inode->i_mutex);
++
++      return ret;
++}
+ 
+ /*
+  * This structure holds the nodes of the red-black tree used to store
+@@ -330,15 +441,16 @@ static void free_rb_tree_fname(struct rb_root *root)
+ }
+ 
+ 
+-static struct dir_private_info *ext4_htree_create_dir_info(loff_t pos)
++static struct dir_private_info *ext4_htree_create_dir_info(struct file *filp,
++                                                         loff_t pos)
+ {
+       struct dir_private_info *p;
+ 
+       p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
+       if (!p)
+               return NULL;
+-      p->curr_hash = pos2maj_hash(pos);
+-      p->curr_minor_hash = pos2min_hash(pos);
++      p->curr_hash = pos2maj_hash(filp, pos);
++      p->curr_minor_hash = pos2min_hash(filp, pos);
+       return p;
+ }
+ 
+@@ -429,7 +541,7 @@ static int call_filldir(struct file *filp, void *dirent,
+                      "null fname?!?\n");
+               return 0;
+       }
+-      curr_pos = hash2pos(fname->hash, fname->minor_hash);
++      curr_pos = hash2pos(filp, fname->hash, fname->minor_hash);
+       while (fname) {
+               error = filldir(dirent, fname->name,
+                               fname->name_len, curr_pos,
+@@ -454,13 +566,13 @@ static int ext4_dx_readdir(struct file *filp,
+       int     ret;
+ 
+       if (!info) {
+-              info = ext4_htree_create_dir_info(filp->f_pos);
++              info = ext4_htree_create_dir_info(filp, filp->f_pos);
+               if (!info)
+                       return -ENOMEM;
+               filp->private_data = info;
+       }
+ 
+-      if (filp->f_pos == EXT4_HTREE_EOF)
++      if (filp->f_pos == ext4_get_htree_eof(filp))
+               return 0;       /* EOF */
+ 
+       /* Some one has messed with f_pos; reset the world */
+@@ -468,8 +580,8 @@ static int ext4_dx_readdir(struct file *filp,
+               free_rb_tree_fname(&info->root);
+               info->curr_node = NULL;
+               info->extra_fname = NULL;
+-              info->curr_hash = pos2maj_hash(filp->f_pos);
+-              info->curr_minor_hash = pos2min_hash(filp->f_pos);
++              info->curr_hash = pos2maj_hash(filp, filp->f_pos);
++              info->curr_minor_hash = pos2min_hash(filp, filp->f_pos);
+       }
+ 
+       /*
+@@ -501,7 +613,7 @@ static int ext4_dx_readdir(struct file *filp,
+                       if (ret < 0)
+                               return ret;
+                       if (ret == 0) {
+-                              filp->f_pos = EXT4_HTREE_EOF;
++                              filp->f_pos = ext4_get_htree_eof(filp);
+                               break;
+                       }
+                       info->curr_node = rb_first(&info->root);
+@@ -521,7 +633,7 @@ static int ext4_dx_readdir(struct file *filp,
+                       info->curr_minor_hash = fname->minor_hash;
+               } else {
+                       if (info->next_hash == ~0) {
+-                              filp->f_pos = EXT4_HTREE_EOF;
++                              filp->f_pos = ext4_get_htree_eof(filp);
+                               break;
+                       }
+                       info->curr_hash = info->next_hash;
+@@ -540,3 +652,15 @@ static int ext4_release_dir(struct inode *inode, struct 
file *filp)
+ 
+       return 0;
+ }
++
++const struct file_operations ext4_dir_operations = {
++      .llseek         = ext4_dir_llseek,
++      .read           = generic_read_dir,
++      .readdir        = ext4_readdir,
++      .unlocked_ioctl = ext4_ioctl,
++#ifdef CONFIG_COMPAT
++      .compat_ioctl   = ext4_compat_ioctl,
++#endif
++      .fsync          = ext4_sync_file,
++      .release        = ext4_release_dir,
++};
+diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
+index 8cb184c..2ac1eef 100644
+--- a/fs/ext4/ext4.h
++++ b/fs/ext4/ext4.h
+@@ -1597,7 +1597,11 @@ struct dx_hash_info
+       u32             *seed;
+ };
+ 
+-#define EXT4_HTREE_EOF        0x7fffffff
++
++/* 32 and 64 bit signed EOF for dx directories */
++#define EXT4_HTREE_EOF_32BIT   ((1UL  << (32 - 1)) - 1)
++#define EXT4_HTREE_EOF_64BIT   ((1ULL << (64 - 1)) - 1)
++
+ 
+ /*
+  * Control parameters used by ext4_htree_next_block
+diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
+index ac8f168..fa8e491 100644
+--- a/fs/ext4/hash.c
++++ b/fs/ext4/hash.c
+@@ -200,8 +200,8 @@ int ext4fs_dirhash(const char *name, int len, struct 
dx_hash_info *hinfo)
+               return -1;
+       }
+       hash = hash & ~1;
+-      if (hash == (EXT4_HTREE_EOF << 1))
+-              hash = (EXT4_HTREE_EOF-1) << 1;
++      if (hash == (EXT4_HTREE_EOF_32BIT << 1))
++              hash = (EXT4_HTREE_EOF_32BIT - 1) << 1;
+       hinfo->hash = hash;
+       hinfo->minor_hash = minor_hash;
+       return 0;
+-- 
+1.7.10.4
+
Index: 
debian/patches/bugfix/all/fs-add-new-FMODE-flags-FMODE_32bithash-and-FMODE_64b.patch
===================================================================
--- 
debian/patches/bugfix/all/fs-add-new-FMODE-flags-FMODE_32bithash-and-FMODE_64b.patch
        (revision 0)
+++ 
debian/patches/bugfix/all/fs-add-new-FMODE-flags-FMODE_32bithash-and-FMODE_64b.patch
        (revision 0)
@@ -0,0 +1,34 @@
+From: Bernd Schubert <bernd.schub...@itwm.fraunhofer.de>
+Date: Tue, 13 Mar 2012 22:51:38 -0400
+Subject: fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash
+
+commit 6a8a13e03861c0ab83ab07d573ca793cff0e5d00 upstream.
+
+Those flags are supposed to be set by NFS readdir() to tell ext3/ext4
+to 32bit (NFSv2) or 64bit hash values (offsets) in seekdir().
+
+Signed-off-by: Bernd Schubert <bernd.schub...@itwm.fraunhofer.de>
+Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>
+Signed-off-by: Jonathan Nieder <jrnie...@gmail.com>
+---
+ include/linux/fs.h |    4 ++++
+ 1 file changed, 4 insertions(+)
+
+diff --git a/include/linux/fs.h b/include/linux/fs.h
+index 29b6353..fb7ce74 100644
+--- a/include/linux/fs.h
++++ b/include/linux/fs.h
+@@ -92,6 +92,10 @@ struct inodes_stat_t {
+ /* File is opened using open(.., 3, ..) and is writeable only for ioctls
+    (specialy hack for floppy.c) */
+ #define FMODE_WRITE_IOCTL     ((__force fmode_t)0x100)
++/* 32bit hashes as llseek() offset (for directories) */
++#define FMODE_32BITHASH         ((__force fmode_t)0x200)
++/* 64bit hashes as llseek() offset (for directories) */
++#define FMODE_64BITHASH         ((__force fmode_t)0x400)
+ 
+ /*
+  * Don't update ctime and mtime.
+-- 
+1.7.10.4
+
Index: 
debian/patches/bugfix/all/nfsd-rename-int-access-to-int-may_flags-in-nfsd_open.patch
===================================================================
--- 
debian/patches/bugfix/all/nfsd-rename-int-access-to-int-may_flags-in-nfsd_open.patch
        (revision 0)
+++ 
debian/patches/bugfix/all/nfsd-rename-int-access-to-int-may_flags-in-nfsd_open.patch
        (revision 0)
@@ -0,0 +1,84 @@
+From: Bernd Schubert <bernd.schub...@itwm.fraunhofer.de>
+Date: Sun, 18 Mar 2012 22:44:49 -0400
+Subject: nfsd: rename 'int access' to 'int may_flags' in nfsd_open()
+
+commit 999448a8c0202d8c41711c92385323520644527b upstream.
+
+Just rename this variable, as the next patch will add a flag and
+'access' as variable name would not be correct any more.
+
+Signed-off-by: Bernd Schubert <bernd.schub...@itwm.fraunhofer.de>
+Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>
+Acked-by: J. Bruce Fields <bfie...@redhat.com>
+Signed-off-by: Jonathan Nieder <jrnie...@gmail.com>
+---
+ fs/nfsd/vfs.c |   18 ++++++++++--------
+ 1 file changed, 10 insertions(+), 8 deletions(-)
+
+diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
+index 5c3cd82..b395c61 100644
+--- a/fs/nfsd/vfs.c
++++ b/fs/nfsd/vfs.c
+@@ -726,12 +726,13 @@ static int nfsd_open_break_lease(struct inode *inode, 
int access)
+ 
+ /*
+  * Open an existing file or directory.
+- * The access argument indicates the type of open (read/write/lock)
++ * The may_flags argument indicates the type of open (read/write/lock)
++ * and additional flags.
+  * N.B. After this call fhp needs an fh_put
+  */
+ __be32
+ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
+-                      int access, struct file **filp)
++                      int may_flags, struct file **filp)
+ {
+       struct dentry   *dentry;
+       struct inode    *inode;
+@@ -746,7 +747,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int 
type,
+        * and (hopefully) checked permission - so allow OWNER_OVERRIDE
+        * in case a chmod has now revoked permission.
+        */
+-      err = fh_verify(rqstp, fhp, type, access | NFSD_MAY_OWNER_OVERRIDE);
++      err = fh_verify(rqstp, fhp, type, may_flags | NFSD_MAY_OWNER_OVERRIDE);
+       if (err)
+               goto out;
+ 
+@@ -757,7 +758,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int 
type,
+        * or any access when mandatory locking enabled
+        */
+       err = nfserr_perm;
+-      if (IS_APPEND(inode) && (access & NFSD_MAY_WRITE))
++      if (IS_APPEND(inode) && (may_flags & NFSD_MAY_WRITE))
+               goto out;
+       /*
+        * We must ignore files (but only files) which might have mandatory
+@@ -770,12 +771,12 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, 
int type,
+       if (!inode->i_fop)
+               goto out;
+ 
+-      host_err = nfsd_open_break_lease(inode, access);
++      host_err = nfsd_open_break_lease(inode, may_flags);
+       if (host_err) /* NOMEM or WOULDBLOCK */
+               goto out_nfserr;
+ 
+-      if (access & NFSD_MAY_WRITE) {
+-              if (access & NFSD_MAY_READ)
++      if (may_flags & NFSD_MAY_WRITE) {
++              if (may_flags & NFSD_MAY_READ)
+                       flags = O_RDWR|O_LARGEFILE;
+               else
+                       flags = O_WRONLY|O_LARGEFILE;
+@@ -785,7 +786,8 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int 
type,
+       if (IS_ERR(*filp))
+               host_err = PTR_ERR(*filp);
+       else
+-              host_err = ima_file_check(*filp, access);
++              host_err = ima_file_check(*filp, may_flags);
++
+ out_nfserr:
+       err = nfserrno(host_err);
+ out:
+-- 
+1.7.10.4
+
Index: debian/patches/series
===================================================================
--- debian/patches/series       (revision 19368)
+++ debian/patches/series       (working copy)
@@ -393,3 +393,10 @@
 bugfix/all/usb-Add-quirk-detection-based-on-interface-informati.patch
 bugfix/all/usb-Add-USB_QUIRK_RESET_RESUME-for-all-Logitech-UVC-.patch
 bugfix/alpha/alpha-use-large-data-model.diff
+
+# 64-bit NFS readdir cookies on ext3/ext4 with dir_index
+bugfix/all/fs-add-new-FMODE-flags-FMODE_32bithash-and-FMODE_64b.patch
+bugfix/all/ext4-return-32-64-bit-dir-name-hash-according-to-usa.patch
+bugfix/all/nfsd-rename-int-access-to-int-may_flags-in-nfsd_open.patch
+bugfix/all/nfsd-vfs_llseek-with-32-or-64-bit-offsets-hashes.patch
+bugfix/all/ext3-return-32-64-bit-dir-name-hash-according-to-usa.patch
Index: debian/changelog
===================================================================
--- debian/changelog    (revision 19368)
+++ debian/changelog    (working copy)
@@ -115,6 +115,10 @@
   * Make xen-linux-system meta-packages depend on xen-system. This allows
     automatic updates. (closes: #681637)
 
+  [ Jonathan Nieder ]
+  * ext3, ext4: dir_index: Return 64-bit readdir cookies for NFSv3 and 4
+    (Closes: #685407)
+
  -- Ben Hutchings <b...@decadent.org.uk>  Tue, 24 Jul 2012 02:20:37 +0100
 
 linux (3.2.23-1) unstable; urgency=low

Reply via email to