Hello, all. Currently, there are several race conditions around dentry/inode reclamation.
a. sysfs_dirent->s_dentry dereferencing in sysfs_readdir() b. sysfs_dirent->s_dentry dereferencing in sysfs_drop_dentry() c. sysfs_dirent->s_dentry clearing in sysfs_d_iput() All aboves are done without synchronization and can cause oops if the timing is right (or wrong). These race conditions are difficult to trigger but with the attached patch (sysfs-races.patch) and the following commands running parallelly, all three are reliably reproducible (you may have to change timings or disable others to trigger specific one). 1. while true; do insmod drivers/ata/libata.ko; insmod drivers/ata/ata_piix.ko; sleep 1; rmmod ata_piix; rmmod libata; sleep 1; echo -n . ; done 2. while true; do find /sys -type f | xargs cat > /dev/null; echo -n .; sleep 1; done 3. while true; do find /sys/class/scsi_disk -type f | sort | xargs cat > /dev/null; echo -n .; sleep 1; done 4. while true; do umount /sys; sleep 1; mount /sys; sleep 1; echo -n .; done #1 assumes there are several devices attached to ata_piix controller. Change #1 to any module or command which creates and removes sysfs nodes repeatedly and adjust #3 to cat those sysfs nodes. All known race conditions are fixed in the current -mm. #a is replaced by adding sd->s_ino and allocating unique ino with ida. #b and #c are fixed during sysfs_drop_dentry() rewrite. However, those changes were too big to apply to 2.6.22-rcX or any stable branches. This patchset contains three minimal backports of fixes in -mm. With all patches in the patchset and sysfs-races.patch applied, kernel survived ~20 hours of stress test without any problem. Thanks. -- tejun
--- fs/sysfs/dir.c | 7 +++++-- fs/sysfs/inode.c | 2 ++ 2 files changed, 7 insertions(+), 2 deletions(-) Index: work1/fs/sysfs/dir.c =================================================================== --- work1.orig/fs/sysfs/dir.c +++ work1/fs/sysfs/dir.c @@ -18,6 +18,8 @@ static void sysfs_d_iput(struct dentry * { struct sysfs_dirent * sd = dentry->d_fsdata; + udelay(10); + if (sd) { BUG_ON(sd->s_dentry != dentry); sd->s_dentry = NULL; @@ -538,9 +540,10 @@ static int sysfs_readdir(struct file * f name = sysfs_get_name(next); len = strlen(name); - if (next->s_dentry) + if (next->s_dentry) { + msleep(1); ino = next->s_dentry->d_inode->i_ino; - else + } else ino = iunique(sysfs_sb, 2); if (filldir(dirent, name, len, filp->f_pos, ino, Index: work1/fs/sysfs/inode.c =================================================================== --- work1.orig/fs/sysfs/inode.c +++ work1/fs/sysfs/inode.c @@ -248,6 +248,8 @@ void sysfs_drop_dentry(struct sysfs_dire struct dentry * dentry = sd->s_dentry; struct inode *inode; + msleep(1); + if (dentry) { spin_lock(&dcache_lock); spin_lock(&dentry->d_lock);