Hi again!
I have some more information about this bug, which really seems to be
more of a unionfs bug than an nfs-kernel-server one. Is it possible to
reassign it to unionfs (Sarge, version 1.0.11-1)?
Sorry for the long mail.
The crash is:
------------
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000014
printing eip:
e12e83e5
*pde = 00000000
Oops: 0000 [#8]
PREEMPT
Modules linked in: unionfs i830 ipv6 nfsd exportfs lockd sunrpc ide_cd evdev
pcspkr floppy parport_pc parport snd_intel8x0 snd_ac97_codec snd_pcm snd_timer
snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd
pci_hotplug ehci_hcd uhci_hcd intel_agp agpgart dm_mod capability commoncap
rfcomm l2cap hci_vhci hci_usb hci_uart bfusb firmware_class bluetooth
i810_audio ac97_codec soundcore e1000 sr_mod cdrom sg ide_scsi 3c59x usbkbd
usbcore genrtc ext3 jbd mbcache ide_generic piix ide_disk ide_core sd_mod
ata_piix libata scsi_mod unix font vesafb cfbcopyarea cfbimgblt cfbfillrect
CPU: 0
EIP: 0060:[<e12e83e5>] Not tainted
EFLAGS: 00010246 (2.6.8-2-686)
EIP is at unionfs_open+0x225/0x8de0 [unionfs]
eax: ce1c7f14 ebx: de28fbec ecx: ce1c7f14 edx: ce1c7f14
esi: 00000000 edi: c558ae60 ebp: d2fbeb00 esp: de28fb2c
ds: 007b es: 007b ss: 0068
Process nfsd (pid: 2717, threadinfo=de28e000 task=de4961b0)
Stack: 0000000c 000000d0 00000323 00000004 e13ae08b e13b4a00 e13a86a7 e13aa100
00000323 de28fbc8 d82f1d00 de28fc88 00000038 00000038 00000000 00000000
d8120680 de28fc88 ce1c7f14 00000000 00000000 00000000 00000038 d8120680
Call Trace:
[<c01556b7>] open_private_file+0xb7/0xd0
[<e0af1845>] get_name+0x95/0x130 [exportfs]
[<c016ce07>] d_find_alias+0x27/0x50
[<e0af157c>] find_exported_dentry+0x57c/0x730 [exportfs]
[<c016fcc3>] iput+0x63/0x90
[<e0bc72e2>] xprt_destroy+0x42/0x60 [sunrpc]
[<e0bc34c9>] rpc_destroy_client+0x69/0xe0 [sunrpc]
[<e0bc8aaa>] rpc_release_task+0x12a/0x1b0 [sunrpc]
[<e0bc83f0>] __rpc_execute+0x370/0x410 [sunrpc]
[<c0139ec1>] __rmqueue+0xd1/0x110
[<c013a345>] buffered_rmqueue+0xf5/0x1d0
[<c013a730>] __alloc_pages+0x310/0x370
[<c013a7c3>] __get_free_pages+0x33/0x40
[<c013e857>] alloc_slabmgmt+0x57/0x70
[<c0156e11>] invalidate_inode_buffers+0x11/0x80
[<c021694b>] sock_destroy_inode+0x1b/0x20
[<c016e8e3>] destroy_inode+0x43/0x70
[<c016fcc3>] iput+0x63/0x90
[<e0bcc85c>] svc_tcp_accept+0x2ec/0x420 [sunrpc]
[<e0c23dd7>] exp_find_key+0x87/0xa0 [nfsd]
[<e0af1ada>] export_decode_fh+0x5a/0x7a [exportfs]
[<e0c1e320>] nfsd_acceptable+0x0/0x120 [nfsd]
[<e0c1e64b>] fh_verify+0x20b/0x5a0 [nfsd]
[<e0c1e320>] nfsd_acceptable+0x0/0x120 [nfsd]
[<e0c2760d>] nfsd3_proc_getattr+0x7d/0xc0 [nfsd]
[<e0c1c747>] nfsd_dispatch+0xd7/0x1e0 [nfsd]
[<e0c1c670>] nfsd_dispatch+0x0/0x1e0 [nfsd]
[<e0bcb451>] svc_process+0x4b1/0x620 [sunrpc]
[<e0c1c4b6>] nfsd+0x206/0x3c0 [nfsd]
[<e0c1c2b0>] nfsd+0x0/0x3c0 [nfsd]
[<c01042ad>] kernel_thread_helper+0x5/0x18
Code: 8b 76 14 89 74 24 34 89 77 04 8b 5c 24 34 8b 84 24 8c 00 00
-----------
i.e., unionfs_open seems to be the culprit. unionfs_open+0x225
disassembled with -S gives:
-----------
if (ret) {
37363: 85 f6 test %esi,%esi
37365: 74 7e je 373e5 <unionfs_open+0x225>
...
<------ I cut some text here
ASSERT2(ret->udi_bend <= ret->udi_bcount);
373c4: 39 d3 cmp %edx,%ebx
373c6: 0f 8f 31 8a 00 00 jg 3fdfd <unionfs_open+0x8c3d>
ASSERT2(ret->udi_bend <= sbmax(dent->d_sb));
373cc: 8b 4c 24 48 mov 0x48(%esp),%ecx
373d0: 8b 41 48 mov 0x48(%ecx),%eax
373d3: 8b 80 4c 01 00 00 mov 0x14c(%eax),%eax
373d9: 8b 40 18 mov 0x18(%eax),%eax
373dc: 40 inc %eax
373dd: 39 c3 cmp %eax,%ebx
373df: 0f 8f c1 89 00 00 jg 3fda6 <unionfs_open+0x8be6>
373e5: 8b 76 14 mov 0x14(%esi),%esi
<------ Crash is here
373e8: 89 74 24 34 mov %esi,0x34(%esp)
-----------
The module crashes since %esi is 0. I looked in the unionfs-source
(file.c) to see if I could spot the place in the source, and it seems
that the dbstart()-call
-----------
bstart = fbstart(file) = dbstart(dentry);
-----------
on line 823 is the problem. Looking at dbstart in unionfs.h:522, it's
defined as
-----------
#define dbstart(dentry) __dbstart(dentry, __FILE__, __FUNCTION__, __LINE__)
static inline int __dbstart(const struct dentry *dentry, const char *file,
const char *function, int line) {
return dtopd(dentry)->udi_bstart;
}
-----------
and since dtopd is defined as
-----------
#define dtopd(dent) __dtopd(dent, 1, __FILE__, __FUNCTION__, __LINE__)
static inline struct unionfs_dentry_info *__dtopd(const struct dentry *dent,
int check, const char *file, const char *function, int line) {
struct unionfs_dentry_info *ret;
PASSERT2(dent);
ret = (struct unionfs_dentry_info *)(dent)->d_fsdata;
/* We are really only interested in catching poison here. */
if (ret) {
PASSERT2(ret);
if (check) {
if ((ret->udi_bend > ret->udi_bcount) ||
(ret->udi_bend > sbmax(dent->d_sb))) {
printk("udi_bend = %d, udi_count = %d, sbmax = %d\n",
ret->udi_bend, ret->udi_bcount, sbmax(dent->d_sb));
}
ASSERT2(ret->udi_bend <= ret->udi_bcount);
ASSERT2(ret->udi_bend <= sbmax(dent->d_sb));
}
}
return ret;
}
-----------
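I'm pretty sure that the branch at 37365 in the disassembly is taken:
it jumps when %esi (ret) is zero and lands on 373e5, which then loads
from 0x14(%esi), i.e. from address 0x14. That matches the faulting
address 00000014 and the leading "8b 76 14" in the Code: line of the
oops. So dtopd(dentry)->udi_bstart results in a NULL-pointer
dereference, which means (dent)->d_fsdata is apparently NULL.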
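To illustrate the failure mode, here is a minimal user-space sketch, not the actual unionfs code and not a proposed patch. The struct layouts are hypothetical stand-ins: the 0x14 bytes of padding only reproduce the offset seen in the oops, on the assumption that 0x14 is where udi_bstart sits. The names dtopd_sketch and dbstart_checked are invented for this example. It shows an accessor that can return NULL, as __dtopd does, and a caller that checks the result instead of dereferencing it blindly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in; the real unionfs_dentry_info layout differs. */
struct unionfs_dentry_info {
    char pad[0x14];   /* assumption: puts udi_bstart at the offset from the oops */
    int udi_bstart;
    int udi_bend;
    int udi_bcount;
};

/* Hypothetical stand-in for the kernel's struct dentry. */
struct dentry {
    void *d_fsdata;
};

/* Like __dtopd without the range checks: returns NULL if d_fsdata is NULL. */
static inline struct unionfs_dentry_info *dtopd_sketch(const struct dentry *dent)
{
    assert(dent != NULL);   /* plays the role of PASSERT2(dent) */
    return (struct unionfs_dentry_info *)dent->d_fsdata;
}

/* Guarded variant of dbstart: returns -1 rather than reading through NULL. */
static inline int dbstart_checked(const struct dentry *dent)
{
    const struct unionfs_dentry_info *info = dtopd_sketch(dent);

    if (info == NULL)
        return -1;
    return info->udi_bstart;
}
```

Whether returning an error is the right reaction inside unionfs_open, or whether d_fsdata should never be NULL at that point for dentries reached via exportfs, is a question for the unionfs maintainers.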
This happens after a while with a NFS-exported unionfs filesystem both
in kernel 2.4.27-2-686 and 2.6.8-2-686 (this is from the latter). Hope
this helps.
// Simon