Hello,

We are still seeing occasional ext2fs hangs, and I got an interesting backtrace today:

Thread 94 (thread 348.94):
#0  0x010d753c in mach_msg_trap () from /lib/libc.so.0.3
#1  0x010d7cc9 in mach_msg () from /lib/libc.so.0.3
#2  0x010a3630 in cproc_block () at /build/mbanck/hurd-20071119/build-tree/hurd/libthreads/cprocs.c:643
#3  0x010a3d1a in __mutex_lock_solid (ptr=0x8123c6c) at /build/mbanck/hurd-20071119/build-tree/hurd/libthreads/cprocs.c:955
#4  0x0104a1e9 in diskfs_release_peropen (po=0x8242640) at /build/mbanck/hurd-20071119/build-tree/hurd/libdiskfs/peropen-rele.c:25
#5  0x0104a402 in diskfs_protid_rele (arg=0x81d0818) at /build/mbanck/hurd-20071119/build-tree/hurd/libdiskfs/protid-rele.c:34
#6  0x010aa912 in _ports_complete_deallocate (pi=0x81d0818) at /build/mbanck/hurd-20071119/build-tree/hurd/libports/complete-deallocate.c:49
#7  0x010a9421 in ports_port_deref (portstruct=0x81d0818) at /build/mbanck/hurd-20071119/build-tree/hurd/libports/port-deref.c:48
#8  0x01038825 in diskfs_S_dir_lookup (dircred=0x80e9698, path=0x1aedf4c "@test", flags=262154=O_NONBLOCK|O_WRONLY|O_EXLOCK, mode=0, retry=0x1aebf44, retryname=0x1aebf4c "", returned_port=0x1aec350, returned_port_poly=0x1aebe48) at /build/mbanck/hurd-20071119/build-tree/hurd/libdiskfs/dir-lookup.c:478

dir-lookup.c:478 is as follows:

469   if (! error)
470     {
471       if (flags & O_EXLOCK)
472         error = fshelp_acquire_lock (&np->userlock, &newpi->po->lock_status,
473                                      &np->lock, LOCK_EX);
474       else if (flags & O_SHLOCK)
475         error = fshelp_acquire_lock (&np->userlock, &newpi->po->lock_status,
476                                      &np->lock, LOCK_SH);
477       if (error)
478         ports_port_deref (newpi); /* Get rid of NEWPI. */
479     }

i.e. someone tried to open @test with an exclusive lock, but acquiring the lock failed (EINTR), and thus we drop newpi. The problem is that since that's the last reference, ports_port_deref calls _ports_complete_deallocate, which ends up calling diskfs_release_peropen, which tries to acquire the lock on np, which we _already_ hold. Hence the deadlock, which quickly propagates up to /.
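
To make the lock ordering concrete, here is a minimal sketch of the same pattern using POSIX threads rather than cthreads. Everything in it is a hypothetical stand-in (struct node, struct peropen, release_peropen, lookup_error_path), not the libdiskfs code. It also assumes an error-checking mutex, so that relocking reports EDEADLK instead of blocking forever the way the non-recursive lock in the backtrace does.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the node and peropen structures. */
struct node { pthread_mutex_t lock; };
struct peropen { struct node *np; int refcnt; };

/* Initialise the node lock as an error-checking mutex (an assumption of
   this sketch): relocking from the owning thread returns EDEADLK
   instead of hanging.  */
static void node_init(struct node *np)
{
  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  pthread_mutex_init(&np->lock, &attr);
  pthread_mutexattr_destroy(&attr);
}

/* Models the release path from the backtrace: it takes the node lock
   itself.  Returns 0, or the error from pthread_mutex_lock.  */
static int release_peropen(struct peropen *po)
{
  int err = pthread_mutex_lock(&po->np->lock);
  if (err)
    return err;                 /* EDEADLK: the caller already holds it */
  assert(po->refcnt > 0);
  po->refcnt--;
  pthread_mutex_unlock(&po->np->lock);
  return 0;
}

/* Models the error path of the lookup: the node lock is held when the
   last reference is dropped.  With unlock_first set, the lock is
   released before the reference is dropped.  */
static int lookup_error_path(struct node *np, int unlock_first)
{
  struct peropen po = { np, 1 };
  int err;

  pthread_mutex_lock(&np->lock);        /* the lookup holds np->lock */
  if (unlock_first)
    pthread_mutex_unlock(&np->lock);
  err = release_peropen(&po);           /* relocks np->lock internally */
  if (!unlock_first)
    pthread_mutex_unlock(&np->lock);
  return err;
}
```

After node_init on a node, lookup_error_path(&n, 0) should return EDEADLK, which corresponds to the hang above, while lookup_error_path(&n, 1) should return 0 because the lock is no longer held when the release path takes it.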

It looks like we have the same problem when diskfs_create_protid() fails. Since we don't return a locked np in the error case, I guess the correct fix is the one attached to this mail?

Samuel

Index: libdiskfs/dir-lookup.c
===================================================================
RCS file: /cvsroot/hurd/hurd/libdiskfs/dir-lookup.c,v
retrieving revision 1.53
diff -u -p -r1.53 dir-lookup.c
--- libdiskfs/dir-lookup.c	13 May 2002 22:04:48 -0000	1.53
+++ libdiskfs/dir-lookup.c	5 Jun 2008 23:11:18 -0000
@@ -463,7 +463,10 @@ diskfs_S_dir_lookup (struct protid *dirc
     {
       error = diskfs_create_protid (newpo, dircred->user, &newpi);
       if (error)
-        diskfs_release_peropen (newpo);
+        {
+          mutex_unlock(&np->lock);
+          diskfs_release_peropen (newpo);
+        }
     }
 
   if (! error)
@@ -475,7 +478,10 @@ diskfs_S_dir_lookup (struct protid *dirc
         error = fshelp_acquire_lock (&np->userlock, &newpi->po->lock_status,
                                      &np->lock, LOCK_SH);
       if (error)
-        ports_port_deref (newpi); /* Get rid of NEWPI. */
+        {
+          mutex_unlock(&np->lock);
+          ports_port_deref (newpi); /* Get rid of NEWPI. */
+        }
     }
 
   if (! error)