On Fri, 2008-06-06 at 00:12 +0100, Samuel Thibault wrotekfs/dir-lookup.c:478 > > dir-lookup.c:478 is as follows: > > 469 if (! error) > 470 { > 471 if (flags & O_EXLOCK) > 472 error = fshelp_acquire_lock (&np->userlock, &newpi->po->lock_status, > 473 &np->lock, LOCK_EX); > 474 else if (flags & O_SHLOCK) > 475 error = fshelp_acquire_lock (&np->userlock, &newpi->po->lock_status, > 476 &np->lock, LOCK_SH); > 477 if (error) > 478 ports_port_deref (newpi); /* Get rid of NEWPI. */ > 479 } > > i.e. someone tried to open @test exclusively, but it failed (EINTR), > and thus we drop newpi. The problem is that since that's the last > reference, it cals _ports_complete_deallocate, which ends up calling > diskfs_release_peropen which tries to acquire the lock on np, which we > _already_ have, thus the deadlock, which quickly propagates to the /. > > It looks like we have the same problem with diskfs_create_protid().
Your patch (which I see has been applied) seems to be clearly the Wrong Thing in the case where NP == DNP. Look at the out: sequence at the end of the function. If NP == DNP, we are holding two references to the node, ("one under each variable name") but only one lock. You've released the lock, and the later nput(dnp) will produce only disaster. I believe your patch is the wrong solution to this problem. The calling sequence for diskfs_release_peropen expects to be called only for an unlocked node. The right thing to do is to throw NEWPI away at the very end of the function. Can you please revert your patch, and put something like the following in the OUT: sequence right before the return? if (newpi && error) ports_port_deref (newpi); (and, in the much rarer case where diskfs_create_protid fails, make a similar adjustment to throw away NEWPO only after the unlocks have occurred at the end of the function). Thomas