Hi,

Fatfs still has serious locking problems while writing, and I took a close look at this today.
First, I need to split the problem into sub-problems:

-- The diskfs_node_refcnt_lock problem --

diskfs_node_refcnt_lock can already be held when write_node tries to lock it.

The part that is (perhaps) easy to fix is the problem caused by diskfs_drop_node. This function holds diskfs_node_refcnt_lock until it returns, and while the lock is held it calls diskfs_node_update. If the node's disk node structure is dirty, diskfs_node_update (indirectly) calls write_node in fatfs. To update the node, fatfs has to look up the directory that holds it, and that lookup (or one of the functions it uses; the details don't matter right now) has to lock diskfs_node_refcnt_lock. Of course, _not_ locking diskfs_node_refcnt_lock there is not an option!

This problem is hopefully not too hard to fix. Here are some possible solutions:

- Make sure the directory is known for every opened file, so we don't have to look it up. Just add a "struct node *dirnode" to the "struct disknode" of fatfs. (A fatfs-only solution that evades the problem.)
- Make sure diskfs_drop_node doesn't call diskfs_node_update while diskfs_node_refcnt_lock is held. This requires some careful libdiskfs hacking and could be really hard. (A generic solution that fixes the problem.)

Personally, I consider this a libdiskfs bug; other filesystems could run into the same problem someday.

-- The directory already locked problem --

Fatfs locks a directory that is already locked.

Fatfs needs to lock the directory that holds a file in order to write the file's metadata to disk. This happens in the function write_node. The problem is that one of write_node's callers may already have locked this directory. So locking the directory is not safe, because if it is already locked the thread hangs. Not locking the directory isn't safe either, because that introduces race conditions.
We can't special-case write_node the way we did for read_node, because there are too many callers, all with different behavior. I propose to change the interfaces of diskfs_node_update and diskfs_write_disknode: either add a flag that tells these functions whether the directory is locked or, even better, pass the "dp" to them if it is locked (as Roland once proposed for read_node). I would even like to make similar changes to diskfs_cached_lookup, so that fatfs and other filesystems with similar problems can support NFS.

I will file these two issues as tasks for fatfs on Savannah in a few days if no one objects.

Thanks,
Marco