On Sun, Oct 24, 2010 at 05:21:06AM +0000, David Holland wrote:
>
> Because individual write() calls are supposed to be atomic, I don't
> think there is such a thing as a locking improvement that'll help with
> this behavior. :-/

I think write() only needs to lock the file enough to ensure that the
file offset is correct. Possibly the written range needs locking
against other accesses - but I think the app is supposed to use file
locking for that (and mmap will always be non-atomic w.r.t. write).

Actually if 2 writes are issued for the same part of a file, the kernel
can act as if they were requested in either order - since the app(s)
cannot know the order the calls would be made in! Which means it could
just sleep the 2nd write until the first terminates!

Writes with O_APPEND (and writes that extend the file) are more
problematical, since you can't allow a second such write to start until
the first has completed - for instance it might try to read from an
unmapped user-space address and return a short length.

> Except I guess going to some kind of multiversion model for vnodes.

Don't you just need 2 locks? One for locking the data areas, and the
other for the file data itself.

	David

-- 
David Laight: da...@l8s.co.uk
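
A rough user-space sketch of the two-lock idea discussed above, under one
possible reading of it: a short-lived lock that only protects the file
offset, and a separate lock for the file data. Everything here
(struct model_file, model_init, model_write, MODEL_FILE_MAX) is
hypothetical and made up for illustration; it is not kernel code and not
how any particular kernel implements write().

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>

#define MODEL_FILE_MAX 4096   /* arbitrary size for the sketch */

/* Hypothetical in-memory stand-in for a file. */
struct model_file {
    pthread_mutex_t offset_lock;  /* protects 'offset' only */
    pthread_mutex_t data_lock;    /* protects 'data' and 'size' */
    size_t offset;                /* current file offset */
    size_t size;                  /* current file length */
    char data[MODEL_FILE_MAX];
};

static void model_init(struct model_file *f)
{
    pthread_mutex_init(&f->offset_lock, NULL);
    pthread_mutex_init(&f->data_lock, NULL);
    f->offset = 0;
    f->size = 0;
    memset(f->data, 0, sizeof(f->data));
}

/* Returns the number of bytes written, or -1 if the range does not fit. */
static ssize_t model_write(struct model_file *f, const void *buf, size_t len)
{
    size_t start;

    /* Step 1: claim the range by advancing the offset under a short lock. */
    pthread_mutex_lock(&f->offset_lock);
    start = f->offset;
    assert(start <= MODEL_FILE_MAX);
    if (len > MODEL_FILE_MAX - start) {
        pthread_mutex_unlock(&f->offset_lock);
        return -1;
    }
    f->offset = start + len;
    pthread_mutex_unlock(&f->offset_lock);

    /* Step 2: copy the data under the data lock. A second writer can
     * already have claimed the following range while this copy runs. */
    pthread_mutex_lock(&f->data_lock);
    memcpy(f->data + start, buf, len);
    if (start + len > f->size)
        f->size = start + len;
    pthread_mutex_unlock(&f->data_lock);

    return (ssize_t)len;
}
```

The sketch assumes the copy in step 2 can never fail. That is the case
the message flags as hard: if the copy from user space could fault and
return a short length, the offset claimed in step 1 would already be
wrong for any write that started afterwards, so an appending write would
have to hold the first lock until its copy had finished.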