On Thu, Oct 18, 2012 at 03:43:25PM -0400, John Baldwin wrote:
> On Thursday, October 18, 2012 12:42:18 pm Konstantin Belousov wrote:
> > On Thu, Oct 18, 2012 at 09:39:34AM -0400, John Baldwin wrote:
> > > On Thursday, October 18, 2012 4:35:37 am Konstantin Belousov wrote:
> > > > On Thu, Oct 18, 2012 at 10:08:22AM +1000, Tristan Verniquet wrote:
> > > > >
> > > > > I want to work with large (1-10G) files in memory but eventually
> > > > > sync them back out to disk. The problem is that the sync process
> > > > > appears to lock the file in the kernel for the duration of the
> > > > > sync, which can run into minutes. This prevents other processes
> > > > > from reading from the file (unless they already have it mapped)
> > > > > for this whole time. Is there any way to prevent this? I think I
> > > > > read in a post somewhere about OpenBSD implementing partial writes
> > > > > when it hits a file with lots of dirty pages in order to prevent
> > > > > this. Is there anything available for FreeBSD, or is there another
> > > > > way around it?
> > > > >
> > > > No, currently the vnode lock is held exclusively for the whole
> > > > duration of the msync(2) syscall or its analog from the syncer.
> > > >
> > > > Making a change to periodically drop the vnode lock in
> > > > vm_object_page_clean() might be possible, but it requires
> > > > benchmarking to make sure that we do not pessimize the common case.
> > > > Also, this opens the possibility of the vnode being reclaimed in the
> > > > meantime.
> > >
> > > You can simulate this in userland by breaking up your msync() into
> > > multiple msync() calls where each call just syncs a portion of the
> > > file.
> > Be aware that this is much slower than msyncing the whole file, even
> > if the file is very large. The reason is that the pager initiates an
> > asynchronous _immediate_ clustered write for such situations. Delayed
> > writes (AKA bdwrite()) are only specified for full-range msyncing.
>
> Ugh.
> It would seem to me that msync(MS_ASYNC) should be doing delayed
> writes.

vm_pager_putpages() is called with the VM_PAGER_CLUSTER_OK flag for
MS_ASYNC, according to my reading of the code. This results in neither
the IO_SYNC nor the IO_ASYNC flag being passed to VOP_WRITE() from
vnode_pager_generic_putpages().
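John's suggestion of splitting one large msync() into several smaller ones can be sketched in userland C as below. This is a minimal illustration under stated assumptions, not code from the thread: `msync_in_chunks` is a hypothetical helper name, and the chunk size is left to the caller. Because the vnode lock is held for the duration of each msync(2) call, separate calls should give other readers a chance to get in between. Keep Konstantin's caveat in mind, though: only a full-range msync gets bdwrite() writes, so partial calls trigger immediate clustered writes and can be much slower overall.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: sync [addr, addr + len) with a series of msync(2)
 * calls of at most `chunk` bytes each, instead of one full-range call.
 * `addr` must be page-aligned (as returned by mmap).  Returns 0 on
 * success, or -1 with errno set by the first failing msync(). */
int msync_in_chunks(void *addr, size_t len, size_t chunk, int flags)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    char *base = addr;
    size_t off;

    /* Round the chunk up to a whole number of pages so that every call
     * starts on a page boundary, as msync(2) requires. */
    if (chunk < pg)
        chunk = pg;
    chunk = (chunk + pg - 1) / pg * pg;

    for (off = 0; off < len; off += chunk) {
        size_t n = len - off < chunk ? len - off : chunk;

        /* Each call is a separate syscall, so the vnode lock is
         * released between chunks. */
        if (msync(base + off, n, flags) == -1)
            return -1;
    }
    return 0;
}
```

A caller would pass a shared mapping and, for example, `64 * 1024 * 1024` as the chunk size and `MS_SYNC` or `MS_ASYNC` as the flags; the right chunk size is a tuning choice, not something the thread specifies.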
Since the mapped regions are typically large enough to cover whole fs
blocks, the code in ffs_vnops.c:ffs_write() ends up in cluster_write().
Usually, a fully populated cluster is written asynchronously.

> > > > Anyway, note that you cannot 'work with large files in memory', even
> > > > if you have enough RAM and no pressure to hold all the file pages
> > > > resident. The syncer will do a writeback periodically, regardless of
> > > > whether the application calls msync(2), with an interval of
> > > > approximately 30 seconds.
> > >
> > > You can mmap with MAP_NOSYNC to prevent the syncer from writing the
> > > file out every 30 seconds.
> >
> > This also prevents msync(2) from syncing the region. The flag is fine
> > for throw-away data, but not for the scenario that was described, I
> > think.
>
> Oof. I could see that in certain situations you might want to control
> this behavior from an application (similar to how I now make use of
> fadvise() at work). Having a way to disable the syncer but having
> msync(MS_ASYNC) do something useful would be good.

I was wrong there, sorry. Only the syncer and fsync(2) ignore VPO_NOSYNC
pages; msync(2) still writes them.
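Combining John's MAP_NOSYNC suggestion with the correction above, one possible pattern is to map the file with MAP_NOSYNC so that the syncer leaves the dirty pages alone, and to call msync(2) explicitly when the data should reach the disk. This is a sketch assuming the FreeBSD semantics described in this message; `map_nosync` and `flush_mapping` are hypothetical names, and on systems that do not define MAP_NOSYNC the code falls back to a plain MAP_SHARED mapping. Since fsync(2) skips these pages, calling it on the descriptor is not a substitute for the explicit msync().

```c
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of `fd` shared and read/write.
 * Where MAP_NOSYNC exists (FreeBSD), the syncer should not periodically
 * flush the dirty pages; elsewhere this is an ordinary MAP_SHARED
 * mapping.  Returns MAP_FAILED on error, like mmap(2). */
void *map_nosync(int fd, size_t len)
{
    int flags = MAP_SHARED;
#ifdef MAP_NOSYNC
    flags |= MAP_NOSYNC;
#endif
    return mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
}

/* Hypothetical helper: explicitly write the mapping back.  Per the
 * correction in this message, msync(2) still writes MAP_NOSYNC pages,
 * while the syncer and fsync(2) skip them. */
int flush_mapping(void *addr, size_t len)
{
    return msync(addr, len, MS_SYNC);
}
```

The application then decides when writeback happens (for example, after finishing a batch of updates) instead of the roughly 30-second syncer interval deciding for it.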