On Wed, Apr 03, 2024 at 05:40:47PM +0000, Ice Cream wrote:
> I'm trying to speed up sort(1) by using mmap(2) instead of temp
> files.
>
> ftmp() (see code below) is called in the sort functions to create and
> return a temp file. mkstemp() is used to create the temp file, then
> the file pointer (returned by fdopen) is returned to the sort functions
> for use. I'm trying to understand where and how mmap should come
> into the picture here, and how to implement this feature.
I expect the intent was to mmap the temporary files to avoid the
overhead incurred by stdio. (As opposed to just allocating memory,
which as Mouse points out carries problems.)

I'm not sure this is actually a good idea. Using raw file handles
instead of stdio FILEs might provide some speedup (depending on the
write patterns and how big the blocks written are) but it's never been
entirely clear that mmap is actually substantively faster than using
raw file handles. Meanwhile, there are several disadvantages:

   - as Mouse pointed out, you need to know the size in advance;
   - read or write errors on memory mapped files result in SIGSEGV,
     which is annoying to deal with and does actually turn up in the
     field sometimes (*);
   - even if you apply MADV_SEQUENTIAL with madvise(2) the mmap
     interface can't really do as good a job of prefetching;
   - on 32-bit platforms the size is limited.

FWIW.

> PS: It was mentioned in the TODO file
>
>    speed up sort(1) by using mmap(2) rather than temp files

I can't find this reference :-(

-- 
David A. Holland
dholl...@netbsd.org