> Date: Thu, 25 Apr 2013 02:16:33 +0200
> Cc: e...@gnu.org, bug-make@gnu.org
> From: Frank Heckenbach <f.heckenb...@fh-soft.de>
>
> > > On Windows, you said fstat was very expensive, didn't you? Or is
> > > lseek even worse?
> >
> > I think anything that potentially moves the file pointer can be
> > sometimes expensive and is best avoided. (On Windows, I'd use
> > GetFileInformationByHandle.)
>
> OK, if that's so, do that. But I don't think that's true on POSIX.

I don't think it's worth doing on Windows either; see below.

> > > Nothing is actually read by lseek (and even if it were, it would
> > > only need to look at the first and last part of the file, not read
> > > all the content, if that was the worry).
> >
> > Are you sure? How can lseek "jump" to the last byte of the file, if
> > the file is not contiguous on disk, except by reading some of it?
>
> lseek doesn't need to read any data. It just sets the current offset
> of the FD to the given position, so the next read (which in this
> case never happens before seeking to the beginning) knows where to
> read. Even in the case of SEEK_END, all it has to do is add the
> given offset (here: 0) to the current file size.

What I meant is that lseek doesn't just return the byte position; it
also makes sure the next read or write happens at that position. So at
some point, some piece of software needs to tell the disk to move its
reading head to the right point. Whether this happens as part of lseek
or of the subsequent read/write, and whether it requires reading some
of the data on the disk, depends on how this is implemented and on
which data structures the filesystem keeps in memory at all times.

> Instead of testing, I just looked at the implementation (Linux
> 3.2.2). The following is really the whole relevant code. As you see,
> nothing's read from the disk, it only handles in-memory data. (Also
> the file size is in memory for open files; even it were not, it
> would be a constant-time access to the inode and wouldn't need to
> touch any data blocks.)

I timed lseek on Windows on very large files (hundreds of MBs), and
found that a single lseek takes less than 1 usec, at least with NTFS
volumes and on my Core i7 box. So, while more efficient ways of
finding out whether the file is empty are possible, I don't think such
a small penalty justifies yet another set of ifdef's.

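For concreteness, here is a minimal sketch of the kind of check and
measurement discussed above, assuming a POSIX system. The names
fd_is_empty and time_lseek_ns are made up for illustration; this is
not the code in make, and a Windows build would need its own clock
instead of clock_gettime.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if the file behind FD is empty, 0 if
   it is not, and -1 if FD cannot be seeked (e.g. it is a pipe).
   The offset is moved back to the beginning afterwards, so a later
   read starts at byte 0.  */
static int
fd_is_empty (int fd)
{
  off_t end = lseek (fd, 0, SEEK_END);   /* file size + 0 */

  if (end == (off_t) -1)
    return -1;
  if (lseek (fd, 0, SEEK_SET) == (off_t) -1)
    return -1;
  return end == 0;
}

/* Hypothetical helper: average wall-clock cost, in nanoseconds, of
   one lseek (fd, 0, SEEK_END) call over ITERATIONS calls.  */
static double
time_lseek_ns (int fd, long iterations)
{
  struct timespec t0, t1;
  long i;

  assert (iterations > 0);
  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (i = 0; i < iterations; i++)
    lseek (fd, 0, SEEK_END);
  clock_gettime (CLOCK_MONOTONIC, &t1);

  return ((t1.tv_sec - t0.tv_sec) * 1e9
          + (t1.tv_nsec - t0.tv_nsec)) / iterations;
}
```

Calling fd_is_empty on a freshly created temporary file should give
1, and 0 once something has been written to it. The number returned
by time_lseek_ns will of course vary with the OS, the filesystem and
the machine.
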
The only situation where lseek could be really expensive is if the
volume is compressed, because lseek on Windows returns the
uncompressed offsets in that case. I don't have access to any machine
which has such volumes, so I cannot test that. (Does Unix support
such filesystems? If so, what does lseek do there?)