In article <aanlktinpuyzl5laqbv-b3bux6ozyd6+umpxrptqh7...@mail.gmail.com>,
Tom Potts <karake...@gmail.com> wrote:
> Hi, all. I'm not sure if this is a bug report, a feature request or what,
> so I'm posting it here first to see what people make of it. I was copying
> over a large number of files using shutil, and I noticed that the final
> files were taking up a lot more space than the originals; a bit more
> investigation showed that files with a positive nominal filesize which
> originally took up 0 blocks were now taking up the full amount. It seems
> that Python does not write back file holes as it should; here is a simple
> program to illustrate:
>   data = '\0' * 1000000
>   file = open('filehole.test', 'wb')
>   file.write(data)
>   file.close()
> A quick `ls -sl filehole.test' will show that the created file actually
> takes up about 980k, rather than the 0 bytes expected.

I would expect the file to take up about 980k in that case. AFAIK, simply writing null bytes doesn't automatically create a sparse file on Unix-y systems. Generally, on file systems that support it, a file becomes sparse when you don't write to certain parts of it, i.e. by using lseek(2) to position forward past the end of the file before writing, thereby implying that the intermediate blocks should be treated as zero when reading. Only certain file systems on certain platforms support operations like that.

Python makes no claim to do that optimization in either its lower-level I/O routines or in the shutil module. The latter's copyfile just copies bytes from input to output. If you want to always preserve sparse files, you could use GNU cp with --sparse=always. If you look at its code, you'll see that it checks for all-zero blocks when copying and then uses lseek to skip over them when writing. Something like that could be added to shutil, with the necessary tests for which platforms support it.

If you are interested in adding that feature, you could write a patch and open a feature request on the Python bug tracker (http://bugs.python.org/). It's not likely to progress without a supplied patch, and even then maybe not.

--
Ned Deily, n...@acm.org
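
For illustration, the seek-past-the-gap approach described above might look roughly like this in Python 3. This is only a sketch under some assumptions: the names write_with_hole, sparse_copy and BLOCK_SIZE are made up for this example and are not part of shutil, and the logic is not taken from cp's source. Whether the skipped regions actually end up as holes depends on the platform and file system.

```python
import os

# Hypothetical chunk size for this sketch; not a value taken from cp or shutil.
BLOCK_SIZE = 64 * 1024


def write_with_hole(path, size):
    """Create a file of nominal length `size` (>= 1) by seeking past the gap.

    Only the final byte is written; on file systems that support sparse
    files the skipped region may not be allocated on disk.
    """
    with open(path, 'wb') as f:
        f.seek(size - 1)
        f.write(b'\0')


def sparse_copy(src, dst, block_size=BLOCK_SIZE):
    """Copy src to dst, seeking over all-zero chunks instead of writing them."""
    zero_block = bytes(block_size)
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            if chunk == zero_block[:len(chunk)]:
                # Move the write position forward without writing anything.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # Seeking alone does not extend the file, so set the final length
        # to the current position in case the source ended with zeros.
        fout.truncate()
```

On a Unix-y system you could then compare os.stat(path).st_blocks (or the output of ls -sl) for the source and the copy to see whether the holes were preserved. The reads of the copy should return the same bytes as the original either way.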