On 2010-09-29, Ned Deily <n...@acm.org> wrote:
> <aanlktinpuyzl5laqbv-b3bux6ozyd6+umpxrptqh7...@mail.gmail.com>,
>  Tom Potts <karake...@gmail.com> wrote:
>> Hi, all. I'm not sure if this is a bug report, a feature request or what,
>> so I'm posting it here first to see what people make of it. I was copying
>> over a large number of files using shutil, and I noticed that the final
>> files were taking up a lot more space than the originals; a bit more
>> investigation showed that files with a positive nominal filesize which
>> originally took up 0 blocks were now taking up the full amount. It seems
>> that Python does not write back file holes as it should; here is a simple
>> program to illustrate:
>>   data = '\0' * 1000000
>>   file = open('filehole.test', 'wb')
>>   file.write(data)
>>   file.close()
>> A quick `ls -sl filehole.test' will show that the created file actually
>> takes up about 980k, rather than the 0 bytes expected.
>
> I would expect the file size to be 980k in that case. AFAIK, simply
> writing null bytes doesn't automatically create a sparse file on Unix-y
> systems.

Correct. As Ned says, you create holes by seeking past the end of the
file before writing data, not by writing 0x00 bytes. Here's a
demonstration:

Writing 0x00 values:

    $ dd if=/dev/zero of=foo1 bs=1M count=10
    10+0 records in
    10+0 records out
    10485760 bytes (10 MB) copied, 0.0315967 s, 332 MB/s

    $ ls -l foo1
    -rw-r--r-- 1 grante users 10485760 Sep 29 15:32 foo1

    $ du -s foo1
    10256   foo1

Seeking, then writing a single byte:

    $ dd if=/dev/zero of=foo2 bs=1 count=1 seek=10485759
    1+0 records in
    1+0 records out
    1 byte (1 B) copied, 8.3075e-05 s, 12.0 kB/s

    $ ls -l foo2
    -rw-r--r-- 1 grante users 10485760 Sep 29 15:35 foo2

    $ du -s foo2
    16      foo2

Both files have the same nominal size (10485760 bytes), but du shows
foo1 occupying 10256 blocks and foo2 only 16.

-- 
Grant Edwards               grant.b.edwards        Yow! Please come home with
                                  at               me ... I have Tylenol!!
                              gmail.com
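
For anyone who wants to repeat the dd demonstration from Python, here is a
minimal sketch rather than a definitive recipe. It assumes Python 3 and a
Unix-like system. The helper names (write_zeros, write_with_hole, usage,
demo) are made up for illustration. st_blocks is not available on every
platform, and whether the skipped region is really stored as a hole depends
on the filesystem, so the allocated figure may differ from what du showed
above.

```python
import os
import tempfile

SIZE = 10 * 1024 * 1024  # 10485760 bytes, same as the dd examples


def write_zeros(path, size=SIZE):
    """Write `size` literal 0x00 bytes; these are ordinary data."""
    with open(path, 'wb') as f:
        f.write(b'\0' * size)


def write_with_hole(path, size=SIZE):
    """Seek to the last byte and write only that byte.

    Everything before it was never written, so a filesystem that
    supports sparse files may leave it unallocated.
    """
    with open(path, 'wb') as f:
        f.seek(size - 1)
        f.write(b'\0')


def usage(path):
    """Return (nominal size in bytes, allocated bytes or None)."""
    st = os.stat(path)
    # st_blocks counts 512-byte units where it exists (assumption:
    # Unix-like platform); report None elsewhere.
    blocks = getattr(st, 'st_blocks', None)
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


def demo():
    """Create both files in a temporary directory and report their usage."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        for name, writer in (('zeros', write_zeros),
                             ('hole', write_with_hole)):
            path = os.path.join(d, name)
            writer(path)
            results[name] = usage(path)
    return results


if __name__ == '__main__':
    for name, (size, allocated) in demo().items():
        print(name, 'size:', size, 'allocated:', allocated)
```

Both files should report the same nominal size. On a filesystem with sparse
file support, the allocated figure for the second one would normally be much
smaller, which is the same effect as the foo1/foo2 comparison.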