On Tue, Jan 17, 2006 at 02:36:44PM -0500, Daniel Ouellet wrote:
> [...] But having a file that is, let's say, 1MB of valid data but
> quickly grows to 4 or 6GB, takes time to rsync between servers, and in
> one instance filled the file system and caused other problems (:> I
> wouldn't call that a feature.
As Otto noted, you have to distinguish between file size (what stat(2)
and friends report, and also the number of bytes you can read
sequentially from the file) and a file's disk usage. For more
background, see the RATIONALE section at
http://www.opengroup.org/onlinepubs/009695399/utilities/du.html
(you may have to register, but it doesn't hurt). See also the reference
to lseek(2) mentioned there.

> But at the same time, I wasn't using the -S switch in rsync,
> so my own stupidity there. However, why spend lots of time processing
> empty files I still don't understand that however.

Please note that -S in rsync does not *guarantee* that source and
destination files are *identical* in terms of holes or disk usage.
For example:

    $ dd if=/dev/zero of=foo bs=1m count=42
    $ rsync -S foo host:
    $ du foo
    $ ssh host du foo

Got it? The local foo is *not* sparse (no holes), but the remote one
has been "optimized" by rsync's -S switch.

We recently had a very controversial (and flaming) discussion at our
local UG about such optimizations (or "heuristics", as in GNU cp). IMO,
if they have to be explicitly enabled (like `-S' for rsync), that's
o.k. The other direction (the copy is *not* sparse by default) is
exactly what I would expect. Telling whether a sequence of zeroes is a
hole or just a (real) block of zeroes isn't possible in userland -- it's
a filesystem implementation detail. To copy the *exact* contents of an
existing filesystem, including all holes, to another disk (or system),
you *have* to use filesystem-specific tools such as dump(8) and
restore(8). Period.

> I did some research on Google for sparse files and tried to get more
> information about them. In some cases I would assume -- like
> database-type workloads where you have a fixed-size file that you
> write to at various places -- they would be good and useful, but a
> sparse file that keeps growing uncontrolled over time -- I may be
> wrong, but I don't call that a useful feature.
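To make the size-vs-disk-usage distinction concrete, here's a small
sketch (filename and sizes are my own examples, not from the thread):
dd's seek= jumps ahead without writing, which leaves a hole behind, so
the apparent size and the allocated blocks diverge.

```shell
# Create a ~10MB file containing a single real byte at the very end;
# everything before that byte is a hole (on filesystems with holes).
dd if=/dev/zero of=sparse.dat bs=1 count=1 seek=10485759 2>/dev/null

ls -l sparse.dat    # size: 10485760 bytes -- what stat(2) reports
du -k sparse.dat    # disk usage: only a handful of KB actually allocated
```

This is the same asymmetry the du RATIONALE describes: ls -l (and rsync
without -S) sees 10MB of readable bytes, while the filesystem has only
allocated the blocks that were really written.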
Sparse files for databases under heavy load (many insertions and
updates) are the death of performance -- you'll end up with files whose
blocks are spread all over your filesystem. OTOH, *sparse* databases
such as quota files (potentially large, but growing very slowly) are
good candidates for sparse files.

Ciao,
	Kili
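P.S.: For the archives, here's roughly what the quota-file pattern
looks like in shell -- record size and uids are made up for
illustration. Records live at fixed offsets indexed by uid, so a few
scattered high uids leave most of the file as holes:

```shell
# Hypothetical fixed-size records at offset uid * recsz; the gaps
# between written records are never touched and stay as holes.
recsz=64
for uid in 0 1000 60000; do
    printf 'record-for-uid-%d' "$uid" |
        dd of=quota.dat bs=1 seek=$((uid * recsz)) conv=notrunc 2>/dev/null
done

ls -l quota.dat    # apparent size: just past uid 60000's slot
du -k quota.dat    # only the blocks for the three written records
```

Since the file grows only when a new (higher) uid shows up, this is
the "growing very slowly" case where sparseness helps rather than
hurts.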