OK, sounds like completely different scenarios. We haven't been using rsync locally to do copies or things like that.
-----Original Message-----
From: Linda Walsh [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 18, 2006 2:42 PM
To: Rob Bosch
Cc: 'Matt McCutchen'; 'rsync'
Subject: Re: File Fragmentation issue...especially relating to NTFS...

Hey Rob, some clarifications/comments... :-)

I was talking about the undesirable fragmentation that results from using rsync as a type of smart copy: copying files from a source to a destination, with the target rsync invoked by the source-side process (i.e. rsync is not running as a daemon). For me, that's the most common case. Rsync and other cygwin-based utilities don't perform as well as the native Windows Copy or Xcopy commands.

I never use rsync in daemon mode, out of security paranoia; I use ssh when connecting to remote systems, i.e. rsync with "RSYNC_RSH=/usr/bin/ssh". It is also rare for me to have multiple rsyncs copying to the same target system, or multiple rsync commands copying/updating directory trees locally on the same system. My test case, which produced bad fragmentation, used only one writer, with rsync copying from one source host to another. It came out of this same topic (file fragmentation) arising on the cygwin list, where there was a general problem with the *nix utilities producing poor fragmentation behavior on NTFS.

I ran tests on the two Linux filesystems that had utilities to list the number of fragments per file (only ext2/3 and xfs had such a tool), and I used Sysinternals' Contig program to show fragmentation on NTFS. Writing the entire file in 1 large write (using a 64M test file) produced the fewest fragments on NTFS. The write size didn't make as much of a difference on a heavily fragmented ext2/ext3 disk, where over 500 fragments were produced for the 64M file in my test cases. The write size also didn't make much of a difference on XFS: it reliably put the 64M file in 1 contiguous fragment (extent) at every block size I tested. These were all cases using 1 writer per disk.
For NTFS, what seemed to be true was that fragmentation could be positively affected by writing output files in 1 write call, i.e. buffering the entire file in memory before writing. For example, if I use rsync to copy 1 file to a target host whose filesystem is NTFS, rsync writes in "smallish" chunks (though better than the GNU file utilities do). As a result, NTFS looks for a contiguous space for each of those smallish chunks (whatever rsync's file-write size is). In this simplistic case, rsync can achieve performance on NTFS as optimal as the native Windows utilities like the shell Copy and xcopy commands by buffering the entire file in memory and doing the final write in 1 write call (this presumes your file fits in memory, which is another "issue"). In the single-writer case, there doesn't seem to be any performance penalty that I could easily measure.

As an aside, IE downloads a file to a temp directory first. Given how slow downloads are compared to other operations, the designers may have presumed that users would engage in other activities while a large file download was in progress; that would approximate the multiple-writers case.

As for multiple writers: one might have to use an falloc-type call, which may be more effective for a heavily loaded "server" process writing multiple files at the same time, since it might be too memory-intensive to buffer all incoming files completely in memory before writing them out. I do wonder, though: since an falloc-type call (or the Windows equivalent) is presumably more efficient than downloading to a temp directory and then copying to the final destination, why wouldn't the MS IE designers have used it? MS has a history of using internal or specialized calls to make their products perform better than competitors'. I'd think an falloc call would be a perfect candidate for speeding up file-download performance in their browser (in the sense that it is a type of file-transfer "agent").
Maybe they were just being lazy?

Rob Bosch wrote:
> The fragmentation we see on NTFS is due to so many streams writing to the
> same disk when multiple rsync clients are sending data to the rsync daemon.
> Windows will not reserve space for any of the new files unless the
> posix_fallocate function is used. The writes occur as the data comes
> in...resulting in extremely high fragmentation. The testing I've done with
> Matt's patch has shown the posix_fallocate function nearly eliminates the
> fragmentation issue with little to no performance penalty even with very
> large files (50GB+).
>
> I'd be happy to hear the results of additional strategies though so please
> post what you find out!
---
I would be too! :-) Seriously, I think I need to get some more HW here; my previously used "test disk" has gotten a bit too full to be really useful.

I'm not sure I'm quite as interested in the multi-writer case from a practical standpoint: I don't see many cases where a single user running rsync is going to see this behavior. Additionally, even on a server, which might be the target of multiple writers, do you have any information on how often one actually sees multiple writers? Statistically, I wouldn't think it would be often. _If_ you agree that it isn't the common or likely case, I'd also be concerned about how much (if at all) it affects throughput.

Rsync is, unfortunately, slower than either cp or tar when the target tree is not present. I don't have figures at hand, but I seem to remember it being 20-30% slower than either in local file transfers, and notably slower than "tar <input> | ssh remote-sys tar <output>" (with rsync using RSH=ssh). While I would guess that rsync is used more often to update than to create, I think it would be "great" if its performance in "create" cases were at worst only 5% slower than cp or tar (preferably 0% slower, or even _better_, assuming that's possible; it may not be). Rsync's speed in "create" cases vs.
cp and tar is more often an issue for me than file fragmentation, as I have defragmentation run nightly. I.e., bottom line: does "fadvise" have a negative performance impact? In my usage, that would be less desirable than concerns about fragmentation (which are addressed by the nightly defrags).

Linda