I have two (related) enhancements to cp to propose.  Code is attached.

Enhancement 1: reading by blocks
--------------------------------
Reading and writing of files is typically fastest and most efficient when you read or write blocks of a certain size.  cp seems to understand this in that it uses the block size (according to stat()) of the source file as the buffer size for its copy.  But cp makes only a feeble attempt to read whole blocks.  If a read is short for any reason (the operating system is free to return less than the requested size any time it wants), all future reads are out of sync with the file's blocks.  Also, the size returned by stat() isn't necessarily the best size; modern filesystems are much more complicated than can be described by a single number like that.

My enhancement is: 1) add a --readsize option so the user can choose the buffer size (it still defaults to the stat() block size); 2) keep the reads synchronized to a multiple of the read size by filling in short reads, e.g. if a 4K read returns 3K, cp does a 1K read to resynchronize to a 4K boundary.  This is a minor change to the code -- it just involves replacing the file read call with a subroutine that loops until it gets it all.

Enhancement 2: handling of unreadable portions of source file
-------------------------------------------------------------

Today, if cp encounters an unreadable stretch of file, it just quits.  I have added two new alternatives, controlled by the new --errors option.  In both, cp searches ahead in the file until it finds a readable portion.  With --errors=zero, cp pretends it read zeroes in place of the unreadable portion.  With --errors=skip, cp pretends the unreadable bytes just didn't exist; the resulting file is shorter than the source.

Another new option, --errorgrain, tells how finely cp searches for the end of the bad area; it is the step size by which cp seeks forward, trying a read at each step.  The default is 512 bytes.  cp issues a warning at the end of each file copy telling how much data was lost.

This is mostly just embellishment of the new read subroutine I mentioned above.
There's also the option handling and the statistics reporting, but no structural changes to the existing copy logic.

Purpose of error handling
-------------------------

I need this because I have large cpio backup files that sometimes have media errors: a single sector is missing here and there from the file.  I want to copy the file to a good disk and then proceed to salvage all the data inside the backup file that is not affected by the missing sectors.  cpio can generally resynchronize quite well if the data after the error remains at the same offset, so I use cp --errors=zero.

I can do some of this with dd if I'm desperate, but dd is technically for a rather lower-level job -- directly driving a device driver -- not byte-stream files.  It doesn't, for example, deal with short reads in a byte-stream way.

I have other backup volumes that contain full images of the original filesystem; I restore those using cp --archive.  Again, if a single sector somewhere is bad, I'd rather have the cp complete on all the other files, and save whatever it can of the ruined file, than have to manually sort through the thousands of files in that directory and work around the bad ones.

--
Bryan Henderson                          Phone 408-621-2000
San Jose, California
cperror.patch
Description: Binary data
_______________________________________________
Bug-coreutils mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/bug-coreutils