[EMAIL PROTECTED] writes:
>
> vic:/# tar -xvzf /mnt/backup-compaq/home.tar.gz
> home/
> home/victor/
> home/victor/.R/
> home/victor/.R/help.db
> tar: Skipping to next header
> tar: Archive contains obsolescent base-64 headers
>
> vic:/# gunzip /mnt/backup-compaq/home.tar.gz
> gunzip: /mnt/backup-compaq/home.tar.gz: invalid compressed data--crc error
> gunzip: /mnt/backup-compaq/home.tar.gz: invalid compressed data--length error
> vic:/#

Vittorio:

Don't kill yourself quite yet. These are the usual symptoms of corruption in the middle of a gzip-compressed archive. If only a couple of blocks are corrupt, you have a pretty good chance of recovering most of your data.

"zcat" will happily plow through the corrupt compressed file and generate uncompressed output as best it can. Because of the nature of the compression algorithm, a single bad compressed input block will result in a long string of corrupt uncompressed output blocks. Eventually, the decompression state will probably resynchronize (this is not 100% guaranteed, but it'll probably happen within a few hundred kilobytes). Assuming this happens, "zcat" will start generating good output again.

The trouble is, the output won't generally be properly aligned, so "tar" (which started skipping 512-byte blocks at the first bad header) won't find another header aligned at the start of a 512-byte block, and will gobble up the whole file without finding anything else to untar.

So, how do you realign? Well, assuming most of the files in your tarfile are from the "home/victor/" directory (as in your listing), the header block for each of those files will start with the string "home/victor/". You can run the following one-liner:

    zcat /mnt/backup-compaq/home.tar.gz |
      perl -ne 'm,home/victor/, && do { ++$count[($l + length($`)) % 512] };
        $l += length($_);
        END { for (0..511) { printf "%3d %5d\n", $_, $count[$_] if $count[$_] } }' |
      sort -k2,2nr | head -20

(I've formatted it onto multiple lines, but it should be typed in as a single line.)

This simply counts the number of times the string "home/victor/" appears in the file at each possible offset within a 512-byte block. It outputs, at most, the 20 most frequent offsets. For a small, corrupt tarfile, the output might look like:

    165   291
      0    27
     18     2
    398     1

The left column lists the offsets; the right column gives the count for each offset. This indicates that the string appeared 291 times at offset 165, 27 times at offset 0, and a handful of times at offsets 18 and 398.

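For anyone who finds the Perl hard to read, the counting idea can be sketched in Python. This is an illustration only, not the one-liner itself: it holds the data in memory (impractical for a gigantic tarfile) and counts every occurrence of the marker rather than the first one on each line. The function name count_marker_offsets and the synthetic stream are made up for the example; the marker string is taken from the quoted tar listing.

```python
from collections import Counter

BLOCK = 512  # tar headers start on 512-byte block boundaries


def count_marker_offsets(data: bytes, marker: bytes) -> Counter:
    """Count how often `marker` starts at each offset within a 512-byte block."""
    counts = Counter()
    pos = data.find(marker)
    while pos != -1:
        counts[pos % BLOCK] += 1
        pos = data.find(marker, pos + 1)
    return counts


def fake_header(name: bytes, marker: bytes = b"home/victor/") -> bytes:
    """Hypothetical stand-in for a header block: marker + name, NUL-padded."""
    return (marker + name).ljust(BLOCK, b"\0")


if __name__ == "__main__":
    # One aligned "header", then 165 stray bytes, then three more "headers"
    # that are all shifted by the same amount.
    stream = (
        fake_header(b"a")
        + b"\xff" * 165
        + fake_header(b"b")
        + fake_header(b"c")
        + fake_header(b"d")
    )
    counts = count_marker_offsets(stream, b"home/victor/")
    # Same layout as the one-liner: offset, then count, most frequent first.
    for offset, n in counts.most_common(20):
        print(f"{offset:3d} {n:5d}")
```

On this synthetic stream the shifted headers all land at offset 165 and the single aligned one at offset 0, which is the same pattern the sample output shows on a larger scale.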
In the sample output, offsets 18 and 398 are false positives (occurrences of the search string that weren't part of a tar header). The 27 occurrences at offset 0 are the headers before the corruption. After the corruption, when the decompressed stream recovered, it was offset by 165 bytes, and those 291 other headers are recovered files at the wrong offset.

When you do this (since your tarfile is so gigantic), you'll probably have a good number of false positives. However, if there was corruption in only one place, there should be one non-zero offset that is overwhelmingly more frequent than the rest.

Anyway, to recover files at a particular offset, for example offset 165, use the following:

    zcat /mnt/backup-compaq/home.tar.gz | tail -c +166 | tar tvf -

Note that the number in the "tail" command should be one more than the offset output by the one-liner, because "tail -c +N" starts its output at byte N, counting from 1. As written, "tar tvf -" only lists what it finds; once the listing looks right, replace "tvf" with "xvf" to actually extract the files.

Good luck!

-- 
Kevin <[EMAIL PROTECTED]>
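The realignment step amounts to dropping the first N bytes of the decompressed stream and handing the rest to a tar reader. A minimal Python sketch of that idea follows, using the standard tarfile module on a tiny archive built in memory. The names members_after_realign and make_demo_tar, the demo file name, and the garbage bytes are all hypothetical. Passing ignore_zeros=True asks tarfile to skip empty or invalid blocks, which only approximates tar's "Skipping to next header" behaviour; it should not be assumed to cope with a really damaged archive the way GNU tar does.

```python
import io
import tarfile


def members_after_realign(data: bytes, offset: int) -> list:
    """Discard `offset` leading bytes, then list whatever tar members follow.

    Slicing at `offset` plays the role of `tail -c +(offset + 1)`.
    """
    realigned = io.BytesIO(data[offset:])
    # mode="r:" means "plain tar, no compression"; ignore_zeros=True makes
    # tarfile skip empty or invalid 512-byte blocks instead of stopping.
    with tarfile.open(fileobj=realigned, mode="r:", ignore_zeros=True) as tar:
        return [member.name for member in tar]


def make_demo_tar() -> bytes:
    """Build a tiny uncompressed tar archive in memory (illustrative data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        payload = b"hello\n"
        info = tarfile.TarInfo(name="home/victor/notes.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    # Two garbage blocks plus 165 stray bytes, followed by a well-formed archive.
    damaged = b"\xff" * (2 * 512 + 165) + make_demo_tar()
    print(members_after_realign(damaged, 165))
```

With the offset set to 165, the garbage that remains in front of the archive is a whole number of 512-byte blocks, so the reader can step over it and should find home/victor/notes.txt. With any other offset the headers would no longer sit on block boundaries.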