On 4/13/05, John W. Krahn <[EMAIL PROTECTED]> wrote:
> Jay Savage wrote:
> > On 4/12/05, John W. Krahn <[EMAIL PROTECTED]> wrote:
> >
> >>>while (!eof) {
> >>
> >>You need to specify the filehandle you are reading from.
> >
> > A bare eof operates on the last filehandle read from. perldoc -f eof.
> > The only place I'm going to get EOF from the OS here is IMG, and it
> > works as expected.
>
> OK sorry, the only time I've used eof is when processing files through <> to
> reset the $. variable.  Generally there is no need to use eof at all.
>
I don't do this much, either. And now I remember why.

> >>The second problem is that $eofsec contains multiple bytes and may (and
> >>probably does) overlap $sector boundaries.  The easy solution would be to
> >>read the entire file into a variable.
> >
> > I thought about this, but everything I've read assumes that the files
> > begin on sector boundaries.
>
> That is usually the case, but are you sure that the "sectors" are 512 bytes in
> length?

I'm comfortable with this. All the FAT16 docs say "almost always".
Clusters can be of varying sizes, but they're multiples of 512-byte
sectors. And empirically, it works on my test data.

> > The code I borrowed from only examined the
> > first 4 bytes of each read, and the regex works anchored at the
> > beginning of the string.
>
> Yes the $magic regex does but I was talking about the $eofsec regex.

I was thinking the start-at-sector-boundary (or cluster boundary, as I
later discovered) behavior was part of the low probability: if
everything starts on a sector boundary, files with split EOF/EOI
markers must be a multiple of 512 +/- (length(EOF)-1) bytes long. That
really narrows the choices.

> > If I were only missing a few files, I'd say
> > this was it. But I'm not picking up any. All the EOFs can't be split
> > across boundaries, can they? Or can they?
>
> I would say that statistically it is impossible, but I'm no expert.  :-)
>
> > It's worth checking into
> > again. I was hoping to avoid reading the entire image into memory,
> > though.
>
> Are you *certain* that the values of $magic and $eofsec are correct?  I ask
> because the JPEG FAQ says this:

Yes. The FF D8 FF E1 string is widely used, and I've verified it for my
data with hexdump. Apparently someone's decided to use a non-standard
APP0 for digital camera files. It would make sense that they're "raw",
but I haven't really gone digging through the headers to verify that.
It may be simple non-conformance to the standard.
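For what it's worth, the boundary-overlap concern can be handled without reading the whole image into memory. The following is only a sketch under assumptions: `find_marker_offsets` is a hypothetical helper (it is not from the script under discussion), the in-memory string stands in for the real image, and the two-byte marker and 512-byte block size are just the values mentioned in this thread. It carries the last length(marker)-1 bytes of each block over to the next read, so a marker split across a sector boundary is still seen. The loop is driven by the return value of read, so no eof test is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: return the byte offsets at which $marker occurs in
# the stream behind $fh, reading $block bytes at a time.
sub find_marker_offsets {
    my ($fh, $marker, $block) = @_;
    my $keep = length($marker) - 1;   # bytes carried over between reads
    my $tail = '';                    # last $keep bytes of the previous buffer
    my $base = 0;                     # stream offset of the first byte of $buf
    my @offsets;

    # read returns 0 at end of file (and undef on error), ending the loop.
    while (my $got = read($fh, my $chunk, $block)) {
        my $buf = $tail . $chunk;

        # Collect every occurrence of the marker in the combined buffer.
        my $pos = 0;
        while (($pos = index($buf, $marker, $pos)) >= 0) {
            push @offsets, $base + $pos;
            $pos++;
        }

        # Keep only the bytes that could begin a marker finished by the
        # next block. The tail is shorter than the marker, so a match is
        # never reported twice.
        my $cut = length($buf) - $keep;
        $cut = 0 if $cut < 0;
        $tail = $keep ? substr($buf, $cut) : '';
        $base += $cut;
    }
    return @offsets;
}

# Demo on made-up data: a two-byte marker straddling the first 512-byte
# boundary of an in-memory "image".
my $image = ("\0" x 511) . "\xFF\xD9" . ("\0" x 511);
open my $img, '<', \$image or die "Cannot open in-memory image: $!";
binmode $img;
print join(',', find_marker_offsets($img, "\xFF\xD9", 512)), "\n";
close $img;
```

The same helper should work for a longer $eofsec-style string, since the carried-over tail scales with the marker length.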
> > [11] How do I recognize which file format I have, and what do I do about
> > it?
> >
> > If you have an alleged JPEG file that your software won't read, it's likely
> > to be HSI format or some other proprietary JPEG-based format.  You can tell
> > what you have by inspecting the first few bytes of the file:
> >
> > 1.  A JFIF-standard file will start with the four bytes (hex) FF D8 FF E0,
> >     followed by two variable bytes (often hex 00 10), followed by 'JFIF'.
> >
> > 2.  If you see FF D8 at the start, but not the 'JFIF' marker, you may have a
> >     "raw JPEG" file.  This is probably decodable as-is by JFIF software ---
> >     it's worth a try, anyway.
> >
> > 3.  HSI files start with 'hsi1'.  You're out of luck unless you have HSI
> >     software.  Portions of the file may look like plain JPEG data, but they
> >     won't decompress properly with non-HSI programs.
> >
> > 4.  A Macintosh PICT file, if JPEG-compressed, will have several hundred
> >     bytes of header (often 726 bytes, but not always) followed by JPEG data.
> >     Look for the 3-byte sequence (hex) FF D8 FF --- the text 'Photo - JPEG'
> >     will usually appear shortly before this header, and 'JFIF' or
> >     'AppleMark' will usually appear shortly after it.  Strip off everything
> >     before the FF D8 FF and you should be able to decode the file.
> >
> > 5.  Anything else: it's a proprietary format, or not JPEG at all.  If you
> >     are lucky, the file may consist of a header and a raw JPEG data stream.
> >     If you can identify the start of the JPEG data stream (look for FF D8),
> >     try stripping off everything before that.
>
> Also the standard EOI (end of image) marker is "\xFF\xD9".

I haven't had any luck finding this in a useful place, either. It seems
to appear about halfway through, when it appears at all. Since these
are munged or deleted files to begin with, it's not really a surprise.
That's why I was hoping for a file(-system) EOF instead of a stream
EOI. I probably should have made that clearer.
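The first three header checks from the FAQ excerpt above could be written roughly like this. It is a minimal sketch: `classify_header` and its return labels are made up for illustration, and the PICT case (item 4), which requires searching past a variable-length header, is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical classifier based on the leading bytes of a file, following
# items 1-3 of the FAQ excerpt. Anything else is reported as 'unknown'.
sub classify_header {
    my ($head) = @_;

    # Item 1: FF D8 FF E0, two variable bytes, then 'JFIF'.
    return 'jfif' if $head =~ /\A\xFF\xD8\xFF\xE0..JFIF/s;

    # Item 3: HSI files start with the text 'hsi1'.
    return 'hsi' if $head =~ /\Ahsi1/;

    # Item 2: FF D8 at the start but no JFIF marker, possibly "raw JPEG".
    return 'raw-jpeg' if $head =~ /\A\xFF\xD8/;

    return 'unknown';
}

# Demo on made-up headers, including the FF D8 FF E1 prefix from this thread.
for my $head ("\xFF\xD8\xFF\xE0\x00\x10JFIF\x00",
              "\xFF\xD8\xFF\xE1" . ("\x00" x 6),
              "hsi1" . ("\x00" x 6)) {
    print classify_header($head), "\n";
}
```

Under these checks a file starting with FF D8 FF E1 falls into the "raw JPEG" bucket, which matches the guess made earlier in this message.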
I'm thinking now that I should probably just take what I can get from
the recovery I have, and postprocess the recovered files, instead of
trying to get the recovery itself to be clean.

Thanks,

jay