"Peter S. May" <[EMAIL PROTECTED]> wrote: <SNIP>
> I don't know how the internals of "file" work. If I were trying to get
> a generic file-like program to grok OpenPGP, here's probably how I'd go
> about it:
>
> * If the first non-blank line started "--- BEGIN PGP ", it would
> probably be reasonable to call it armored OpenPGP and perhaps look into
> it further, to figure out a subtype.
> * If the file program decides the file isn't any other type it
> recognizes, take a look at the first byte of the file, which must be a
> valid OpenPGP packet tag. You could run some or all of these tests
> before passing the file on to GPGME, which would ultimately determine a
> file's reasonable OpenPGP compatibility. Some assumptions based on bis-18:
>
> (in pseudocode, of course)
>
> function is_pgp_packet_tag (byte)
>     if byte & 0xC0 == 0xC0 // new format tag
>         tag_number = byte & 0x3f
>     else if byte & 0xC0 == 0x80 // old format tag
>         tag_number = (byte & 0x3c) >> 2
>     else
>         return false // first bit is always set
>
>     if tag_number == 0
>         return false // 0 is reserved
>
>     // the rest of the assumptions may change with future
>     // versions of the spec and need to be kept up to date
>     if tag_number == 15 or tag_number == 16
>         return false // 15 and 16 are not currently defined
>     if tag_number >= 20
>         return false
>         // Values 20 to 59 are not currently defined
>         // Values 60 to 63 are defined as private and GPG can't grok them
>
> After those checks, I would either pass the file on to GPGME or run one
> more heuristic first: Read a packet header. If it's valid, extract the
> length it specifies and jump forward that many bytes. Then repeat. If
> any of the tags are !is_pgp_packet_tag(), or if the last length
> specifier you find leads you past the end of the file, it's not OpenPGP.
> Else, it has a significant chance of being formally correct.
>
> Might be too complicated a check for file, but I think it would work.
>
> PSM

I was originally only going to respond to Peter May out of group.
The more I think about it, that would be the wrong thing to do. If what
he has is what everybody can live with (I didn't see any objections),
not only for now but into the foreseeable future, we are okay. If you
can't live with it, speak up now and tell us WHERE we are going wrong!
This discussion, if continued, will be going out of group.

First, the file command does read into a --armor encrypted file, and
from what is on the very first line it KNOWS what it is:

  $ file TOOMUCH.asc
  TOOMUCH.asc: PGP armored data message

It is when you do NOT use --armor (-a) that file doesn't know what to do
with it.

The file command uses the magic database. On my system and most Linux
systems it is here, but it will be in different places on different
systems:

  $ ls -1 /usr/share/file
  magic            # human readable for "file" command
  magic.mgc        # binary USED by "file" command
  magic.mime       # human readable for KMimeMagic
  magic.mime.mgc   # binary USED by KMimeMagic

You don't edit these files directly; they are created from source. You
will NOT see the magic.mime* files if you don't have KDE.

To know a little about magic, just do:

  man magic   # this will tell where the magic files are
  man file

You can see that the byte order can be easily handled as LONG as it
doesn't start to conflict with something else.

The file command can't use GPGME (what do you do if it isn't there?).
file needs to be self-contained except for its database.

If you look for ELF in the "magic" file, the very first thing you see
is:

  # ORCA/EZ assembler:
  #
  # This will not identify ORCA/M source files, since those have
  # some sort of date code instead of the two zero bytes at 6 and 7
  # XXX Conflicts with ELF

file will NEVER identify that kind of a file because of a conflict with
ELF. Usually, if there are conflicts, the people submitting the
information will drop it if they have far fewer files. It isn't who is
first that trumps the others. When there are collisions, the file type
most likely to be seen is the one that wins out.
Most people don't even know what an ORCA/EZ assembler file is.

I picked ELF for a reason. If you look at how ELF does it, you can see
how they handle SOME of the conditionals needed for the various
big-endian / little-endian and chip bit sizes to arrive at the proper
string. That would give you some idea of how to pick the proper strings
for the encryption types.

The only problem is, ELF ALWAYS starts with the first four bytes
"\177ELF". We don't have that with a PGP encrypted file. We have
multiple ways of starting, etc. There is a slight possibility of
unrolling all that into MULTIPLE definitions, but not just ONE.

It still looks to me like what OpenPGP has done is incompatible with the
file program. If you want to look into it further, I suggest we go
off-group to do it, but ONLY if everybody is happy that your analysis is
correct and COMPLETE! It looks awfully convoluted to me though (not your
analysis - their multiple ways of creating an encrypted file).

The file command was never designed with OpenPGP's way of creating files
in mind. And if they add even more variants, it will become even harder
to put the information into the magic database that file uses.

So people had better make sure they use the correct filename extension
(.gpg or .pgp) when they create an OpenPGP encrypted file. That will
probably be all we have to go on to identify what it is. We will need
the OpenPGP programs to do the rest of the identification.

HHH

PS If I didn't know better, I would say they designed the various file
header formats to be incompatible with the file command.

_______________________________________________
Gnupg-users mailing list
Gnupg-users@gnupg.org
http://lists.gnupg.org/mailman/listinfo/gnupg-users
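
For anyone who wants to experiment with the heuristic in the quoted
pseudocode, here is a rough Python sketch of it. It is not how file
works and not anybody's official check. The function names packet_tag
and looks_like_openpgp are made up for this example, the tag rules are
only the assumptions stated in the quoted pseudocode, and the length
parsing follows my reading of the RFC 4880 packet header layout. Packet
bodies are skipped, never validated.

```python
def packet_tag(byte):
    """Return the tag number in a first header byte, or None if implausible."""
    if byte & 0xC0 == 0xC0:        # new-format header
        tag = byte & 0x3F
    elif byte & 0xC0 == 0x80:      # old-format header
        tag = (byte & 0x3C) >> 2
    else:
        return None                # the top bit must always be set
    # Assumptions taken from the quoted pseudocode: 0 is reserved,
    # 15 and 16 are undefined, 20-59 are undefined, 60-63 are private.
    if tag == 0 or tag in (15, 16) or tag >= 20:
        return None
    return tag


def looks_like_openpgp(data):
    """Walk packet headers through data; True if every hop stays in bounds."""
    if not data:
        return False
    pos = 0
    while pos < len(data):
        first = data[pos]
        if packet_tag(first) is None:
            return False
        pos += 1
        if first & 0x40:           # new format: variable-size length field
            while True:
                if pos >= len(data):
                    return False
                o1 = data[pos]
                partial = False
                if o1 < 192:       # one-octet length
                    length, hdr = o1, 1
                elif o1 < 224:     # two-octet length
                    if pos + 2 > len(data):
                        return False
                    length, hdr = ((o1 - 192) << 8) + data[pos + 1] + 192, 2
                elif o1 == 255:    # four-octet big-endian length follows
                    if pos + 5 > len(data):
                        return False
                    length, hdr = int.from_bytes(data[pos + 1:pos + 5], "big"), 5
                else:              # partial body chunk, another length follows
                    length, hdr, partial = 1 << (o1 & 0x1F), 1, True
                pos += hdr + length
                if not partial:
                    break
        else:                      # old format: low two bits give length type
            ltype = first & 0x03
            if ltype == 3:
                return True        # indeterminate length: runs to end of file
            n = (1, 2, 4)[ltype]
            if pos + n > len(data):
                return False
            pos += n + int.from_bytes(data[pos:pos + n], "big")
        if pos > len(data):
            return False           # last length points past the end of the file
    return True
```

Under those assumptions, 73 of the 256 possible first bytes pass
packet_tag(), which shows why a single fixed-prefix magic entry like the
one for ELF cannot cover an unarmored OpenPGP file.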