Here's a patch to add PDB (Protein Data Bank) detection. The format starts with a ten-character magic sequence ("HEADER" followed by four spaces), and the patch checks that the rest of the first line at least looks like a PDB header before concluding that it is one. There are regex tests, but they only run if the initial string match succeeds, so I don't think the performance hit is particularly severe.
I don't have a good source of PDB files, so if this magic fails (either false positive *or* false negative), please attach one or more samples to this bug report and I'll try to adapt the patch.

Adam Buchbinder
--- file/magic/Magdir/scientific 2009-02-16 10:59:52.000000000 -0500
+++ file/magic/Magdir/scientific 2009-02-18 16:34:11.000000000 -0500
@@ -69,3 +69,32 @@
 0 string \060\000\040\000\110\000\105\000\101\000\104\000 GEDCOM data
 0 string \376\377\000\060\000\040\000\110\000\105\000\101\000\104 GEDCOM data
 0 string \377\376\060\000\040\000\110\000\105\000\101\000\104\000 GEDCOM data
+
+# PDB: Protein Data Bank files
+#
+# Adam Buchbinder <adam.buchbin...@gmail.com>
+#
+# http://www.wwpdb.org/documentation/format32/sect2.html
+# http://www.ch.ic.ac.uk/chemime/
+#
+# The PDB file format is fixed-field, 80 columns. From the spec:
+#
+# COLS        DATA
+#  1 -  6     "HEADER"
+# 11 - 50     String(40)
+# 51 - 59     Date
+# 63 - 66     IDcode
+#
+# Thus, positions 7-10, 60-62 and 67-80 are spaces. The Date must be in the
+# format DD-MMM-YY, e.g., 01-JAN-70, and the IDcode consists of numbers and
+# uppercase letters. However, examples have been seen without the date string,
+# e.g., the example on the chemime site.
+
+0 string HEADER\ \ \ \ 
+>&0 regex/1 \^.{40}
+>>&0 regex/1 [0-9]{2}-[A-Z]{3}-[0-9]{2}\ {3}
+>>>&0 regex/1s [A-Z0-9]{4}.{14}$
+>>>>&0 regex/1 [A-Z0-9]{4} Protein Data Bank data, ID Code %s
+!:mime chemical/x-pdb
+>>>>0 regex/1 [0-9]{2}-[A-Z]{3}-[0-9]{2} \b, %s
+
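
For anyone who wants to check a sample file by hand before attaching it, the column layout quoted in the patch comments can be sketched in Python roughly as follows. This is only an illustration of the fixed-field layout, not a reimplementation of how file(1) evaluates the magic rules: the function name `parse_pdb_header` and the sample line are made up for the example, and it is stricter than the magic in that it slices exact columns. Like the patch, it rejects headers that lack the date.

```python
import re

# Hypothetical helper, not part of the patch. It follows the column layout
# quoted in the patch comments (1-based columns there, 0-based slices here).
DATE_RE = re.compile(r"[0-9]{2}-[A-Z]{3}-[0-9]{2}")
IDCODE_RE = re.compile(r"[A-Z0-9]{4}")


def parse_pdb_header(line):
    """Return (id_code, date) if line looks like a PDB HEADER record, else None."""
    line = line.rstrip("\r\n")
    # Cheap literal test first: "HEADER" in columns 1-6, spaces in columns 7-10.
    # The regex checks below only run when this succeeds.
    if not line.startswith("HEADER    "):
        return None
    # Pad short lines so the slices below are always in range.
    line = line.ljust(80)
    date = line[50:59]     # columns 51-59
    gap = line[59:62]      # columns 60-62, expected to be spaces
    id_code = line[62:66]  # columns 63-66
    if not DATE_RE.fullmatch(date) or gap != "   ":
        return None
    if not IDCODE_RE.fullmatch(id_code):
        return None
    return id_code, date


# Made-up sample line built to the 80-column layout (not from a real PDB entry).
sample = "HEADER    " + "HYDROLASE".ljust(40) + "01-JAN-70" + "   " + "1ABC" + " " * 14
print(parse_pdb_header(sample))
```

A first line that this sketch accepts but the magic rejects (or the other way round) would be a useful sample to attach.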