Hi Yuri, If you don't like Python, like myself (and I'm not alone, it would seem), you could try Ruby (http://www.ruby-lang.org/en/). Some examples of PDB file manipulation are below (taken from [1]).
The language is a great improvement in Perl and Python in my opinion, but the downside is that there aren't as many crystallographic features as CCTBX et al, I don't think. Yours, Mark [1] http://bioruby.open-bio.org/wiki/SampleCodes Structure I/O and Manipulation How do I read a structure file? The current implementation only reads PDB format files, there will hopefully be code for parsing mmCIF and PDB XML files in the future. Adding these new formats will probably change the API for reading and writing structure files. Given the above caveat, the current way to read a PDB file is to slurp the file into a string which is fed to the constructor of the Bio::PDB class: require 'bio/db/pdb'string = IO.read('pdb1tim.ent') structure = Bio::PDB.new(string) The new Bio::PDB object now contains all the annotations and coordinate data. How do I write a structure file? As with reading, the current implementation can only write in PDB format. All the coordinate classes in the Bio::PDB heirarchy have .to_s methods which return the object in PDB format. This makes output as simple as: require 'bio/db/pdb' file = File.new('pdb1tim.ent').gets(nil) structure = Bio::PDB.new(file)puts structure.to_s # Writes whole structureputs structure[nil]['A'].to_s # Writes chain A only How do I access annotations? The annotations from the PDB file are stored in Bio::PDB::Record objects. You can retrieve a specific annotation by calling the '.record' method of a Bio::PDB object with the name of the annotation as the argument: structure.record("HEADER") In fact '.record' returns an array of all Bio::PDB::Records of the given type, so you'll need to call '.first' to get to the actual Bio::PDB::Record. Bio::PDB::Record provides methods to get to the individual data in each record. E.g. The 'HEADER' record provides classification, depDate and idCode methods. You can look at the PDB format to see what fields are available in each record, or just look in the pdb.rb file which has a hash of all the definitions. So, to get the title of a PDB file you would do it like this: structure.record('TITLE').first.title Some records have their own special methods: structure.remark(num) #Returns hash of the REMARK records based on 'num' structure.jrnl #Returns a hash of the JRNL records structure.jrnl('TITL') #Returns an array of all the TITL subfields from #the JRNL records structure.helix(id) #Returns the HELIX records based on 'id' #Returns an array if no 'id' is given structure.turn #Same interface as '.helix' structure.sheet #Same interface as '.sheet' structure.seqres(id) #Returns the sequence given in the SEQRES record #as a Bio::Sequence::AA object structure.keywords #returns a flattened lsit of keywords And some methods are included for Bio::DB compatability: structure.entry_id #Returns idCode structure.accession #Same as entry_id structure.definition #Title structure.version #REVDAT How do I access the coordinate data? Coordinate data is stroed in a heirarchy of classes with Bio::PDB at the top. A Bio::PDB object contains one or more Bio::PDB::Model objects which contain one or more Bio::PDB::Chain objects which contain one or more Bio::PDB::Residue objects which contain Bio::PDB::Atom objects. There are two ways to access the coordinate data in a Bio::PDB object: keyed access and iterators. Keyed access provides the simplest way to get to data if you know which residues or atoms you're interested in. For instance if you wanted to get hold of chain 'A' you can do it like this: chain = structure[nil]['A'] The nil refers to the model, which will be nil in crystal structures but may have a number in NMR structures. Every level in the heirarchy can be accessed this way. E.g. to get hold of the CA atom of residue 100 in chain 'A': structure[nil]['A']['100']['CA'] or if you still have your chain object: chain['100']['CA'] The residue identifier is not just the residue number. PDB residues can have insertion codes and even negative numbers. chain['100A']['CA'] could be valid. Iterators are also provided so you can easily go through all objects in the heirarchy. E.g: structure.each{ |model| model.each{ |chain| chain.each{ |residue| residue.each{ |atom| puts atom } } }} Goes through every atom in the structure and outputs it. All the classes in the heirarchy implement the standard Enumerable and Comparable mixins as well. Each class has a number of accessors to allow access to the data, most is in the Bio::PDB::Atom class: - Bio::PDB::Model has a 'model_serial' accessor only. - Bio::PDB::Chain has an 'id' accessor for getting and setting the chain id. - Bio::PDB::Residue has accessors for resName, resSeq and iCode. - Bio::PDB::Atom has accessors for serial, element, alt_loc, x, y, z, occ, and bfac To find the x coordinate of the CA atom of residue 100 is then: structure[nil]['A']['100']['CA'].x How do I make a deep copy of a structure object? You can't! There seem to be problems implementing a deep copy function in BioRuby. This is to be fixed in future versions. Note to developers: have we tried the time-old hack of Marshal.load(Marshal.dump(obj)) to do the trick? How do I measure distances and angles? Methods for measuring distance and dihedral angles are provided in the Utils module. To measure distance, use the Bio::PDB::Utils.distance method: res1 = structure[nil]['A']['100'] res2 = structure[nil]['A']['101'] distance = Bio::PDB::Utils.distance(res1['CA'].xyz, res2['CA'].xyz) Bio::PDB::Utils.distance requires two Vectors as its arguments. Bio::PDB::Coordinate objects, which are produced by the .xyz call to the Bio::PDB::Atom object, are Vectors. You can also provide a Vector directly: distance = Bio::PDB::Utils.distance(res1['CA'].xyz, [10,10,10]) And use the .centreOfGravity and .geometricCentre methods, both of which provide a Bio::PDB::Coordinate: distance = Bio::PDB::Utils.distance(res1.centreOfGravity, res1.geometricCentre) All objects in the Bio::PDB heirarchy implement the centreOfGravity and geometricCentre methods. Dihedral angles are calculated from four Vectors / Bio::PDB::Coordinates: phi = Bio::PDB::Utils.dihedral_angle(res1['C'].xyz, res2['N'].xyz, res2['CA'].xyz, res2['C'].xyz) psi = Bio::PDB::Utils.dihedral_angle(res2['N'].xyz, res2['CA'].xyz, res2['C'].xyz, res3['N'].xyz) How do I find specific atoms, residues, chains or models? If the iterators and keyed access aren't powerful enough for you, the finder method is also provided. All the objects in the Bio::PDB hierarchy implement finder. It requires the class of object you wish to find as the first argument, and then a block which is evaluated for each object. If the block returns true then that object is added to an array of 'found' things. For example, to find all the atoms within a box: atoms_in_box = structure.finder(Bio::PDB::Atom){ |atom| atom.x.between?(10,20) && atom.y.between?(20,30) && atom.z.between?(30,40)} Or, to find all the HIS residues, in chain 'A' can be written like this: his_residues = structure[nil]['A'].finder(Bio::PDB::Residue){ |res| res.resName == 'HIS'} or like this: his_residues = structure.finder(Bio::PDB::Residue){ |res| res.resName == 'HIS' && res.chain.id == 'A'} Since the results are returned in arrays, it is very simple to do set operations, such as finding the overlap or difference between two selections: sel1 = structure.finder(Bio::PDB::Atom){ |atom| atom.residue.resName == 'HIS'} sel2 = structure.finder(Bio::PDB::Atom){ |atom| atom.x.between?(10,20) && atom.y.between?(20,30) && atom.z.between?(30,40)} sel3 = sel1 & sel2 sel1 = sel1 - sel3 Selection 3 now contains all the HIS atoms within the box and selection 1 contains all the HIS atoms outside the box. How do I work with ligand data? Because some PDB files reuse residue numbers that they already used in the ATOM records in the HETATM records, we have to add an extra flag when we want to access HETATM data. The extra flag is simply adding 'LIGAND' to the residue id. E.g Given the PDB file ATOM 1 N GLY A 2 -5.308 22.647 33.657 1.00 59.93 ATOM 2 CA GLY A 2 -5.090 23.965 33.005 1.00 53.36 ATOM 3 C GLY A 2 -6.209 24.197 32.021 1.00 52.56 ATOM 4 O GLY A 2 -7.000 25.134 32.176 1.00 55.02 .... HETATM 2097 O8 PHT A 2 -18.122 40.603 18.146 1.00 16.14 HETATM 2098 O9 PHT A 2 -17.348 39.940 20.109 1.00 16.06 HETATM 2099 C10 PHT A 2 -15.622 41.753 16.662 1.00 13.34 HETATM 2100 O11 PHT A 2 -16.077 42.457 15.721 1.00 16.27 Asking for structure[nil]['A']['2'] is ambiguous, so if you want the ligand data you must use structure[nil]['A']['LIGAND2'] There should also be an .each_ligand method in the Bio::PDB::Chain class for iterating through the ligands, but this is not implemented in the current cvs version. How do I work with solvent? Solvent is defined in the current parser as any HETATM record with HOH as the residue type. This may not be ideal for some files, but deals with 95% of cases which is good enough for me at the moment! Solvent residues are stored in a separate Bio::PDB::Chain attached to each Bio::PDB::Model. Currently the only methods available are 'addSolvent' which adds a residue to the solvent chain and 'removeSolvent', which sets the whole solvent chain to nil. An each_solvent method for Bio::PDB::Model has been implemented in my copy but is not in cvs at the moment. How do I get the sequence of a chain? There are two types of sequence in a PDB file: the real sequence of whatever protein has been crystalised, and the sequence observable in the structure. The first type of sequence is available through the SEQRES record (if this is provided in the PDB file). You can access this through the top-level Bio::PDB object: structure.seqres['A'] provides the sequence given in the SEQRES record for chain A. The observed sequence is obtained from the Bio::PDB::Chain object. E.g. structure[nil]['A'].atom_seq Both methods return a Bio::Sequence::AA object. On 24 January 2012 04:46, Yuri Pompeu <yuri.pom...@ufl.edu> wrote: > Hello Everyone, > I want to play around with some coding/programming. Just simple > calculations from an input PDB file, B factors averages, occupancies, > molecular weight, so forth... > What should I use python,C++, visual basic? > thanks >