Re: [Open Babel] (no subject)
Dear Noel,

Thank you for your answer. Please see my comments below.

2017-05-22 16:00 GMT-03:00 Noel O'Boyle:

> In other words, you want to assign atom types based on the structure.

Yes, that's right.

> The source of the structure is immaterial except in so far as it
> introduces noise. For example, to read a PDB file you need to guess
> various things. To read a MOL file, you don't need to guess anything.

That noise is what we are trying to avoid by always calculating (guessing)
things with the same algorithm.

> Regarding your code, you should never throw away information and then
> try to guess it.

Well, that depends on your faith in the quality of the information put into
the input format. One can always set a flag to keep the input information if
it is considered accurate enough, but if you want consistency regardless of
the input file format, I don't see any other way but to strip all the
information from the input and recalculate it.

> Also, I note in passing that DeleteHydrogens()
> doesn't delete anything, it just suppresses any explicit hydrogens.
>
> I'm a bit unclear why you are using the internal Open Babel atom
> types. Personally, I would avoid this as the atom types may not be
> suitable. Instead, just implement your own atom type function to suit
> your needs. Any atom typing can be implemented as a function that
> takes an OBAtom* and returns the type, perhaps as an enum.

Are you referring to functions like "IsAmideNitrogen"? We used these
functions, and they worked just fine for our needs. The problem we faced was
with "IsAromatic", which we couldn't make input-format agnostic. Our guess is
that some information from the input format always remains when calling it,
even if UnsetAromaticPerceived and the like were called beforehand. This led
us to try the route of putting all the atom types into internal Open Babel
types and building upon them.

> - Noel
>
> On 22 May 2017 at 18:56, Marcos Villarreal wrote:
> > Hello,
> >
> > For an application we are developing, we would like to get an atom typing
> > independent of the input format.
> > For example, a mol2 with all hydrogen atoms and a pdb without hydrogens of
> > the same molecule (i.e. identical heavy atom coordinates) should get the
> > same atom types.
> > The attached program is our attempt in that direction, but unfortunately
> > without success. How could one get rid of all the input information and
> > let babel do all the new calculations of atom types?
> >
> > Thank you in advance.
> >
> > #include <iostream>
> > #include <string>
> > #include <openbabel/mol.h>
> > #include <openbabel/obconversion.h>
> > #include <openbabel/obiter.h>
> >
> > int main(int argc, char **argv)
> > {
> >     if (argc < 2) return 1;  // an input file name is required
> >
> >     OpenBabel::OBConversion conv;
> >     OpenBabel::OBMol mol;
> >     std::string filename;
> >     filename = argv[1];
> >
> >     // The input format is guessed from the file extension
> >     conv.ReadFile(&mol, filename);
> >
> >     // Suppress hydrogens, then perceive connectivity and bond orders
> >     mol.DeleteHydrogens();
> >     mol.ConnectTheDots();
> >     mol.PerceiveBondOrders();
> >
> >     // Print the internal Open Babel type of each atom
> >     int i = 0;
> >     FOR_ATOMS_OF_MOL(atom, mol) {
> >         i++;
> >         std::cout << i << ": " << atom->GetType() << std::endl;
> >     }
> > }
> >
> > --
> > Marcos Villarreal
> > Dpto de Química Teórica y Computacional
> > Facultad de Ciencias Químicas
> > Universidad Nacional de Córdoba
> > Argentina.

--
Marcos Villarreal
Dpto de Química Teórica y Computacional
Facultad de Ciencias Químicas
Universidad Nacional de Cordoba
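Noel's suggestion of a custom typing function that takes an atom and returns an enum could look roughly like the sketch below. To keep it compilable without Open Babel, it uses a hypothetical stand-in struct `AtomInfo` in place of `OBAtom*`; the `AtomType` enum, the field names, the labels and the typing rules are all illustrative assumptions, not Open Babel's own types. In real code the two fields would be filled from an `OBAtom`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the properties one would read from an OBAtom.
struct AtomInfo {
    int atomicNum;  // element number, e.g. 6 for carbon
    bool aromatic;  // outcome of whatever aromaticity perception was run
};

// Illustrative type set; a real application would define its own.
enum class AtomType { AromaticCarbon, AliphaticCarbon, Nitrogen, Oxygen, Other };

// Derive the type from structural properties only, never from type labels
// stored in the input file.
AtomType TypeOf(const AtomInfo& a) {
    assert(a.atomicNum > 0);  // atomic numbers start at 1 (hydrogen)
    switch (a.atomicNum) {
        case 6:  return a.aromatic ? AtomType::AromaticCarbon
                                   : AtomType::AliphaticCarbon;
        case 7:  return AtomType::Nitrogen;
        case 8:  return AtomType::Oxygen;
        default: return AtomType::Other;
    }
}

// Short printable label for each illustrative type.
std::string Name(AtomType t) {
    switch (t) {
        case AtomType::AromaticCarbon:  return "C.ar";
        case AtomType::AliphaticCarbon: return "C";
        case AtomType::Nitrogen:        return "N";
        case AtomType::Oxygen:          return "O";
        default:                        return "X";
    }
}
```

Because the function only sees structural properties, two inputs yield the same types exactly when the perceived structure (here, the aromatic flag) is the same, which is where the format dependence discussed in this thread enters.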
Re: [Open Babel] (no subject)
Thank you, Miro, for your answer. That is in the spirit of what we want to do,
but without writing an intermediate file. We think that all the conversions
can be done inside the code.

Marcos.

2017-05-22 16:06 GMT-03:00 Miro Moman:

> Quick and dirty workaround: Convert it to .xyz (removing the Hs if needed)
> then compute the atom types from that file and see what happens...

--
Marcos Villarreal
Dpto de Química Teórica y Computacional
Facultad de Ciencias Químicas
Universidad Nacional de Cordoba
Re: [Open Babel] (no subject)
Maybe if you can give an example of the problem with aromaticity, we
can help? The only information that is used by that function is the
structure, so it was probably wrong at that point.

On 23 May 2017 at 13:16, Marcos Villarreal wrote:
> The problem we faced was with "IsAromatic", which we couldn't make
> input-format agnostic. Our guess is that some information from the input
> format always remains when calling it, even if UnsetAromaticPerceived
> and the like were called beforehand.
Re: [Open Babel] (no subject)
When I convert the molecules as given with obabel, you're right - you
run into a bug that's been fixed on the development branch -
aromaticity is perceived differently depending on the presence/absence
of explicit hydrogens:

> obabel 3rlb_ligand.* -osmi
Cc1nc(N)c(Cn2csc(CCO)c2C)cn1    3rlb_ligand
Cc1nc(N)c(CN2CSC(=C2C)CCO)cn1   ./3rlb_ligand.pdb

If you delete the explicit Hs first, you can get the same aromaticity
perception for both:

> obabel 3rlb_ligand.* -d -O tmp.sdf
> obabel tmp.sdf -osmi
Cc1nc(N)c(CN2=CSC(=C2C)CCO)cn1  3rlb_ligand
Cc1nc(N)c(CN2CSC(=C2C)CCO)cn1   ./3rlb_ligand.pdb

If you paste these SMILES into MarvinSketch you can see the
difference. The MOL2 file contains an extra double bond to a nitrogen.
So what's going on?...

I'm guessing that the correct structure is in the MOL2 file, but it
was read incorrectly by Open Babel and so is missing the charge on the
4-valent nitrogen. MOL2 is a horrible format but we should do a better
job. I note in passing that MarvinSketch interprets it the same as
Open Babel but that's no excuse.

The PDB file of course does not contain any bond orders and so we
guess them. We do an okay job - this is an example where we miss the
bond. If you removed these bond orders from the MOL2 file you would
get the same wrong structure too.

- Noel

On 23 May 2017 at 15:24, Marcos Villarreal wrote:
> Here is one example from the PDBBind refine data set.
> Please find below the code, the output, and attached the mol2 and the pdb
> input files.
>
> Code:
>
> #include <iostream>
> #include <string>
> #include <openbabel/mol.h>
> #include <openbabel/obconversion.h>
> #include <openbabel/obiter.h>
>
> int main(int argc, char **argv)
> {
>     if (argc < 2) return 1;  // an input file name is required
>
>     OpenBabel::OBConversion conv;
>     OpenBabel::OBMol mol;
>     std::string filename;
>     filename = argv[1];
>
>     conv.ReadFile(&mol, filename);
>
>     // Suppress hydrogens, perceive bonds, then force aromaticity
>     // to be re-perceived on the next query
>     mol.DeleteHydrogens();
>     mol.ConnectTheDots();
>     mol.PerceiveBondOrders();
>     mol.UnsetAromaticPerceived();
>
>     // Print one 0/1 aromaticity flag per atom
>     FOR_ATOMS_OF_MOL(atom, mol) {
>         std::cout << atom->IsAromatic();
>     }
> }
>
> Output:
> 001000 (mol2)
> 11 (pdb)
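The per-atom flag strings printed by the program quoted above can be compared mechanically when checking consistency across many ligands. The following is only a sketch: `CountMismatches` is a hypothetical helper, and it assumes both files list the heavy atoms in the same order, which the thread does not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count the atoms whose 0/1 aromaticity flag differs between two inputs.
// Returns -1 when the strings differ in length, since the atom counts then
// disagree and no per-atom comparison is meaningful.
int CountMismatches(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return -1;
    int mismatches = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Each character is expected to be a '0' or '1' flag.
        assert((a[i] == '0' || a[i] == '1') && (b[i] == '0' || b[i] == '1'));
        if (a[i] != b[i]) ++mismatches;
    }
    return mismatches;
}
```

A result of 0 means the two formats agreed on every atom; any other value flags a ligand worth inspecting by hand, as Noel did with the SMILES output.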
Re: [Open Babel] (no subject)
Thank you, Noel, for looking into this.

So how do you suggest doing this inside the code, that is, without passing
through an intermediate file? I remind you that our goal is to get the same
atom types (say, aromatic ones) regardless of the input format.

For now we are interested in consistency before "accuracy", which is another
subject. As a related note, we have tested several atom typing programs
(Knodle, I-interpret, Unicon and also Open Babel), and the perceived number
of aromatic atoms typically differs by 10-20% when analyzing 3600 structures
in the PDBbind database.

2017-05-23 11:47 GMT-03:00 Noel O'Boyle:

> If you delete the explicit Hs first, you can get the same aromaticity
> perception for both:
> >obabel 3rlb_ligand.* -d -O tmp.sdf
> >obabel tmp.sdf -osmi
Re: [Open Babel] (no subject)
> For now we are interested in consistency before "accuracy", which is another
> subject. As a related note, we have tested several atom typing programs
> (Knodle, I-interpret, Unicon and also Open Babel) and the perception of the
> number of aromatic atoms typically differ in 10-20 % when analyzing a 3600
> structures in the PDBbind database.

This is hardly surprising. For one, if I take 10 organic chemists in a room
and ask them to identify aromatic rings, I’ll get at least 10-20% variation.

More specifically, there is not one uniform cheminformatics model for
aromaticity - because there is no well-defined chemical definition. That’s
omitting the hard cases, even given a specific aromatic model. I’d guess we
get 5-10 bug reports per year on specific cases for OB aromaticity detection.

But your question is how do you get uniform atom types, regardless of the
input file format. This is probably impossible. If you have data in format X
with correct bond and formal charge assignments (e.g., SDF) and data in XYZ
format with atoms and no bonds or formal charges, you have to assume that all
the bond perception is perfect. I don’t have a good metric for OB’s
implementation, but I’d guess somewhere in the ~90-95% range.

In short, please don’t throw away good data. Stick to file formats that
retain as much information as possible.

-Geoff
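To illustrate why a format with coordinates but no bonds forces guessing, here is a minimal sketch of distance-based connectivity perception: two atoms are taken as bonded when they are closer than the sum of their covalent radii plus a tolerance. The `Atom3D` struct, the `GuessBonds` name and the 0.45 angstrom default tolerance are assumptions made for this example, not Open Babel's implementation, and the radii must be supplied by the caller.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative atom record: position plus a caller-supplied covalent radius.
struct Atom3D {
    double x, y, z;    // Cartesian coordinates in angstroms
    double covRadius;  // covalent radius in angstroms
};

// Return index pairs (i, j) with i < j whose interatomic distance is below
// the sum of the two covalent radii plus the tolerance.
std::vector<std::pair<std::size_t, std::size_t>>
GuessBonds(const std::vector<Atom3D>& atoms, double tolerance = 0.45) {
    assert(tolerance >= 0.0);
    std::vector<std::pair<std::size_t, std::size_t>> bonds;
    for (std::size_t i = 0; i < atoms.size(); ++i) {
        for (std::size_t j = i + 1; j < atoms.size(); ++j) {
            const double dx = atoms[i].x - atoms[j].x;
            const double dy = atoms[i].y - atoms[j].y;
            const double dz = atoms[i].z - atoms[j].z;
            const double dist = std::sqrt(dx * dx + dy * dy + dz * dz);
            if (dist < atoms[i].covRadius + atoms[j].covRadius + tolerance) {
                bonds.emplace_back(i, j);
            }
        }
    }
    return bonds;
}
```

Note that this recovers connectivity only. Bond orders and formal charges still have to be inferred afterwards, and that second step is where a structure such as the charged nitrogen discussed earlier in the thread can be missed.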
Re: [Open Babel] (no subject)
Hello Geoff, thank you for your answer. Please see my comments, which are
inline with your comments below.

2017-05-23 12:38 GMT-03:00 Geoffrey Hutchison:

> > For now we are interested in consistency before "accuracy", which is
> > another subject. As a related note, we have tested several atom typing
> > programs (Knodle, I-interpret, Unicon and also Open Babel) and the
> > perception of the number of aromatic atoms typically differ in 10-20 %
> > when analyzing a 3600 structures in the PDBbind database.
>
> This is hardly surprising. For one, if I take 10 organic chemists in a room
> and ask them to identify aromatic rings, I’ll get at least 10-20% variation.
>
> More specifically, there is not one uniform cheminformatics model for
> aromaticity - because there is no well-defined chemical definition. That’s
> omitting the hard cases, even given a specific aromatic model. I’d guess we
> get 5-10 bug reports per year on specific cases for OB aromaticity
> detection.

That was exactly the point implied in this comment. Open Babel seems as good
as any other program at detecting aromaticity.

> But your question is how do you get uniform atom types, regardless of the
> input file format. This is probably impossible. If you have data in format
> X with correct bond and formal charge assignments (e.g., SDF) and data in
> XYZ format with atoms and no bonds or formal charges, you have to assume
> that all the bond perception is perfect. I don’t have a good metric for
> OB’s implementation, but I’d guess somewhere in the ~90-95% range.

Well, as long as coordinates and atomic numbers are provided in a file, it
should be possible to always come up with the same atom typing, regardless of
the format. Indeed, you will have to lose information for the sake of
consistency.

> In short, please don’t throw away good data. Stick to file formats that
> retain as much information as possible.

I agree with you in principle, but consider the following not uncommon
scenario. We are working on docking (AutoDock Vina), whose score depends on
atom typing. As you know, the ligands come in different formats, usually pdb,
mol2 or sdf. We would expect to obtain the same docking result regardless of
the input format.

-Marcos.

> -Geoff

--
Marcos Villarreal
Dpto de Química Teórica y Computacional
Facultad de Ciencias Químicas
Universidad Nacional de Cordoba
Re: [Open Babel] (no subject)
On 05/23/2017 12:24 PM, Marcos Villarreal wrote:

> I agree with you in principle, but consider the following not uncommon
> scenario. We are working on docking (autodock vina) whose score depends on
> atom typing. As you know the ligands come in different formats, usually
> pdb, mol2 or sdf. We would expect to obtain the same docking result
> regardless the input format.

Why? PDB files contain a 3D structure, complete with stereo config (because
that's how the crystal structure works). MOL/SDF doesn't have to include 3D
coordinates, nor any usable stereo flags. Unless all my MOL/SDFs were
generated from PDBs with zero information loss, I wouldn't expect anything
from them.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu