On Aug 29, 2011, at 4:51 AM, Noel O'Boyle wrote:

> Hi all,
>
> Some of you may be interested in a talk I presented last week at the
> 5th Meeting on U.S. Government Chemical Databases and Open Chemistry
> (in Frederick, MD). The talk, entitled "Improving the quality of
> chemical databases with community-developed tools (and vice versa)",
> described improvements in OB over the last 2 years in the handling of
> chemical data and also using OB to find errors in databases.
>
> You can find the talk at
> http://www.redbrick.dcu.ie/~noel/talks/ImprovingDatabases.pptx.
>
Thanks for the talk. It fits very well with one of the things I mentioned in Cambridge this past Jan. I am really inspired to make a real attempt to clean up the NCI structures as well as possible. It isn't possible to be certain about how well a structure represents what is in the bottle, but we should be able to make sure that the way the structures are drawn consistently represents what we know and don't know about the compound.

I'd also like to see if we can use other info to get the best structure possible. A few examples:

1) There are structures that have information in labels. So sometimes the stereochemistry at a particular center is given by a nearby dummy atom with a label 'R' or 'S'.

2) The molecular formula has historically been entered independently of the structure. When you check whether you can get the molecular formula from the structure, you get some inconsistencies. Some are pretty easy to figure out (the difference between M.Cl, M.Cl- and M.HCl) and some might take a bit more thought. There are also cases where the counter-ion is left out of the structure but appears in the MF.

Of course all this is in addition to the kinds of chemistry and consistency checks you talked about in the talk.

If you or any other OpenBabel folks want to help out, let me know. I'll probably have an sdf file with the raw database extraction on our ftp site in the next few days. At the very least I expect I will have some questions for the list.

DanZ

/********************************************
 * Daniel Zaharevitz
 * Chief, Information Technology Branch
 * Developmental Therapeutics Program
 * National Cancer Institute
 * zahar...@mail.nih.gov
 *
 ********************************************/
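The molecular formula check in point 2 above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure actually used on the NCI data: the function names and the category labels are hypothetical, the structure-derived formula is assumed to have been produced already by a toolkit such as Open Babel and is passed in as a plain string, and charges are assumed to be written as a trailing sign (as in "Cl-"). Charges are ignored when counting atoms, so M.Cl and M.Cl- compare as equal.

```python
import re
from collections import Counter

# One dot-separated component: optional multiplier, element symbols with
# optional counts, optional trailing charge such as "-" or "+2".
_COMPONENT = re.compile(r"(\d*)((?:[A-Z][a-z]?\d*)+)([+-]\d*)?")
_ATOM = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Return element counts for a formula such as 'C10H15N.ClH'."""
    counts = Counter()
    for part in formula.replace(" ", "").split("."):
        match = _COMPONENT.fullmatch(part)
        if match is None:
            raise ValueError("cannot parse formula component: %r" % part)
        multiplier = int(match.group(1) or 1)
        for symbol, number in _ATOM.findall(match.group(2)):
            counts[symbol] += multiplier * int(number or 1)
    return counts


def formula_difference(recorded, from_structure):
    """Per-element count of (recorded MF) minus (MF derived from structure)."""
    rec = parse_formula(recorded)
    struct = parse_formula(from_structure)
    elements = sorted(set(rec) | set(struct))
    return {el: rec[el] - struct[el] for el in elements if rec[el] != struct[el]}


def classify(recorded, from_structure):
    """Assign a rough, hypothetical category to a formula mismatch."""
    diff = formula_difference(recorded, from_structure)
    if not diff:
        return "consistent"          # same atoms; charges are not compared
    if set(diff) == {"H"}:
        return "hydrogen-only"       # e.g. M.HCl recorded, M.Cl- drawn
    if all(value > 0 for value in diff.values()):
        return "missing-component"   # e.g. counter-ion absent from structure
    return "review"                  # anything else needs a closer look
```

For example, classify("C10H15N.ClH", "C10H15N.Cl-") falls in the hydrogen-only category, while classify("C10H15N.ClH", "C10H15N") is flagged as a possibly missing component. The element counts in the difference can then be used to decide which records are easy fixes and which need more thought.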