> On Jan 7, 2019, at 14:10, Noel O'Boyle <baoille...@gmail.com> wrote:
>
> Can you clarify the requirement for bumping the version? That is, which of
> the following is the invariant:
> 1. Any molecule represented in any format changes must create the same
>    fingerprint
> 2. Any SMILES string must create the same fingerprint
> 3. Any OBMol must create the same fingerprint

I don't know how to answer that question. I think the answer is (2), but with "SMILES" replaced by "input structure record".

The idea of the version in the chemfp type string is to let people know if it's reasonable to use the same fingerprint data set after changing to a new version of a toolkit.

For example, I might use Open Babel to generate MACCS fingerprints from a ChEMBL SD file, and the same version of Open Babel to convert a query SMILES into a query fingerprint to find the k=10 nearest neighbors. After a period of time I upgrade to a new version of Open Babel. I would like to get a warning if the generation method has changed enough that I should re-compute the MACCS keys.

Or, someone may publish a paper which uses an Open Babel-generated FP2 data set. I download the data set and want to know if my installed version of Open Babel is likely compatible with it.

My criterion hasn't been so strict as "any" change. For example, if the SD parser was changed to better support information which is in 1 out of every 100,000 PubChem records, and that change sometimes affects one bit of a fingerprint, then in principle the version number could be bumped. Usually that's below the threshold of noticeability. Fingerprints are blunt tools for comparing molecules, and we already expect some level of error when working with structure and fingerprint files.

On the other hand, a change in 1% of the records seems like enough to bump the version number.

Chemfp has a "software" header which helps in cases where more fine-grained versioning might be needed. For example:

    #software=OpenBabel/2.4.1 chemfp/3.0

says that the data set was generated with Open Babel 2.4.1 using chemfp 3.0. However, it's impossible for software to look at "2.4.0" vs. "2.4.1" or "2.4.90" and tell if the fingerprint generation method changed. (Plus, the version number 2.4.90 has been unchanged since 2017-10-11, so it isn't enough information if someone wants to reproduce an analysis.
Ideally someone who publishes a paper based on a version installed from version control should include the relevant git commit id.)

> Since you know where to edit, you can if you wish make the change directly on
> github, if you have an account there. But otherwise, I can do it.

I can make the change. I'm trying to figure out what change to make.

If there were two significant periods of time since 2.4.1 was released, with different fingerprint generation methods, then I would build versions of Open Babel for those periods so that chemfp's versioning captures that information. Eg, have a "/3" and a "/4". But Open Babel would only need the "/4".

If there's only one significant implementation change, which is what it now seems like, then the easiest code change is to bump all versions to "/3". I'm fine with that.

In principle I would like to add a "version" string to the plugin system, so that I can replace:

    << "#type=OpenBabel-" << _pFP->GetID() << "/1" << '\n'

with something like

    << "#type=OpenBabel-" << _pFP->GetID() << "/" << _pFP->GetVersion() << '\n'

which means the implementation version numbers can be bumped independently.

However, that requires adding a new attribute to the OBPlugin class, which I think would break ABI compatibility and require a rebuild of all third-party extensions. I could instead add it to OBFingerprint, which would break fewer things.

On the other hand, my feeling is that that's overkill for the FP[234] and MACCS fingerprints as those have been stable for a long time. The newer circular fingerprints are not as stable, but they also can't be exported to FPS format.

Do the Open Babel core developers think this feature is useful enough to outweigh the potential of breaking existing third-party plugins? If not, is there an alternative way to add a version string which is acceptable? If not, I'll just change all of the version numbers from "/1" to "/3".
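
To make the GetVersion() idea concrete, here is a minimal, self-contained sketch. It does not use the real Open Babel classes: FingerprintSketch, TypeLine and IsCompatible are hypothetical names invented for illustration, and GetVersion() is the proposed accessor, not something that exists in OBPlugin or OBFingerprint today. It only shows how a per-fingerprint version could feed the "#type=" line and how a consumer might compare two such lines.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a fingerprint plugin. This is NOT the real
// OBFingerprint class; it exists only to illustrate the proposal.
class FingerprintSketch {
public:
  FingerprintSketch(std::string id, std::string version)
      : id_(std::move(id)), version_(std::move(version)) {
    assert(!id_.empty() && !version_.empty());
  }
  virtual ~FingerprintSketch() = default;

  const std::string& GetID() const { return id_; }

  // Proposed accessor (an assumption, not an existing Open Babel method).
  // Each fingerprint type reports its own implementation version.
  virtual std::string GetVersion() const { return version_; }

private:
  std::string id_;
  std::string version_;
};

// Build the FPS "#type=" header line from the id and the version,
// instead of hard-coding "/1" for every fingerprint type.
std::string TypeLine(const FingerprintSketch& fp) {
  std::ostringstream out;
  out << "#type=OpenBabel-" << fp.GetID() << "/" << fp.GetVersion() << '\n';
  return out.str();
}

// A consumer can treat two data sets as comparable only when their
// type lines are identical, version suffix included.
bool IsCompatible(const std::string& stored_type_line,
                  const std::string& current_type_line) {
  return stored_type_line == current_type_line;
}
```

With something like this, a data set written as "OpenBabel-MACCS/1" would no longer match a toolkit reporting "OpenBabel-MACCS/3", which is the warning I want, while a fingerprint type whose implementation has not changed could keep its old number.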

Best regards,

        Andrew
        da...@dalkescientific.com