> On Jan 7, 2019, at 14:10, Noel O'Boyle <baoille...@gmail.com> wrote:
> 
> Can you clarify the requirement for bumping the version? That is, which of 
> the following is the invariant:
> 1. Any molecule, represented in any format, must create the same 
> fingerprint
> 2. Any SMILES string must create the same fingerprint
> 3. Any OBMol must create the same fingerprint

I don't know how to answer that question. I think the answer is (2), but where 
"SMILES" is replaced with "input structure record".

The idea of the version in the chemfp type string is to let people know if it's 
reasonable to use the same fingerprint data set after changing to a new version 
of a toolkit.

For example, I might use Open Babel to generate MACCS fingerprints from a 
ChEMBL SD file, and the same version of Open Babel to convert a query SMILES 
into a query fingerprint to find the k=10 nearest neighbors. After a period of 
time I upgrade to a new version of Open Babel. I would like to get a warning if 
the generation method has changed enough that I should re-compute the MACCS 
keys.

Or, someone may publish a paper which uses an Open Babel-generated FP2 data 
set. I download the dataset and want to know if my installed version of Open 
Babel is likely compatible with it.
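
As a rough sketch of the check I have in mind, the code below compares the 
type string stored with a data set against the type string the installed 
toolkit would produce. The names ParseType and SameGenerationMethod are 
made up for this example and are not part of chemfp or Open Babel. It assumes 
the type looks like "OpenBabel-MACCS/1", optionally followed by 
space-separated parameters, which it ignores.

```cpp
#include <string>

// Hypothetical helper type for this sketch; not a chemfp or Open Babel API.
struct FingerprintType {
    std::string name;     // e.g. "OpenBabel-MACCS"
    std::string version;  // e.g. "1"
};

// Split a type string such as "OpenBabel-MACCS/1" into name and version.
// Anything after the first space (assumed to be parameters) is ignored.
// Returns false if there is no "/version" part.
bool ParseType(const std::string& type, FingerprintType& out) {
    const std::string token = type.substr(0, type.find(' '));
    const std::string::size_type slash = token.rfind('/');
    if (slash == std::string::npos || slash == 0 || slash + 1 == token.size()) {
        return false;
    }
    out.name = token.substr(0, slash);
    out.version = token.substr(slash + 1);
    return true;
}

// True only when both strings parse and agree on name and version.
// A mismatch is the signal to warn that fingerprints should be recomputed.
bool SameGenerationMethod(const std::string& dataset_type,
                          const std::string& toolkit_type) {
    FingerprintType a, b;
    if (!ParseType(dataset_type, a) || !ParseType(toolkit_type, b)) {
        return false;
    }
    return a.name == b.name && a.version == b.version;
}
```

With this, a data set labeled "OpenBabel-MACCS/1" would be flagged as 
incompatible with a toolkit that now reports "OpenBabel-MACCS/3".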

My criterion hasn't been as strict as "any" change. For example, if the SD 
parser was changed to better support information which is in 1 out of every 
100,000 PubChem records, and that change sometimes affects one bit of a 
fingerprint, then in principle the version number could be bumped.

Usually that's below the threshold of noticeability. Fingerprints are blunt 
tools for comparing molecules, and we already expect some level of error when 
working with structure and fingerprint files.

On the other hand, a change in 1% of the records seems like enough to bump the 
version number.

Chemfp has a "software" header which helps in cases where more fine-grained 
versioning might be needed. For example:

  #software=OpenBabel/2.4.1 chemfp/3.0

says that the data set was generated with Open Babel 2.4.1 using chemfp 3.0. 
However, it's impossible for software to look at "2.4.0" vs. "2.4.1" or 
"2.4.90" and tell if the fingerprint generation method changed.

(Plus, the 2.4.90 version string has been the same since 2017-10-11, so it 
isn't enough information if someone wants to reproduce an analysis. Ideally 
someone who publishes a paper based on a version installed from version 
control should include the relevant git commit id.)

> Since you know where to edit, you can if you wish make the change directly on 
> github, if you have an account there. But otherwise, I can do it.

I can make the change. I'm trying to figure out what change to make.

If there were two significant periods of time since 2.4.1 was released, with 
different fingerprint generation methods, then I would build versions of Open 
Babel for those periods so that chemfp's versioning captures that information. 
E.g., have a "/3" and a "/4". But Open Babel would only need the "/4".

If there's only one significant implementation change, which is what it now 
seems like, then the easiest code change is to bump all versions to "/3".

I'm fine with that.

In principle I would like to add a "version" string to the plugin system, so 
that I can replace:

        << "#type=OpenBabel-" << _pFP->GetID() << "/1" << '\n'

with something like

        << "#type=OpenBabel-" << _pFP->GetID() << "/" << _pFP->GetVersion() << '\n'

which means the implementation version numbers can be bumped independently.
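
To make that concrete, here is a minimal standalone sketch of a per-fingerprint 
version. The classes FingerprintSketch, MaccsSketch and Fp2Sketch and the 
function TypeHeader are stand-ins invented for this example. They are not Open 
Babel's actual OBPlugin or OBFingerprint classes, and the version values are 
only illustrative.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a fingerprint plugin base class.
// This is NOT Open Babel's real OBFingerprint.
class FingerprintSketch {
public:
    virtual ~FingerprintSketch() = default;
    virtual std::string GetID() const = 0;
    // Assumed default: an implementation that has never changed reports "1".
    virtual std::string GetVersion() const { return "1"; }
};

// A fingerprint whose generation method changed overrides the version.
class MaccsSketch : public FingerprintSketch {
public:
    std::string GetID() const override { return "MACCS"; }
    std::string GetVersion() const override { return "3"; }  // illustrative value
};

// A fingerprint that keeps the default version.
class Fp2Sketch : public FingerprintSketch {
public:
    std::string GetID() const override { return "FP2"; }
};

// Build the "#type=" header line from the ID and the per-fingerprint version.
std::string TypeHeader(const FingerprintSketch& fp) {
    std::ostringstream out;
    out << "#type=OpenBabel-" << fp.GetID() << "/" << fp.GetVersion() << '\n';
    return out.str();
}
```

Here each fingerprint carries its own version, so bumping one does not force a 
bump of the others.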

However, that requires adding a new attribute to the OBPlugin class, which I 
think would break ABI compatibility and require a rebuild of all third-party 
extensions.

I could instead add it to OBFingerprint, which would break fewer things.

On the other hand, my feeling is that that's overkill for the FP[234] and MACCS 
fingerprints as those have been stable for a long time.

The newer circular fingerprints are not as stable, but they also can't be 
exported to FPS format.

Do the Open Babel core developers think this feature is useful enough to 
outweigh the potential of breaking existing third-party plugins? If not, is 
there an alternative way to add a version string which is acceptable?

If not, I'll just change all of the version numbers from "/1" to "/3".

Best regards,

                                Andrew
                                da...@dalkescientific.com



