On Mar 29, 2017, at 10:30, Noel O'Boyle <baoille...@gmail.com> wrote:
> The ECFP is new to Open Babel and hasn't been sorted out properly.
> Geoff's been on to me to look into it, but it's way down my list at
> the moment. So in short, I agree, and encourage a prospective user to
> step up and look into it.

Thanks Noel.

I can work on some of that, that is, a sparse->dense function, perhaps along 
the lines of what the RDKit does.
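
To make "sparse->dense" concrete, here is a rough Python sketch of the folding step. The function name, the defaults, and the LCG used to derive the extra bit positions are all my own assumptions for illustration. This is not RDKit's algorithm or existing Open Babel code.

```python
def fold_sparse_to_dense(identifiers, n_bits=1024, bits_per_id=1):
    """Fold sparse integer identifiers into a dense list of 0/1 bits.

    Hypothetical helper: each identifier sets `bits_per_id` positions,
    derived by stepping a 32-bit linear congruential generator.
    """
    if n_bits <= 0 or bits_per_id <= 0:
        raise ValueError("n_bits and bits_per_id must be positive")
    bits = [0] * n_bits
    for ident in identifiers:
        value = ident & 0xFFFFFFFF  # treat the identifier as unsigned 32-bit
        for _ in range(bits_per_id):
            bits[value % n_bits] = 1
            # Move to the next pseudo-random value (assumed mixing step).
            value = (value * 1664525 + 1013904223) & 0xFFFFFFFF
    return bits


dense = fold_sparse_to_dense([5, 1029, 77], n_bits=1024)
```

With one bit per identifier, 5 and 1029 collide at position 5 for a 1024-bit fingerprint, which is the kind of information loss the length and density parameters trade off.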

There are a few things I don't know how to do, and would like advice:

  1) I don't know how to pass in configuration information with the current API

The equivalent RDKit code takes in the bit length and the number of bits per 
hash, with default values. In Open Babel, the current API expects callers to pass 
in an nBits of 0 to get the default size. I can't change that without breaking 
backwards compatibility, which I fully agree is a no-go.

I can change the code so that passing in nBits=0 generates (say) a 1024-bit 
fingerprint.

However, there would then be no way to get the list of values, which the 
current ECFP function returns, unless I do something like use "nBits=-1" (or 
"nBits=1") as a special flag.

There's also no way to pass in the number of bits per hash. For that I can use 
a default value.

Any suggestions?
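
For discussion, the special-flag idea could look something like the following Python sketch. The names DEFAULT_NBITS, SPARSE_FLAG and get_fingerprint are hypothetical, chosen only to show the dispatch. They are not the real Open Babel signature, and the folding here is a plain modulo with one bit per value.

```python
DEFAULT_NBITS = 1024  # assumed default size, open for discussion
SPARSE_FLAG = -1      # assumed sentinel meaning "give me the raw values"


def get_fingerprint(identifiers, n_bits=0):
    """Return a dense bit list, or the raw identifiers if n_bits is the flag."""
    if n_bits == SPARSE_FLAG:
        # Behavior of the current ECFP code: hand back the list of values.
        return list(identifiers)
    if n_bits == 0:
        # Existing convention: 0 means "use the default size".
        n_bits = DEFAULT_NBITS
    if n_bits < 0:
        raise ValueError("unsupported n_bits value: %d" % n_bits)
    bits = [0] * n_bits
    for ident in identifiers:
        bits[ident % n_bits] = 1
    return bits


values = [5, 1029, 77]
dense_default = get_fingerprint(values)           # 1024 bits
sparse = get_fingerprint(values, SPARSE_FLAG)     # raw values
```

The downside is visible in the sketch: one integer argument carries both a size and a mode, which is why I'd welcome a cleaner way to pass configuration.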

  2) I have no good way to evaluate the length and density values.

I could make a guess on a decent length and density, or perhaps copy RDKit's 
algorithm directly and use its defaults.

Better would be if I explore a bit of parameter space, like 512, 1024, 2048 
bits and 4-8 bits per value.

But I have no data sets which I could use in that evaluation.
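
Without real data, the most I can sketch is the shape of the parameter sweep. The Python below measures only bit density over the grid mentioned above, using synthetic random identifiers as a stand-in for real molecules. It says nothing about retrieval quality. The fold function, its LCG step, and the figure of 40 identifiers per "molecule" are assumptions for illustration.

```python
import itertools
import random


def fold(identifiers, n_bits, bits_per_id):
    """Hypothetical folding step: set bits_per_id positions per identifier."""
    bits = [0] * n_bits
    for ident in identifiers:
        value = ident & 0xFFFFFFFF
        for _ in range(bits_per_id):
            bits[value % n_bits] = 1
            value = (value * 1664525 + 1013904223) & 0xFFFFFFFF
    return bits


def density_grid(identifier_sets,
                 lengths=(512, 1024, 2048),
                 bits_per_id_values=(4, 5, 6, 7, 8)):
    """Mean fraction of bits set for each (length, bits per value) pair."""
    grid = {}
    for n_bits, k in itertools.product(lengths, bits_per_id_values):
        total_on = sum(sum(fold(ids, n_bits, k)) for ids in identifier_sets)
        grid[(n_bits, k)] = total_on / (n_bits * len(identifier_sets))
    return grid


# Synthetic stand-in data: 20 fake "molecules" with 40 identifiers each.
rng = random.Random(0)
fake_sets = [[rng.getrandbits(32) for _ in range(40)] for _ in range(20)]
grid = density_grid(fake_sets)
```

A real evaluation would replace fake_sets with identifiers from actual structures and replace density with a similarity benchmark score, which is the part I need advice on.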

Noel, as a co-author of "Comparing structural fingerprints using a 
literature-based similarity benchmark", do you have any recommendations for how 
I can do an evaluation?


A different option is that I can put the sparse->dense code into chemfp, where 
I can more easily control the parameters, label it "experimental" (which, I've 
found out, doesn't prevent people from using it), and get some feedback from 
that, which might inform future Open Babel development.


Cheers,
                                Andrew
                                da...@dalkescientific.com


