On 06/12/2021 22:59, Wolcott, Chris (NIH/NCI) [C] wrote:
Francois,
I apologize upfront if I am not using the correct verbiage
(compounds, molecules, ...). I am a software developer writing a new
web application for the project staff. The old web application
(developed in .NET) stored 370,000+ compounds and related information
generated from oBabel 2.4 in our MongoDB database. I did find one
error: the C#/MongoDB interface did not understand unsigned integers
and stored everything as signed. It doesn't look like that was a big
problem, because when the data was retrieved it was converted back to
unsigned before being passed to Tanimoto.
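The signed/unsigned mix-up can be harmless. A 32-bit word has the same bit pattern whether it is read as signed or unsigned. If the driver reinterpreted each value rather than clamping or truncating it, masking it back recovers the original exactly. Below is a minimal Python sketch. It assumes each fingerprint word was stored as a two's-complement 32-bit integer, and the helper names are hypothetical.

```python
def to_signed32(u: int) -> int:
    """Reinterpret an unsigned 32-bit word as a signed (two's-complement) one."""
    u &= 0xFFFFFFFF
    return u - (1 << 32) if u & 0x80000000 else u

def to_unsigned32(s: int) -> int:
    """Reinterpret a signed 32-bit value as unsigned; inverse of to_signed32."""
    return s & 0xFFFFFFFF

word = 0x80000001                     # a word with the high bit set
stored = to_signed32(word)            # what a signed-only store would hold
print(stored)                         # -2147483647
print(to_unsigned32(stored) == word)  # True: no bits were lost
```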
In the new website I am using oBabel 3.1.1, and I have regenerated the
information for the 370k+ compounds and added another 30k+ compounds
from a new library we have begun to use in the labs. The fingerprints
are generated with fp2 (arrays of 32-bit unsigned integers) via
OBFingerprint, and I then use the Tanimoto coefficient for similarity
analysis. Comparing a single compound against the 400k+ pre-generated
fingerprints takes about 5 seconds.
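The Tanimoto coefficient of two bit fingerprints is the number of bits set in both, divided by the number of bits set in either. Below is a minimal pure-Python sketch over arrays of 32-bit words. It is not Open Babel's implementation, and `popcount`, `tanimoto` and `search` are hypothetical names. It should work for any equal-length arrays, including the default FP2 length of 1024 bits (32 words), which is assumed here.

The search exploits the identity |A or B| = |A| + |B| - |A and B|. Each stored fingerprint's bit count can then be computed once, at load time. That leaves a single AND-and-count per comparison, a common way to cut the cost of a linear scan like the 400k one above.

```python
import heapq
from typing import List, Sequence, Tuple

MASK32 = 0xFFFFFFFF  # treat every stored word as unsigned 32-bit

def popcount(words: Sequence[int]) -> int:
    """Number of set bits across a fingerprint stored as 32-bit words."""
    # Masking also accepts words that were read back as signed integers.
    return sum(bin(w & MASK32).count("1") for w in words)

def tanimoto(a: Sequence[int], b: Sequence[int], na: int, nb: int) -> float:
    """Tanimoto coefficient c / (na + nb - c), where na and nb are the
    precomputed bit counts of a and b, and c counts bits set in both."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same number of words")
    c = sum(bin((x & y) & MASK32).count("1") for x, y in zip(a, b))
    union = na + nb - c
    return c / union if union else 0.0  # two empty fingerprints score 0 here

def search(query: Sequence[int], library: Sequence[Sequence[int]],
           counts: Sequence[int], k: int = 10) -> List[Tuple[float, int]]:
    """Linear scan returning the k best (score, index) pairs, best first."""
    nq = popcount(query)
    scores = ((tanimoto(query, fp, nq, n), i)
              for i, (fp, n) in enumerate(zip(library, counts)))
    return heapq.nlargest(k, scores)
```

Here is a small usage example. Take `lib = [[0b1011], [0b0011], [0b0100]]`. Then `search([0b1011], lib, [popcount(f) for f in lib], k=2)` returns `[(1.0, 0), (0.6666666666666666, 1)]`.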
Given your questions, I will try to learn a bit more about molecular
fingerprints.
Any comments, references, or prayers would be appreciated.
Here is a venerable reference on molecular fingerprints:
https://www.daylight.com/dayhtml/doc/theory/theory.finger.html
--------------------------------------------------------------------
Because fingerprints are lossy encodings of molecules, it is possible
for different molecules to end up with the same fingerprint.
If you use an unfolded, counted fingerprint (instead of the usual
folded, uncounted one), this "funny" event should occur less frequently.
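The two kinds of fingerprint differ in what they keep, as this toy sketch shows (it is not Open Babel code). A folded, uncounted fingerprint keeps only which bit positions some feature hit. An unfolded, counted one keeps every feature together with its number of occurrences.

Several parts are stand-ins. The feature strings and counts are hypothetical, and the CRC32-based bit hashing is arbitrary. The count-based score is one common generalization of Tanimoto: the sum of minima over the sum of maxima.

```python
import zlib
from typing import Dict, Set

def fold(features: Dict[str, int], n_bits: int) -> Set[int]:
    """Folded, uncounted: the set of bit positions hit by any feature.
    Counts are discarded, and distinct features may collide on one bit."""
    return {zlib.crc32(f.encode()) % n_bits for f in features}

def tanimoto_bits(a: Set[int], b: Set[int]) -> float:
    """Classic Tanimoto on sets of set-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def tanimoto_counts(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Count-based Tanimoto on unfolded features: sum(min) / sum(max)."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return num / den if den else 0.0

# Hypothetical feature counts for two different molecules that share the
# same substructure *types* but in different numbers.
mol_a = {"C-C": 4, "C-O": 1}
mol_b = {"C-C": 2, "C-O": 1}

print(tanimoto_bits(fold(mol_a, 1024), fold(mol_b, 1024)))  # 1.0
print(tanimoto_counts(mol_a, mol_b))                        # 0.6
```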
Another possibility might be to use a fingerprint with more bits.
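A rough estimate shows why more bits help. Suppose a molecule yields k distinct features, each hashed uniformly into n bits. The expected number of distinct bits set is then n(1 - (1 - 1/n)^k). The shortfall from k counts features that landed on an already-occupied bit. This is an idealized estimate under a uniform-hashing assumption, not a property of any particular Open Babel fingerprint, and the feature count of 200 below is made up.

```python
def expected_bits_set(n_features: int, n_bits: int) -> float:
    """Expected number of distinct bits set when n_features distinct features
    are hashed independently and uniformly into n_bits positions."""
    return n_bits * (1.0 - (1.0 - 1.0 / n_bits) ** n_features)

def expected_collisions(n_features: int, n_bits: int) -> float:
    """Expected number of features that land on an already-set bit."""
    return n_features - expected_bits_set(n_features, n_bits)

# Hypothetical molecule with 200 distinct features, folded into various lengths.
for n_bits in (1024, 2048, 4096):
    print(n_bits, round(expected_collisions(200, n_bits), 1))
```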
Which fingerprint are you using by the way?
Regards,
F.
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss