Re: [Open Babel] Identical Fingerprints for three compounds

2021-12-07 Thread Wolcott, Chris (NIH/NCI) [C] via OpenBabel-discuss
Francois, Guess I have some light reading during the holidays. Thanks again for the education and reference. >> Here is a venerable reference on molecular fingerprints: >> https://www.daylight.com/dayhtml/doc/theory/theory.finger.html

Re: [Open Babel] Identical Fingerprints for three compounds

2021-12-07 Thread Francois Berenger
On 06/12/2021 22:59, Wolcott, Chris (NIH/NCI) [C] wrote: Francois, I apologize upfront if I am not using the correct verbiage (compounds, molecules, ...). I am a software developer writing a new web application for the project staff. The old web application (developed in .NET) stored 370,00

Re: [Open Babel] Identical Fingerprints for three compounds

2021-12-06 Thread Wolcott, Chris (NIH/NCI) [C] via OpenBabel-discuss
Andrew, Wow, thank you for the detailed reply. I am happy with the current processing time of 5 seconds to compare 400,000+ fingerprints, but I will look at the Stack Overflow discussion. I am pretty well versed in MongoDB and hadn't thought about calculating it fully in MongoDB. I will

Re: [Open Babel] Identical Fingerprints for three compounds

2021-12-06 Thread Andrew Dalke
Hi Chris, The FP2 fingerprint works along these lines:
1) Choose a fingerprint size 'n', which is a power of 2.
2) Allocate a vector of w = n/32 words to store the bitstring.
3) For each linear subpath up to length 7 (these correspond to n-grams for words):
   a) use a hash based on the atom
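The steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Open Babel's actual FP2 code: the molecule is a hand-written list of atoms and bonds, "length 7" is read as up to 7 atoms, the path label built from atom symbols and bond orders is a guess at what the hash is based on (the message is cut off at that point), and zlib.crc32 stands in for whatever hash FP2 really uses. The names linear_paths, path_label and toy_fingerprint are hypothetical.

```python
import zlib


def linear_paths(atoms, bonds, max_atoms=7):
    """Yield every simple linear path of up to max_atoms atoms as a tuple of indices."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b, _order in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def extend(path):
        yield tuple(path)
        if len(path) == max_atoms:
            return
        for nxt in neighbors[path[-1]]:
            if nxt not in path:  # linear paths only, never revisit an atom
                yield from extend(path + [nxt])

    for start in range(len(atoms)):
        yield from extend([start])


def path_label(atoms, bond_order, path):
    """Build a direction-independent text label for a path (assumed encoding)."""
    parts = [atoms[path[0]]]
    for a, b in zip(path, path[1:]):
        parts.append(str(bond_order[frozenset((a, b))]))
        parts.append(atoms[b])
    forward = "".join(parts)
    backward = "".join(reversed(parts))
    return min(forward, backward)  # same label whichever end we start from


def toy_fingerprint(atoms, bonds, n_bits=1024):
    """Return the fingerprint as n_bits/32 words of 32 bits each."""
    # n_bits must be a power of 2 and at least one full 32-bit word
    assert n_bits >= 32 and n_bits & (n_bits - 1) == 0
    words = [0] * (n_bits // 32)
    bond_order = {frozenset((a, b)): order for a, b, order in bonds}
    for path in linear_paths(atoms, bonds):
        label = path_label(atoms, bond_order, path)
        bit = zlib.crc32(label.encode()) % n_bits  # stand-in hash, not FP2's
        words[bit // 32] |= 1 << (bit % 32)
    return words


# Ethanol heavy atoms: C-C-O with single bonds
ethanol = toy_fingerprint(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
print(sum(bin(w).count("1") for w in ethanol), "bits set in", len(ethanol), "words")
```

Because each path only sets a bit, the result records which path labels occur, not how many times or where, which is one reason distinct molecules can share a fingerprint.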

Re: [Open Babel] Identical Fingerprints for three compounds

2021-12-06 Thread Wolcott, Chris (NIH/NCI) [C] via OpenBabel-discuss
Francois, I apologize upfront if I am not using the correct verbiage (compounds, molecules, ...). I am a software developer writing a new web application for the project staff. The old web application (developed in .NET) stored 370,000+ compounds and related information generated from oBab

Re: [Open Babel] Identical Fingerprints for three compounds

2021-12-06 Thread Francois Berenger
Dear Chris, Since fingerprints are lossy encodings of molecules, it is possible that different molecules end up with the same fingerprint. If you use an unfolded, counted fingerprint (instead of the usual folded, uncounted kind), this "funny" event should occur less frequently. Another possibility might
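To make the folded/uncounted versus unfolded/counted distinction concrete, here is a small sketch. It assumes a molecule has already been reduced to a list of feature labels; the labels, the 64-bit width, the zlib.crc32 hash and the function names are all made up for illustration and do not come from Open Babel.

```python
import zlib
from collections import Counter


def unfolded_counted(features):
    """Keep every distinct feature together with how often it occurs."""
    return Counter(features)


def folded_uncounted(features, n_bits=64):
    """Hash each distinct feature to one of n_bits positions; counts are dropped."""
    bits = 0
    for feature in set(features):
        bits |= 1 << (zlib.crc32(feature.encode()) % n_bits)
    return bits


# Two hypothetical molecules with the same kinds of features in different amounts
mol_a = ["C", "C", "C1C"]
mol_b = ["C", "C", "C", "C1C", "C1C"]

print(folded_uncounted(mol_a) == folded_uncounted(mol_b))  # identical bitstrings
print(unfolded_counted(mol_a) == unfolded_counted(mol_b))  # counts tell them apart
```

The folded form can also collide when two different features hash to the same bit, which the counted, unfolded form avoids at the cost of a larger representation.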

[Open Babel] Identical Fingerprints for three compounds

2021-12-05 Thread Wolcott, Chris (NIH/NCI) [C] via OpenBabel-discuss
Is it expected, or is there an easy explanation for why three different SMILES produce the same fingerprint? All compounds come from the same synthetic library. 1st Compound Canonical SMILES: O=C1N[C@H]2C[C@H](N(C2)Cc2ccncc2)C(=O)N2CCO[C@@H](C2)CN(C[C@H]2O[C@@H](C1)[C@H](O)[C@@H]2O)C(=