Note that hashing to bits always loses information; there is simply no
way around this.
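To make the loss concrete, here is a rough plain-Python sketch. It is not RDKit code. It assumes that feature identifiers are 32-bit unsigned integers and that folding is a plain modulo, as suggested in the quoted message below; the identifier values are made up for illustration.

```python
# Toy model of folding hashed feature identifiers into a fixed-length bit vector.
# Assumptions (not taken from the RDKit source): identifiers are 32-bit unsigned
# integers and the fold is a plain modulo. The identifier values are invented.
N_BITS = 2048

def fold(identifiers, n_bits=N_BITS):
    """Return the sorted bit positions set by a collection of identifiers."""
    return sorted({ident % n_bits for ident in identifiers})

# Two feature sets that differ in one identifier ...
features_a = [98513984, 2246728737]
features_b = [98513984 + N_BITS, 2246728737]

# ... fold to exactly the same bits, so the bit vector cannot tell them apart.
bits_a = fold(features_a)
bits_b = fold(features_b)
print(features_a != features_b, bits_a == bits_b)  # prints: True True

# Pigeonhole count: how many distinct 32-bit identifiers share each bit.
IDS_PER_BIT = 2**32 // N_BITS
print(IDS_PER_BIT)  # prints: 2097152
```

Under these assumptions roughly two million possible identifiers map to every bit, which is why a set bit cannot be traced back to a unique feature from the bit vector alone.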
That is only a partial answer to your question, but you can at least find
the atoms that set a given bit. Look for "Explaining bits from Morgan
Fingerprints" in:
http://www.rdkit.org/docs/GettingStartedInPython.html
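The idea behind that section is to record, while the fingerprint is built, which atom environments set each bit. In RDKit this is exposed through the bitInfo argument of the Morgan fingerprint functions, which maps each set bit to (atom index, radius) pairs. The sketch below only models that bookkeeping in plain Python; the environments, the identifiers, and the helper name fold_with_provenance are hypothetical, and the modulo fold is an assumption.

```python
from collections import defaultdict

N_BITS = 2048

# Hypothetical (atom_index, radius, identifier) triples for one molecule.
# The identifiers are invented; the third is chosen to collide with the second.
environments = [
    (0, 0, 2246728737),
    (1, 0, 864662311),
    (1, 1, 864662311 + 3 * N_BITS),
    (2, 1, 3217380708),
]

def fold_with_provenance(envs, n_bits=N_BITS):
    """Fold identifiers by modulo and remember which environments set each bit."""
    bit_info = defaultdict(list)
    for atom_idx, radius, ident in envs:
        bit_info[ident % n_bits].append((atom_idx, radius))
    return dict(bit_info)

bit_info = fold_with_provenance(environments)
for bit, sources in sorted(bit_info.items()):
    # A bit with more than one source means several environments share that bit.
    print(bit, sources)
```

A lookup table like this tells you which atoms contributed to a bit in this particular molecule. It does not undo the hashing: a bit with several sources still cannot say which of them was "the" feature.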
Also, while Morgan fingerprints are not Bloom filters, the Wikipedia entry
on Bloom filters has a lot of information on alternative hashing
functions, as well as some history of chemical fingerprints in general,
which may help answer some questions:
https://en.wikipedia.org/wiki/Bloom_filter
On Thu, Oct 6, 2016 at 6:02 AM, Guillaume GODIN <
[email protected]> wrote:
> Dear Jacob,
>
> This is a hashing function that is used to compress the data:
>
> https://en.wikipedia.org/wiki/Universal_hashing
>
> http://rdkit.org/UGM/2012/Landrum_RDKit_UGM.Fingerprints.Final.pptx.pdf
>
> Greg writes on page 2: "Typical kernels extract features of the
> molecule, hash them, and use
> the hash to determine bits that should be set"
>
> The hash is a simple function such as modulo.
>
> Best regards,
>
> Dr. Guillaume GODIN
> Principal Scientist
> Chemoinformatic & Datamining
> Innovation
> CORPORATE R&D DIVISION
> DIRECT LINE +41 (0)22 780 3645
> MOBILE +41 (0)79 536 1039
> Firmenich SA
> RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8
>
>
> ________________________________________
> From: Jacob Gora <[email protected]>
> Sent: Thursday, 6 October 2016 11:23
> To: [email protected]
> Subject: [Rdkit-discuss] Implementation details bitvectors from
> morgan/circular fingerprints
>
> Hi,
>
> Is there any information on how RDKit creates bit vectors from circular
> fingerprints?
> As the theoretical feature space is too big to store, and the default
> size RDKit uses when converting is only 2048 bits, there must be
> some kind of
> information loss (and compression?).
>
> Can anyone explain in detail how this is handled?
> Which features end up in the bit vector, and how is that decided?
>
> Regards
> Jacob
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>