Dear Jacob,

This is done with a hashing function, which is used to compress the data:

https://en.wikipedia.org/wiki/Universal_hashing
http://rdkit.org/UGM/2012/Landrum_RDKit_UGM.Fingerprints.Final.pptx.pdf

Greg writes on page 2: "Typical kernels extract features of the molecule, hash them, and use the hash to determine bits that should be set"

The hashing is a simple function such as modulo.

Best regards,

Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R&D DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8

________________________________________
From: Jacob Gora <[email protected]>
Sent: Thursday, 6 October 2016 11:23
To: [email protected]
Subject: [Rdkit-discuss] Implementation details bitvectors from morgan/circular fingerprints

Hi,

is there any information on how RDKit creates bit vectors from circular fingerprints? As the theoretical feature space is too big for storage, and the default feature space RDKit uses when converting is only 2048 bits, there must be some kind of information loss (and compression?).

Can anyone explain how this is handled in detail? Which features end up in the bit vector, and how are they chosen?

Regards
Jacob

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

**********************************************************************
DISCLAIMER
This email and any files transmitted with it, including replies and forwarded copies (which may contain alterations) subsequently transmitted from Firmenich, are confidential and solely for the use of the intended recipient. The contents do not represent the opinion of Firmenich except to the extent that it relates to their official business.
**********************************************************************
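
To make the "hash, then modulo" idea from the reply concrete, here is a minimal sketch in plain Python. It assumes each feature is already an integer identifier and that folding is a plain modulo by the vector length; the function name `fold_to_bits` and the example identifiers are hypothetical and are not taken from the RDKit source.

```python
def fold_to_bits(feature_ids, n_bits=2048):
    """Fold integer feature identifiers into a fixed-length bit vector.

    Assumption: the bit index is simply the identifier modulo n_bits.
    """
    bits = [0] * n_bits
    for fid in feature_ids:
        bits[fid % n_bits] = 1  # distinct identifiers may land on the same bit
    return bits


# Hypothetical feature identifiers, made up for illustration only.
# The first two differ by a multiple of 2048, so they collide on one bit.
feature_ids = [7, 2048 * 1000 + 7, 4100]

bits = fold_to_bits(feature_ids)
on_bits = [i for i, b in enumerate(bits) if b]
print(len(feature_ids), "features ->", len(on_bits), "bits set:", on_bits)
```

Three features set only two bits here, which is the information loss asked about: every feature contributes a bit, but the bit vector alone cannot tell which of the colliding features was present.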

