On Nov 14, 2011, at 1:47 PM, Ernst-Georg Schmid wrote:
>> Since an unfolded FP2 is 1024 bits long (1021
>> actually used) it doesn't fit into the largest integer datatype of
>> MySQL, UNSIGNED BIGINT which is 2^64. So you either have to store it
>> in a BLOB, but then you have to deal with BLOB input/output and cannot
>> use the database's own bit operators but have to develop your own,
>> like Mychem does.

In my cyclops-mysql package

  http://www.dalkescientific.com/writings/diary/archive/2010/10/03/cyclops_mysql_jquery_and_marvin.html

I store the fingerprints as a hex-encoded string. This obviously
takes up twice as much space as a denser blob encoding, but has
the advantage that you can look at it if needed. It also means I
don't have to worry about database int sizes or int operations.
In fact, since I was using the PubChem fingerprints, of size 881 bits,
I could put the result in a string of size 221.

I, like Mychem, wrote my own popcount and Tanimoto routines
for working with hex-encoded fingerprints.

On Nov 15, 2011, at 1:24 PM, Jérôme Pansanel wrote:
> When using blob, a tanimoto search against 1M compounds takes less than
> 2s with Mychem on a simple desktop.

I reported on my performance numbers in

  http://dalkescientific.com/writings/CUP2009.pdf

On my laptop in 2009 I did 130,000 Tanimoto tests per second.
Looking at the performance numbers on this laptop, with my newest
code base, it's about 275,000 per second for 4096-bit fingerprints.
A 1024-bit fingerprint should be about 4x faster, or roughly
1.1 million per second, so it's about the same performance as
what Jérôme reports.

I bring this up to suggest that hex encoding, while it may seem
slow compared to a byte blob or several integer columns, has the
advantage of being readable and easily importable into other
software. And it's not appreciably slower.

On Nov 14, 2011, at 1:47 PM, Ernst-Georg Schmid wrote:
>> I doubt the use of MD5ed canonical SMILES for exact searching.
>> Certainly this works, but why not use the InChI-Key for better data
>> interoperability?

 - SMILES can interoperate with many more tools
 - Perhaps they have a preferred charge form or tautomer form?

>> And we are slowly leaving 'openbabel-discuss' towards
>> 'how-to-build-a-chemical-database-discuss'. :-)

I think that would be a good meeting topic someday.
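
To make the hex-encoding approach above concrete, here is a minimal
Python sketch of popcount and Tanimoto over hex strings. It is only an
illustration, not the code from cyclops-mysql or Mychem: the function
names (hex_length, hex_popcount, hex_tanimoto) are made up for this
example, the fingerprints are assumed to be non-empty hex strings of
equal length, and returning 0.0 for two all-zero fingerprints is a
convention picked here.

```python
def hex_length(num_bits):
    """Hex characters needed to hold num_bits bits (4 bits per character)."""
    return (num_bits + 3) // 4


def hex_popcount(fp_hex):
    """Count the on-bits in a hex-encoded fingerprint."""
    return bin(int(fp_hex, 16)).count("1")


def hex_tanimoto(fp1_hex, fp2_hex):
    """Tanimoto similarity of two equal-length hex-encoded fingerprints."""
    if len(fp1_hex) != len(fp2_hex):
        raise ValueError("fingerprints must have the same length")
    a = int(fp1_hex, 16)
    b = int(fp2_hex, 16)
    union = bin(a | b).count("1")
    if union == 0:
        # Assumption: treat two empty fingerprints as having similarity 0.
        return 0.0
    return bin(a & b).count("1") / union
```

For example, hex_length(881) gives 221, which matches the string size
mentioned above, and hex_tanimoto("0f", "03") gives 0.5 (2 bits in
common out of 4 bits set in either).
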
:)

				Andrew
				da...@dalkescientific.com

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss