Hello, I'd say that the reason for choosing this storage method was a technical decision. Since an unfolded FP2 is 1024 bits long (1021 actually used) it doesn't fit into the largest integer datatype of MySQL, UNSIGNED BIGINT which is 2^64. So you either have to store it in a BLOB, but then you have to deal with BLOB input/output and cannot use the database's own bit operators but have to develop your own, like Mychem does.
So they decided to split the fingerprint into chunks that can be stored in MySQL native datatypes, which gives 32 unsigned 32 bit integers (why they store them in BIGINT columns and waste storage space escapes me, maybe because MySQL's bit operators always use BIGINT internally). As you can see on page 2945, this has one advantage: columns that are 0 (i.e. have no bits set) can be completely omitted in the query. But the resulting SQL is somewhat ugly, with different columns of the mol_fp table used, depending on the query molecule. I guess that's why they have written the gen_search_sql() function that generates the correct query SQL string for you. Still, this is a linear scan on the mol_fp table but one second for searching benzene (constrained to 900 results) on 10^6 molecules is ok. Keeping the query as close to the database as possible is a viable design decision, but without knowing the plans the optimizer makes out of those queries and knowing the typical queries this system should handle, nobody can say if that strategy is 'best'. Or how this system will scale under multiuser load btw. All I can say is that the SQL you have to use is ok for use in a program but not if you want to search with a manually typed SQL. ;-> Maybe Jerome can comment more on this. Also, as seen on page 2946, the matching itself is done outside MySQL with pybel. So this is not a fully integrated system, it's a me-too of MolDB4, done a bit different. I doubt the use of MD5ed canonical SMILES for exact searching. Certainly this works, but why not use the InChI-Key for better data interoperability? And we are slowly leaving 'openbabel-discuss' towards 'how-to-build-a-chemical-database-discuss'. :-) Best regards, Ernst-Georg ------------------------------------------------------------------------------ RSA(R) Conference 2012 Save $700 by Nov 18 Register now http://p.sf.net/sfu/rsa-sfdev2dev1 _______________________________________________ OpenBabel-discuss mailing list OpenBabel-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/openbabel-discuss