Hello,

I'd say that the choice of this storage method was a technical decision.
Since an unfolded FP2 is 1024 bits long (1021 actually used), it doesn't fit
into the largest integer datatype of MySQL, BIGINT UNSIGNED, which holds only
64 bits (values up to 2^64 - 1). So you either split it up or store it in a
BLOB, but with a BLOB you have to deal with BLOB input/output and cannot use
the database's own bit operators; you have to develop your own, like Mychem
does.
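
To illustrate what "developing your own" bit operators means for the BLOB
route, here is a minimal Python sketch of the usual screening test (every bit
set in the query must also be set in the candidate). It is not Mychem's code;
the function name and the assumption that the BLOB arrives as a 128-byte
string are mine.

```python
FP_BYTES = 128  # 1024 bits, the unfolded FP2 length mentioned above


def is_subset(query_fp: bytes, target_fp: bytes) -> bool:
    """Return True if every bit set in query_fp is also set in target_fp."""
    if len(query_fp) != FP_BYTES or len(target_fp) != FP_BYTES:
        raise ValueError("fingerprints must be exactly 128 bytes long")
    # Python integers have arbitrary precision, so the whole fingerprint
    # can be compared with a single AND.
    q = int.from_bytes(query_fp, "big")
    t = int.from_bytes(target_fp, "big")
    return (q & t) == q
```

In the database this logic would have to live in a UDF or in the client,
which is exactly the extra work the chunked layout avoids.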

So they decided to split the fingerprint into chunks that can be stored in 
MySQL native datatypes, which gives 32 unsigned 32-bit integers (why they store 
them in BIGINT columns and waste storage space escapes me; maybe because 
MySQL's bit operators always use BIGINT internally). As you can see on page 
2945, this has one advantage: columns that are 0 (i.e. have no bits set) can be 
completely omitted in the query. But the resulting SQL is somewhat ugly, with 
different columns of the mol_fp table used, depending on the query molecule. I 
guess that's why they have written the gen_search_sql() function that generates 
the correct query SQL string for you.
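
A rough Python sketch of the idea, to show why the SQL changes with the query
molecule. This is not the paper's gen_search_sql(); the column names fp0..fp31,
the mol_id column and the bit ordering (word 0 = lowest 32 bits) are my
assumptions, only the mol_fp table name is taken from the paper.

```python
from typing import List, Tuple

WORDS = 32      # number of chunks
WORD_BITS = 32  # bits per chunk


def split_fingerprint(fp: int) -> List[int]:
    """Split a 1024-bit fingerprint (as an int) into 32 unsigned 32-bit words."""
    if fp < 0 or fp >> (WORDS * WORD_BITS):
        raise ValueError("fingerprint does not fit into 1024 bits")
    mask = (1 << WORD_BITS) - 1
    return [(fp >> (WORD_BITS * i)) & mask for i in range(WORDS)]


def build_screen_sql(query_words: List[int]) -> Tuple[str, List[int]]:
    """Build a parameterized screening query, skipping all-zero words."""
    conditions = []
    params: List[int] = []
    for i, word in enumerate(query_words):
        if word == 0:
            continue  # no bits set, so this column cannot exclude anything
        conditions.append(f"(fp{i} & %s) = %s")
        params.extend([word, word])
    where = " AND ".join(conditions) if conditions else "TRUE"
    return f"SELECT mol_id FROM mol_fp WHERE {where}", params
```

A small query molecule sets few bits, so most of the 32 conditions drop out
and the WHERE clause stays short. The %s placeholders follow the parameter
style of the common Python MySQL drivers.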

Still, this is a linear scan on the mol_fp table but one second for searching 
benzene (constrained to 900 results) on 10^6 molecules is ok. Keeping the query 
as close to the database as possible is a viable design decision, but without 
knowing the plans the optimizer makes out of those queries and knowing the 
typical queries this system should handle, nobody can say if that strategy is 
'best', or how this system will scale under multiuser load, btw. All I can say 
is that the SQL you have to use is ok for use in a program but not if you want 
to search with manually typed SQL. ;->

Maybe Jerome can comment more on this.

Also, as seen on page 2946, the matching itself is done outside MySQL with 
pybel. So this is not a fully integrated system; it's a me-too of MolDB4, done 
a bit differently.

I have my doubts about using MD5-hashed canonical SMILES for exact searching. 
Certainly this works, but why not use the InChIKey for better data 
interoperability?
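
For reference, the MD5 approach boils down to something like the following
sketch (my own illustration, not the paper's code). It assumes the SMILES
string has already been canonicalized, and always by the same toolkit; the
function name is hypothetical.

```python
import hashlib


def smiles_md5(canonical_smiles: str) -> str:
    """Return the hex MD5 digest of an already-canonical SMILES string."""
    # The digest is only as canonical as its input: two toolkits that write
    # different canonical SMILES for one molecule produce different keys.
    return hashlib.md5(canonical_smiles.encode("utf-8")).hexdigest()
```

That dependency on one toolkit's canonicalization is why such a key is hard
to exchange between databases.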

And we are slowly leaving 'openbabel-discuss' towards 
'how-to-build-a-chemical-database-discuss'. :-)

Best regards,

Ernst-Georg

_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss
