Very very interesting. Thank you very much indeed for all the information.
Best regards.

> Hello,
>
> I'd say that the reason for choosing this storage method was a technical
> decision. Since an unfolded FP2 is 1024 bits long (1021 actually used) it
> doesn't fit into the largest integer datatype of MySQL, UNSIGNED BIGINT
> which is 2^64. So you either have to store it in a BLOB, but then you have
> to deal with BLOB input/output and cannot use the database's own bit
> operators but have to develop your own, like Mychem does.
>
> So they decided to split the fingerprint into chunks that can be stored in
> MySQL native datatypes, which gives 32 unsigned 32 bit integers (why they
> store them in BIGINT columns and waste storage space escapes me, maybe
> because MySQL's bit operators always use BIGINT internally). As you can
> see on page 2945, this has one advantage: columns that are 0 (i.e. have no
> bits set) can be completely omitted in the query. But the resulting SQL is
> somewhat ugly, with different columns of the mol_fp table used, depending
> on the query molecule. I guess that's why they have written the
> gen_search_sql() function that generates the correct query SQL string for
> you.
>
> Still, this is a linear scan on the mol_fp table but one second for
> searching benzene (constrained to 900 results) on 10^6 molecules is ok.
> Keeping the query as close to the database as possible is a viable design
> decision, but without knowing the plans the optimizer makes out of those
> queries and knowing the typical queries this system should handle, nobody
> can say if that strategy is 'best'. Or how this system will scale under
> multiuser load btw. All I can say is that the SQL you have to use is ok
> for use in a program but not if you want to search with a manually typed
> SQL. ;->
>
> Maybe Jerome can comment more on this.
>
> Also, as seen on page 2946, the matching itself is done outside MySQL with
> pybel. So this is not a fully integrated system, it's a me-too of MolDB4,
> done a bit different.
>
> I doubt the use of MD5ed canonical SMILES for exact searching. Certainly
> this works, but why not use the InChI-Key for better data
> interoperability?
>
> And we are slowly leaving 'openbabel-discuss' towards
> 'how-to-build-a-chemical-database-discuss'. :-)
>
> Best regards,
>
> Ernst-Georg
>
>



------------------------------------------------------------------------------
RSA(R) Conference 2012
Save $700 by Nov 18
Register now
http://p.sf.net/sfu/rsa-sfdev2dev1
_______________________________________________
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss

Reply via email to