Hi all,

I've started working on an extension of the SQLAlchemy database
toolkit that aims to support direct access from Python to the
functions and data types exposed by the database chemical cartridge.
In brief, this means that instead of interacting with the RDBMS through
raw SQL queries, it may become possible to run the entire workflow
(data preprocessing and cleanup, insertion, selection and further
processing) without leaving the Python interpreter, while
delegating the construction of the required SQL expressions to a
higher-level API. To give a simple example, instead of using

select count(*) from molecules where structure @> 'O=C1OC2=CC=CC=C2C=C1';

one might type something like the following:

>>> constraint = Molecule.structure.contains('O=C1OC2=CC=CC=C2C=C1')
>>> print(session.query(Molecule).filter(constraint).count())

(OK, in this specific case the Python expression is a bit more
verbose, but it's a very simple SQL query :-)
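
To give a rough feeling for what "delegating the construction of the
SQL expressions" can mean, here is a toy sketch in plain Python. It is
not razi or SQLAlchemy code and says nothing about how either is
implemented: the names StructureColumn, Constraint and count_query are
made up for the illustration, and the %(smiles)s placeholder assumes a
psycopg2-style driver. The only thing taken from the example above is
the '@>' operator and the shape of the query.

```python
# Toy illustration only: neither razi nor SQLAlchemy works exactly like this.
# StructureColumn, Constraint and count_query are hypothetical names.


class Constraint(object):
    """A SQL fragment together with its bound parameters."""

    def __init__(self, sql, params):
        self.sql = sql
        self.params = params


class StructureColumn(object):
    """Stands in for a mapped column holding a molecular structure."""

    def __init__(self, table, name):
        self.table = table
        self.name = name

    def contains(self, smiles):
        # '@>' is the operator used in the raw SQL query above. The SMILES
        # string is kept as a bound parameter instead of being inlined;
        # the %(smiles)s placeholder style is an assumption (psycopg2-like).
        sql = "{0}.{1} @> %(smiles)s".format(self.table, self.name)
        return Constraint(sql, {"smiles": smiles})


def count_query(table, constraint):
    """Build a parameterised 'select count(*)' statement for one constraint."""
    sql = "select count(*) from {0} where {1}".format(table, constraint.sql)
    return sql, constraint.params


structure = StructureColumn("molecules", "structure")
constraint = structure.contains("O=C1OC2=CC=CC=C2C=C1")
sql, params = count_query("molecules", constraint)
print(sql)
print(params)
```

The caller only ever writes structure.contains(...); the operator and
the parameter handling live in one place, which is the part a
higher-level API is meant to take care of.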

The project is still in an initial phase and the code is far from
mature. Development is currently focused on the
RDKit PostgreSQL extension. Structure searches and molecular
descriptors should be fully supported, and bit fingerprints and the
associated similarity operators are also available (but modifying the
default similarity threshold values is not yet possible). The code is
currently hosted on GitHub

https://github.com/rvianello/razi

and some draft documentation (at the moment intended more to
illustrate the idea than to provide a detailed reference) is also
available:

http://razi.readthedocs.org

If you use the RDKit chemical cartridge or SQLAlchemy (or both), I
hope you will find the idea interesting and I'd love to hear from you.
Comments, ideas and suggestions would be very welcome.

Cheers,
Riccardo

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
