Hi Riccardo,

Are you planning on supporting other cartridges/database dialects? If
you only want to support RDKit/PostgreSQL, then your implementation
might be much more than you actually need :)

If you want to change the similarity thresholds you will need
something like session.execute(text("SET
rdkit.tanimoto_threshold=:threshold").execution_options(autocommit=True).params(threshold=threshold)),
probably wrapped inside a function. The important part is that you use
execution_options(autocommit=True), because SQLAlchemy won't autocommit
SET statements (if you set it in the engine config).
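
A rough sketch of the "wrapped inside a function" part, kept free of any
SQLAlchemy import so it runs on its own. The function name and the 0..1
range check are my own assumptions for illustration; only the SET
statement is taken from the text above.

```python
def tanimoto_threshold_statement(threshold):
    """Return (sql, params) for setting rdkit.tanimoto_threshold.

    Hypothetical helper: the name and the range check are illustrative
    assumptions, not part of RDKit or SQLAlchemy.
    """
    value = float(threshold)
    # A Tanimoto similarity lies between 0 and 1, so reject anything else
    # before it ever reaches the database (NaN fails this check as well).
    if not 0.0 <= value <= 1.0:
        raise ValueError(
            "Tanimoto threshold must be between 0 and 1, got %r" % (threshold,)
        )
    return "SET rdkit.tanimoto_threshold=:threshold", {"threshold": value}


# Intended use with the session.execute() call quoted above (not run here):
#   sql, params = tanimoto_threshold_statement(0.7)
#   session.execute(
#       text(sql).execution_options(autocommit=True).params(**params))
```

Returning the statement and its parameters separately keeps the value out
of the SQL string, so the bound-parameter form from the call above still
applies.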

I also have rdkit in my database api but I went for the hybrid
approach in SQLAlchemy that allows you to distinguish between methods
on the class and instance level. With an instance of an rdkit molecule
for example, the api will use the local rdkit installation for a
substructure pattern match. On the class level however, the same
expression is turned into an SQL expression to query the database. I
also use the @reconstructor decorator to turn the database rdmol
SMILES string back into a Python RDMol, but this is only useful if you
plan on using RDKit on the client side as well.

# instance of ChemCompRDMol
>>> print sti.RDMol.contains('c1ccccc1')
True

# class itself
>>> print ChemCompRDMol.contains(sti.ism)
pdbchem.chem_comp_rdmols.rdmol OPERATOR(rdkit.@>) :rdmol_1

Here is an example:

   @reconstructor
   def init_on_load(self):
       '''
       Turns the rdmol column that is returned as a SMILES string back into an
       RDMol object.
       '''
       self.rdmol = MolFromSmiles(self.rdmol)

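   @hybrid_method
   def contains(self, smiles):
       '''
       Instance level: substructure match with the local RDKit
       installation.
       '''
       return self.rdmol.HasSubstructMatch(MolFromSmiles(smiles))

   @contains.expression
   def contains(self, smiles):
       '''
       Class level: SQL expression using the cartridge's @> operator.
       '''
       return self.rdmol.op('OPERATOR(rdkit.@>)')(smiles)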

And that's basically it. I have the cartridge installed in its own
schema, which is why I need the OPERATOR() syntax.
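
In case the class/instance dispatch looks like magic, here is a toy,
stdlib-only sketch of the idea. It is not SQLAlchemy's implementation:
hybrid_sketch, Compound and the plain substring test are stand-ins I made
up for illustration, and the class-level branch just returns a string
rather than a real SQL expression.

```python
import types


class hybrid_sketch(object):
    """Toy stand-in for a hybrid method (hypothetical, stdlib only)."""

    def __init__(self, func):
        self.func = func  # called when accessed on an instance
        self.expr = func  # called when accessed on the class

    def expression(self, expr):
        # Register a separate class-level implementation.
        self.expr = expr
        return self

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class: bind the expression to the class.
            return types.MethodType(self.expr, owner)
        # Accessed on an instance: bind the plain method to the instance.
        return types.MethodType(self.func, instance)


class Compound(object):
    column = "rdmol"  # hypothetical column name

    def __init__(self, smiles):
        self.smiles = smiles

    @hybrid_sketch
    def contains(self, fragment):
        # Instance level: a substring test stands in for HasSubstructMatch.
        return fragment in self.smiles

    @contains.expression
    def contains(cls, fragment):
        # Class level: build the text of an SQL-like expression.
        return "%s OPERATOR(rdkit.@>) %r" % (cls.column, fragment)
```

Calling Compound('Oc1ccccc1').contains('c1ccccc1') runs the first
function against the instance, while Compound.contains('c1ccccc1') runs
the second one against the class and returns the expression text.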

Cheers,

Adrian

On Fri, Jul 1, 2011 at 17:22, Riccardo Vianello
<[email protected]> wrote:
> Hi all,
>
> I've started working on an extension of the SQLAlchemy database
> toolkit that is aimed to support direct access from python to the
> functions and data types exposed by the database chemical cartridge.
> In brief this means that instead of interacting with the RDBMS using
> raw SQL queries, it may become possible to execute the entire workflow
> (data preprocessing and cleanup, insertion, selection and further
> processing) without leaving the python interpreter, and at the same
> time delegating the construction of the required SQL expressions to a
> higher-level API. Just to make a simple example, instead of using
>
> select count(*) from molecules where structure @> 'O=C1OC2=CC=CC=C2C=C1';
>
> one might type something like the following:
>
> >>> constraint = Molecule.structure.contains('O=C1OC2=CC=CC=C2C=C1')
> >>> print session.query(Molecule).filter(constraint).count()
>
> (ok, in this specific case the python expression is a bit more
> verbose, but it's a very simple SQL query :-)
>
> The project is still in an initial phase, and the code is far from
> being mature, but the development is currently strongly focused on the
> RDKit postgresql extension. Structure searches and molecular
> descriptors should be fully supported, and bit fingerprints and
> associated similarity operators are also available (but modifying the
> default threshold similarity values is not yet possible). The code is
> currently hosted on github
>
> https://github.com/rvianello/razi
>
> and some draft documentation (at the moment intended more to
> illustrate the idea than to provide a detailed reference) is also
> available:
>
> http://razi.readthedocs.org
>
> If you use the RDKit chemical cartridge or SQLAlchemy (or both), I
> hope you will find the idea interesting and I'd love to hear from you.
> Comments, ideas and suggestions would be very welcome.
>
> Cheers,
> Riccardo
>
> ------------------------------------------------------------------------------
> All of the data generated in your IT infrastructure is seriously valuable.
> Why? It contains a definitive record of application performance, security
> threats, fraudulent activity, and more. Splunk takes this data and makes
> sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-d2d-c2
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
