Re: [HACKERS] Caching Python modules

PostgreSQL - Hans-Jürgen Schönig Wed, 17 Aug 2011 05:44:33 -0700

On Aug 17, 2011, at 2:19 PM, Jan Urbański wrote:

> On 17/08/11 14:09, PostgreSQL - Hans-Jürgen Schönig wrote:
>> CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text) 
>> RETURNS float4 AS $$
>> 
>>        from SecondCorpus import SecondCorpus
>>        from SecondDocument import SecondDocument
>> 
>> i am doing some intense text mining here.
>> the problem is: is it possible to cache those imported modules from one function
>> call to the next?
>> GD works nicely for variables, but can this actually be done with imported
>> modules as well?
>> the import takes around 95% of the total time, so it is definitely something
>> that should go away somehow.
>> i have checked the docs but am none the wiser.
> 
> After a module is imported in a backend, it stays in the interpreter's
> sys.modules dictionary, and importing it again will not cause the
> module's Python code to be executed again.
> 
> As long as you are using the same backend, you should be able to call
> add_to_corpus repeatedly, and the import statements should only take a
> long time the first time you call them.
> 
> This simple test demonstrates it:
> 
> $ cat /tmp/slow.py
> import time
> time.sleep(5)
> 
> $ PYTHONPATH=/tmp/ bin/postgres -p 5433 -D data/
> LOG:  database system was shut down at 2011-08-17 14:16:18 CEST
> LOG:  database system is ready to accept connections
> 
> $ bin/psql -p 5433 postgres
> Timing is on.
> psql (9.2devel)
> Type "help" for help.
> 
> postgres=# select slow();
> slow
> ------
> 
> (1 row)
> 
> Time: 5032.835 ms
> postgres=# select slow();
> slow
> ------
> 
> (1 row)
> 
> Time: 1.051 ms
> 
> Cheers,
> Jan
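
Jan's point can be reproduced in plain Python, outside PostgreSQL: the first import of a module executes its top-level code, while later imports just return the entry already stored in sys.modules. This is a minimal sketch, assuming a throwaway module with the hypothetical name slow_demo written to a temporary directory; its 0.5 s sleep stands in for the 5 s sleep in Jan's /tmp/slow.py.

```python
import importlib
import sys
import tempfile
import time
from pathlib import Path


def timed_import(name):
    """Import `name` and return (module, elapsed seconds)."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start


# Write a hypothetical slow module whose top-level code sleeps,
# mimicking Jan's /tmp/slow.py (scaled down to 0.5 s).
tmpdir = tempfile.mkdtemp()
Path(tmpdir, "slow_demo.py").write_text("import time\ntime.sleep(0.5)\n")
sys.path.append(tmpdir)
importlib.invalidate_caches()  # make sure the finders see the new file

first_mod, first = timed_import("slow_demo")    # runs the module body
second_mod, second = timed_import("slow_demo")  # served from sys.modules

print(f"first import:  {first:.3f} s")
print(f"second import: {second:.6f} s")
print("cached in sys.modules:", "slow_demo" in sys.modules)
```

Because each backend runs its own interpreter, this cache lives per backend: a new connection pays the import cost once again.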





hello jan …

the code is actually like this …
the first function is called once per backend. it builds some fairly large
in-memory structures …
this takes around 2 secs or so … but that is fine and not an issue.

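-- setup the environment (called once per backend)
-- VOLATILE rather than STABLE: functions with side effects (here sys.path
-- and GD) must be VOLATILE so that calls to them cannot be optimized away
CREATE OR REPLACE FUNCTION textprocess.setup_sentiment(pypath text, lang text)
RETURNS void AS $$
        import sys
        # make the corpus classes importable; skip entries already present
        for p in (pypath, pypath + "/external"):
            if p not in sys.path:
                sys.path.append(p)

        from SecondCorpus import SecondCorpus
        import const

        # GD is shared by all PL/Python functions in this session
        GD['path_to_classes'] = pypath
        GD['corpus'] = SecondCorpus(lang)
        GD['lang'] = lang

        return
$$ LANGUAGE plpythonu VOLATILE;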

this is called more frequently ...

-- add a document to the corpus (VOLATILE: it modifies the cached corpus)
CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text)
RETURNS float4 AS $$
        # after the first call in a backend, this import is served
        # from sys.modules and is cheap
        from SecondDocument import SecondDocument

        corpus = GD['corpus']  # built once per backend by setup_sentiment()
        doc1 = SecondDocument(corpus.senti_provider, lang, t)
        doc1.create_sentences()
        corpus.add_document(doc1)
        corpus.process()
        return doc1.total_score
$$ LANGUAGE plpythonu VOLATILE;

the point here actually is: if i use the classes in a normal python command-line
program, this routine does not look like an issue; creating the document object
and doing the magic in there is not a problem …

on the SQL side, however, this is already fairly heavy for some reason ...

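 funcid | schemaname  |    funcname     | calls | total_time | self_time | ?column?
--------+-------------+-----------------+-------+------------+-----------+----------
 235287 | textprocess | setup_sentiment |    54 |     100166 |    100166 |     1854
 235288 | textprocess | add_to_corpus   |   996 |     438909 |    438909 |      440

(times are in milliseconds; ?column? is total_time / calls, i.e. the average
time per call: roughly 1.85 s for setup_sentiment and 0.44 s for add_to_corpus)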

looks like an afternoon with some lower-level tools :(.
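
One possible starting point for that afternoon is to profile the function body in a standalone Python process with the standard-library cProfile module, in line with the command-line comparison above. This is a sketch under assumptions: process_document and tokenize are hypothetical stand-ins for the body of add_to_corpus, since SecondDocument, create_sentences, add_document and process are not reproduced here.

```python
import cProfile
import io
import pstats


def tokenize(text):
    """Hypothetical helper standing in for sentence splitting."""
    return [s.split() for s in text.split(".") if s.strip()]


def process_document(text):
    """Hypothetical stand-in for the body of add_to_corpus."""
    sentences = tokenize(text)
    return float(sum(len(words) for words in sentences))


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)  # top entries only
    return result, out.getvalue()


result, report = profile_call(process_document, "One two three. Four five.")
print(result)
print(report)
```

Swapping the real classes in for the stand-ins would show which calls dominate the roughly 440 ms per call seen in the statistics above.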

        many thanks,

                hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
