On Aug 17, 2011, at 2:19 PM, Jan Urbański wrote:

> On 17/08/11 14:09, PostgreSQL - Hans-Jürgen Schönig wrote:
>> CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text)
>> RETURNS float4 AS $$
>>
>> from SecondCorpus import SecondCorpus
>> from SecondDocument import SecondDocument
>>
>> i am doing some intense text mining here.
>> the problem is: is it possible to cache those imported modules from function
>> to function call.
>> GD works nicely for variables but can this actually be done with imported
>> modules as well?
>> the import takes around 95% of the total time so it is definitely something
>> which should go away somehow.
>> i have checked the docs but i am not more clever now.
>
> After a module is imported in a backend, it stays in the interpreter's
> sys.modules dictionary and importing it again will not cause the module
> Python code to be executed.
>
> As long as you are using the same backend you should be able to call
> add_to_corpus repeatedly and the import statements should take a long
> time only the first time you call them.
>
> This simple test demonstrates it:
>
> $ cat /tmp/slow.py
> import time
> time.sleep(5)
>
> $ PYTHONPATH=/tmp/ bin/postgres -p 5433 -D data/
> LOG:  database system was shut down at 2011-08-17 14:16:18 CEST
> LOG:  database system is ready to accept connections
>
> $ bin/psql -p 5433 postgres
> Timing is on.
> psql (9.2devel)
> Type "help" for help.
>
> postgres=# select slow();
>  slow
> ------
>
> (1 row)
>
> Time: 5032.835 ms
> postgres=# select slow();
>  slow
> ------
>
> (1 row)
>
> Time: 1.051 ms
>
> Cheers,
> Jan
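The sys.modules behaviour described in the quoted reply can also be reproduced outside PostgreSQL. The following is only a minimal sketch in plain Python 3: the module name slow_demo, the temporary directory, and the 0.5 second sleep are assumptions made for illustration and are not part of the original test.

```python
import importlib
import os
import sys
import tempfile
import time

# Write a deliberately slow module into a fresh temporary directory.
# The name "slow_demo" is hypothetical; it plays the role of /tmp/slow.py.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "slow_demo.py"), "w") as f:
    f.write("import time\ntime.sleep(0.5)\nVALUE = 42\n")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()  # make sure the new file is visible to the import system


def timed_import(name):
    """Import `name` and return (module, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start


# The first import executes the module body, including the sleep.
first_module, first_elapsed = timed_import("slow_demo")
# The second import is a lookup in sys.modules; the body does not run again.
second_module, second_elapsed = timed_import("slow_demo")

print("first import:  %.3f s" % first_elapsed)
print("second import: %.3f s" % second_elapsed)
print("cached in sys.modules:", "slow_demo" in sys.modules)
print("same module object:", first_module is second_module)
```

Within one interpreter (and therefore within one backend) the second import returns the already loaded module object, which is why repeating the import statement inside a function body should cost very little after the first call.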
hello jan …

the code is actually like this …
the first function is called once per backend. it compiles some fairly fat in-memory stuff …
this takes around 2 secs or so … but this is fine and not an issue.

-- setup the environment
CREATE OR REPLACE FUNCTION textprocess.setup_sentiment(pypath text, lang text) RETURNS void AS $$

    import sys
    sys.path.append(pypath)
    sys.path.append(pypath + "/external")

    from SecondCorpus import SecondCorpus
    import const

    GD['path_to_classes'] = pypath
    GD['corpus'] = SecondCorpus(lang)
    GD['lang'] = lang

    return;
$$ LANGUAGE 'plpythonu' STABLE;

this is called more frequently ...

-- add a document to the corpus
CREATE OR REPLACE FUNCTION textprocess.add_to_corpus(lang text, t text) RETURNS float4 AS $$

    from SecondCorpus import SecondCorpus
    from SecondDocument import SecondDocument

    doc1 = SecondDocument(GD['corpus'].senti_provider, lang, t)
    doc1.create_sentences()
    GD['corpus'].add_document(doc1)
    GD['corpus'].process()
    return doc1.total_score

$$ LANGUAGE 'plpythonu' STABLE;

the point here actually is: if i use the classes in a normal python command line program, this routine does not look like an issue.
creating the document object and doing the magic in there is not a problem actually …

on the SQL side this is already fairly heavy for some reason ...

 funcid | schemaname  |    funcname     | calls | total_time | self_time | ?column?
--------+-------------+-----------------+-------+------------+-----------+----------
 235287 | textprocess | setup_sentiment |    54 |     100166 |    100166 |     1854
 235288 | textprocess | add_to_corpus   |   996 |     438909 |    438909 |      440

looks like an afternoon with some more low-level tools :(.

	many thanks,

		hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
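One low-level way to see where the time in add_to_corpus goes is to time each step of the function body separately. The sketch below is plain Python 3 and makes several assumptions: StubDocument and StubCorpus are hypothetical stand-ins for SecondDocument and SecondCorpus (their real behaviour is not shown in this thread), the step labels are invented, and time.perf_counter needs Python 3.3 or later, so under a Python 2 plpythonu time.time() would have to be used instead.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per labelled step, summed over all calls.
step_seconds = defaultdict(float)


@contextmanager
def timed(label):
    """Add the time spent inside the with-block to step_seconds[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        step_seconds[label] += time.perf_counter() - start


class StubDocument:
    """Hypothetical stand-in for SecondDocument; not the real class."""

    def __init__(self, text):
        self.text = text
        self.sentences = []
        self.total_score = 0.0

    def create_sentences(self):
        self.sentences = [s.strip() for s in self.text.split(".") if s.strip()]


class StubCorpus:
    """Hypothetical stand-in for SecondCorpus; says nothing about the real one."""

    def __init__(self):
        self.documents = []

    def add_document(self, doc):
        self.documents.append(doc)

    def process(self):
        # Placeholder work only: score each document by its sentence count.
        for doc in self.documents:
            doc.total_score = float(len(doc.sentences))


def add_to_corpus(corpus, text):
    """Same four steps as the SQL function body, each timed on its own."""
    with timed("construct"):
        doc = StubDocument(text)
    with timed("create_sentences"):
        doc.create_sentences()
    with timed("add_document"):
        corpus.add_document(doc)
    with timed("process"):
        corpus.process()
    return doc.total_score


corpus = StubCorpus()
for _ in range(200):
    score = add_to_corpus(corpus, "one. two. three.")

# Report the steps, slowest first.
for label, seconds in sorted(step_seconds.items(), key=lambda kv: kv[1], reverse=True):
    print("%-17s %.6f s" % (label, seconds))
```

Inside PL/Python the same pattern could keep the accumulator in GD and report it with plpy.notice, which would show whether the 440 or so per call in the last column (which matches total_time divided by calls) is spent in the import lines, in building the document, or in processing the corpus.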