Hi all,
I finally found an ideal use case for pmap; however, something very
strange seems to happen after roughly 30 minutes of execution!
Here is the scenario:
I've got 383 raw scientific papers (.txt) in a directory that I'm
gathering using 'file-seq', and I want to pmap a fn over each element of
that seq (each document). The fn takes a document and a dictionary and
annotates the document with the terms found in the dictionary; basically
it uses a regex to tag any occurrences of words that exist in the
dictionary. When the pmapping is finished, I should have a list of
(annotated) strings that are then processed serially (doseq) to produce
one massive file with all these strings separated by newline characters
(this is how most adaptive feature generators expect the data to be).
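Roughly, the pipeline looks like this (a simplified sketch, not my
actual code -- `annotate`, `annotate-corpus` and the `<term>` tagging
format are stand-ins):

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn annotate
  "Tag every occurrence of a dictionary word in `text`.
  Builds one alternation regex from the dictionary entries."
  [dict text]
  (let [pattern (re-pattern
                 (str "\\b(?:"
                      (str/join "|" (map #(java.util.regex.Pattern/quote %) dict))
                      ")\\b"))]
    (str/replace text pattern #(str "<term>" % "</term>"))))

(defn annotate-corpus
  "pmap the annotator over every .txt file under `dir`, then write the
  annotated documents serially, one per line, into `out-file`."
  [dir dict out-file]
  (let [docs      (->> (file-seq (io/file dir))
                       (filter #(.endsWith (.getName ^java.io.File %) ".txt")))
        annotated (pmap #(annotate dict (slurp %)) docs)]
    (with-open [w (io/writer out-file)]
      (doseq [a annotated]          ; serial pass over the pmap results
        (.write w a)
        (.write w "\n")))))
```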
As you can see, this is a perfect fit for pmap, and indeed it seems to
do extremely well -- but only for roughly the first 240 papers! All the
CPUs are working hard, but after approximately 30-40 min CPU utilisation
and overall performance degrade quite a bit... For some strange reason,
2 of my cores seem to refuse to do any work after those 240 papers,
which results in a really, really slow process. When I start the process
it goes so fast that I cannot even read the output, but as I said, after
30-40 min it gets unbelievably slow! Had the performance been stable, I
reckon I would need less than 60 min to annotate all 383 papers, but
with the current behaviour I have no choice but to abort and restart it,
passing it the leftovers...
Any ideas? Are there any issues involved with creating that many futures?
Jim
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en