No, I haven't profiled memory, only CPU, but what you're saying makes perfect sense. In every single iteration (each file) I'm slurp-ing the two files provided (the dictionary and the file to annotate). Would you suggest a different GC if that is the case, or should I simply stop slurping?
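
Something like this is what I have in mind, slurping the dictionary only once outside the pmap instead of per file (just a sketch; 'annotate' below stands in for my real regex-tagging fn and the other names are placeholders):

(defn annotate [text dictionary] ; placeholder for the real regex-based tagging
  text)

(defn annotate-all [dict-file doc-files]
  (let [dictionary (slurp dict-file)]        ; read the dictionary once, up front
    (pmap #(annotate (slurp %) dictionary)   ; only the paper itself is slurped per iteration
          doc-files)))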

Jim


On 12/10/12 15:28, Adam wrote:
Have you tried running jconsole to monitor the memory usage? It sounds like maybe you're running out of heap space and you're mainly seeing the garbage collector doing its thing rather than your actual program doing work.

~Adam~


On Fri, Oct 12, 2012 at 9:23 AM, Jim foo.bar <jimpil1...@gmail.com> wrote:

    Hi all,

    I finally found an ideal use case for pmap; however, something very
    strange seems to be happening after roughly 30 minutes of execution!

    Ok so here is the scenario:

    I've got 383 raw scientific papers (.txt) in a directory that I'm
    collecting with 'file-seq', and I want to pmap a fn onto each
    element of that seq (each document). The fn takes a document and a
    dictionary and annotates the document with terms found in the
    dictionary; basically, it uses regexes to tag any occurrences of
    words that exist in the dictionary. When the pmapping is finished, I
    should have a list of (annotated) strings that will be processed
    serially (doseq) in order to produce a massive file with all these
    strings separated by a newline character (this is how most
    adaptive feature generators expect the data to be).
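
    Roughly, the pipeline looks something like this (a sketch with
    placeholder names; 'annotate', "papers-dir", "dictionary.txt" and
    "annotated.txt" are not the actual names from my code):

    (require '[clojure.java.io :as io])

    (defn annotate [text dictionary] ; stands in for the regex-based tagging fn
      text)

    (let [papers     (filter #(.isFile %) (file-seq (io/file "papers-dir")))
          dictionary (slurp "dictionary.txt")
          annotated  (pmap #(annotate (slurp %) dictionary) papers)]
      (with-open [w (io/writer "annotated.txt")]
        (doseq [s annotated]
          (.write w s)
          (.write w "\n"))))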

    So you can see, this is perfect for pmap, and indeed it seems to be
    doing extremely well, but only for the first 240 papers or so!
    All the CPUs are working hard, but after approximately 30-40 min
    CPU utilisation and overall performance degrade quite a bit.
    For some strange reason, 2 of my cores seem to refuse to do
    any work after these 240 papers, which results in a really, really
    slow process. When I start the process it is going so fast that I
    cannot even read the output, but as I said, after 30-40 min it
    gets unbelievably slow! Had the performance been stable, I
    reckon I would need less than 60 min to annotate all 383
    papers, but with the current behaviour I have no choice but to
    abort and restart it, passing it the leftovers...

    Any ideas? Are there any issues involved with creating that many
    futures?

    Jim


