Hi all,

Long post, but it boils down to this: I'm running into a "Transaction
failed after reaching retry limit" error after running my simulation for
a couple of hours. I chatted briefly with fyuryu in #clojure, and am now
pasting some of the hopefully relevant information into this post. I hope
someone can shed some light. fyuryu's recommendation was to use
'await-for' rather than 'await', but I'm a bit worried that that is just
a way to ignore some underlying problem.

I still have the simulation online and in limbo (long live
emacs --daemon), so I can answer additional questions.

I'll paste part of the program, the output, the agent-errors and some
additional things I tried below.



Program: a world of refs, in which each ref might contain a host. This
host can die, infect another or evolve its infection if there is a
host present in the ref, or it can be born, if there is no host
currently in the ref.

(def world
     ;world is a 1d vector of refs to empty cells
     (vec (map (fn [_] (ref nil)) (range popsize))))

(defn birth
  "Written from an empty place point-of-view.
  standard meiosis, density dependent birth, but no allee effect"
  [loc]
  (if (< (rand) 0.5)
    (dosync
     (if (empty? @(world loc))
       (let [father (:genes @(world (rand-int popsize)))]
         (if father
           (let [living (apply vector (gather :born))
                 mother (:genes @(living (rand-int (count living))))
                 genes (mix-genes father mother)
                 rx (genes-to-regexps genes)]
             (alter (world loc) assoc
                    :born @year :genes genes :rx rx))))))))


(defn do-year
  "Calculate one year"
  [_]
   (send (agent nil) inc-year)
   (doseq [a (shuffle-java [birth death infect infect
                            evolve evolve evolve evolve evolve])
           i (shuffle-java (range popsize))]
     (a i)))

This here is the main loop. I'm sending off 8 agents that each do a
year's worth of calculation and then return. Every 125x8 = 1000 years I
do a report.

(setup)
(dotimes [m 900]
         (time
          (do
            (dotimes [y 125]
                     (def proc1 (agent nil))
                     (def proc2 (agent nil))
                     (def proc3 (agent nil))
                     (def proc4 (agent nil))
                     (def proc5 (agent nil))
                     (def proc6 (agent nil))
                     (def proc7 (agent nil))
                     (def proc8 (agent nil))
                     (send-off proc1 do-year)
                     (send-off proc2 do-year)
                     (send-off proc3 do-year)
                     (send-off proc4 do-year)
                     (send-off proc5 do-year)
                     (send-off proc6 do-year)
                     (send-off proc7 do-year)
                     (send-off proc8 do-year)
                     (await proc1 proc2 proc3 proc4
                            proc5 proc6 proc7 proc8))
            (report))))
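
For comparison, here is a minimal sketch of how the same batch of eight
agents could be waited on with 'await-for' instead of 'await'. It assumes
a stand-in action (do-year-stub) in place of the real do-year and an
arbitrary 5 second timeout; run-batch, do-year-stub and batch are names
made up for this illustration. 'await-for' returns logical false when the
timeout elapses, so the caller would at least notice a stall, but it says
nothing about why the agents stalled.

```clojure
;; Hypothetical stand-in for the real do-year (assumption, not the
;; simulation code): pretends to work for a moment, then returns.
(defn do-year-stub [_]
  (Thread/sleep 10)
  :done)

(defn run-batch
  "Start n fresh agents on f and wait at most timeout-ms for all of them.
  Returns the agents plus whether they all finished before the timeout."
  [n f timeout-ms]
  (let [procs (vec (repeatedly n #(agent nil)))]
    (doseq [p procs]
      (send-off p f))
    ;; await-for returns logical false on timeout, logical true otherwise.
    {:finished? (boolean (apply await-for timeout-ms procs))
     :agents    procs}))

(def batch (run-batch 8 do-year-stub 5000))

(when-not (:finished? batch)
  (println "Timed out waiting for agents; some may be stuck."))
```

Keeping the agents in a vector rather than in eight separate defs also
makes it easy to inspect all of them afterwards in one go.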


Here is the output at the moment the whole thing started to break
down. First it is happily reporting every 1000 years, but then it gets
stuck and keeps going through the remaining 'm' times reporting the
same year every 2 seconds. It finishes the simulation eventually.

125000   living:    913   infected:  913
 ave VL:  5.449945235487422
pro alleles in population:  4    3.6127308424095594    (697 451 383 295)
tap alleles in population:  3    2.8695026549738807    (792 525 509)
mhc alleles in population:  6    4.691854803757676    (480 456 358 308 222 2)
"Elapsed time: 89751.28 msecs"

126000   living:    932   infected:  912
 ave VL:  5.484320175438609
pro alleles in population:  4    3.8288904414827614    (629 446 423 366)
tap alleles in population:  3    2.5917933783687013    (960 524 380)
mhc alleles in population:  7    4.361219059095245    (544 506 384 284 128 14 4)
"Elapsed time: 90156.899 msecs"

126288   living:    939   infected:  933
 ave VL:  5.5225080385852285
pro alleles in population:  6    3.9313139960273147    (621 467 404 360 24 2)
tap alleles in population:  4    2.7024232960074537    (830 718 317 13)
mhc alleles in population:  5    3.8634863880746124    (706 490 288 280 114)
"Elapsed time: 28014.93 msecs"

126288   living:    939   infected:  933
 ave VL:  5.5225080385852285
pro alleles in population:  6    3.9313139960273147    (621 467 404 360 24 2)
tap alleles in population:  4    2.7024232960074537    (830 718 317 13)
mhc alleles in population:  5    3.8634863880746124    (706 490 288 280 114)
"Elapsed time: 1871.911 msecs"

I read out the agent-errors directly after the simulation ended, and I
got the following:

126288   living:    939   infected:  933
 ave VL:  5.5225080385852285
pro alleles in population:  6    3.9313139960273147    (621 467 404 360 24 2)
tap alleles in population:  4    2.7024232960074537    (830 718 317 13)
mhc alleles in population:  5    3.8634863880746124    (706 490 288 280 114)
"Elapsed time: 2055.683 msecs"
java.lang.Exception: Agent has errors (NO_SOURCE_FILE:0)
user=> (agent-errors proc1)
(agent-errors proc1)
(#<Exception java.lang.Exception: Transaction failed after reaching retry limit>)
user=> (agent-errors proc2)
(agent-errors proc2)
(#<Exception java.lang.Exception: Transaction failed after reaching retry limit>)
user=> (agent-errors proc3)
(agent-errors proc3)
(#<Exception java.lang.Exception: Transaction failed after reaching retry limit>)
user=> (agent-errors proc4)
(agent-errors proc4)
(#<Exception java.lang.Exception: Transaction failed after reaching retry limit>)
user=> agent-errors proc5)
agent-errors proc5)
#<core$agent_errors__3300 clojure.core$agent_errors__3...@5ca27a>
user=> #<Agent clojure.lang.ag...@feea85>
user=> java.lang.Exception: Unmatched delimiter: )
user=> (agent-errors proc6)
(agent-errors proc6)
(#<Exception java.lang.Exception: Transaction failed after reaching retry limit>)
user=> (agent-errors proc7)
(agent-errors proc7)
(#<Exception java.lang.Exception: Transaction failed after reaching retry limit>)
user=> agent-errors proc8)
agent-errors proc8)
#<core$agent_errors__3300 clojure.core$agent_errors__3...@5ca27a>
user=> #<Agent clojure.lang.ag...@56ac22>
user=> java.lang.Exception: Unmatched delimiter: )



I started mucking with it a bit more and found that I can't change a
single ref. Everything seems to be locked. If I make 'death' do a
println each time it is tried, I see that it is indeed trying to apply
itself to ref 1 several thousand times before giving up.

user=> (death 1)
(death 1)
java.lang.Exception: Transaction failed after reaching retry limit (NO_SOURCE_FILE:0)
user=> (death 2)
(death 2)
java.lang.Exception: Transaction failed after reaching retry limit (NO_SOURCE_FILE:0)
user=> (death 3)
(death 3)
java.lang.Exception: Transaction failed after reaching retry limit (NO_SOURCE_FILE:0)
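
The println trick mentioned above works because side effects inside a
dosync body run again on every retry. A minimal sketch of the same idea
with a counter instead of println follows; it assumes a hypothetical
single cell and a hypothetical traced-clear function, since the real
'death' is not shown in this post. An atom is used for the counter
because atoms do not take part in the transaction, so each attempt is
counted even when the transaction is retried.

```clojure
;; Hypothetical names for illustration only; not the simulation code.
(def attempts (atom 0))          ; counts every run of the transaction body
(def cell (ref {:born 0}))       ; stand-in for one cell of the world

(defn traced-clear
  "Empty the cell r, counting each attempt of the transaction body."
  [r]
  (dosync
   ;; Side effect on purpose: runs once per attempt, including retries.
   (swap! attempts inc)
   (ref-set r nil)))

(traced-clear cell)
(println "attempts so far:" @attempts)
```

Under contention the counter would grow faster than the number of calls,
which gives a rough measure of how often a transaction is being retried.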

