Seems fairly reproducible.

With a population of 1,000 people:

126288   living:    939   infected:  933
 ave VL:  5.5225080385852285
pro alleles in population:  6    3.9313139960273147    (621 467 404 360 24 2)
tap alleles in population:  4    2.7024232960074537    (830 718 317 13)
mhc alleles in population:  5    3.8634863880746124    (706 490 288 280 114)
"Elapsed time: 28014.93 msecs"

With a population of 10,000 people:
12626   living:    9068   infected:  8980
 ave VL:  5.234532293986656
pro alleles in population:  22    4.068465156597464    (8164 2554 1903 1143 1022 717 712 483 422 391 274 160 84 74 9 9 5 3 3 2 1 1)
tap alleles in population:  21    3.2913665666051255    (9429 2163 1495 1461 891 596 487 443 423 361 289 49 23 8 6 4 2 2 2 1 1)
mhc alleles in population:  17    11.51415636214343    (2636 2010 1914 1834 1400 1322 1260 1234 1162 998 894 506 454 292 198 20 2)
"Elapsed time: 984358.517 msecs"

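For anyone reading the output: the second number on each allele line looks
like the inverse Simpson index of the counts in parentheses, i.e. an
effective number of alleles. That is an inference from the numbers, not
something stated in this thread, and `effective-alleles` below is a
hypothetical name rather than a function from eden.clj. A minimal sketch:

```clojure
;; Hypothetical helper, not taken from eden.clj.
;; Inverse Simpson index: 1 / sum(p_i^2), where p_i = count_i / total.
;; Computed as total^2 / sum(count_i^2) so the arithmetic stays in
;; integers until the final division.
(defn effective-alleles
  [counts]
  (let [total  (reduce + counts)
        sum-sq (reduce + (map #(* % %) counts))]
    (/ (* total total) (double sum-sq))))

;; Counts from the 1,000-person run (they sum to 1878 = 2 x 939 living).
(effective-alleles [621 467 404 360 24 2]) ; roughly 3.9313
(effective-alleles [830 718 317 13])       ; roughly 2.7024
```

Both values agree with the printed output to the digits checked, which is
why the inverse Simpson reading seems plausible.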

So it actually happens about 10 times earlier with 10,000 people than
with 1,000, going by the first number in the output (12626 versus
126288). Puzzling.
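
A smaller test case for the kind of contention Rich describes in the quoted
message (a long-running transaction competing with short-running ones for
the same refs) might look roughly like the sketch below. Everything in it
(`cells`, `touch-one`, `scan-all`, `run-test`, and the sizes) is
hypothetical and not taken from eden.clj, and it may or may not trigger
the retry-limit failure on a given machine.

```clojure
;; Hypothetical minimal test case, not taken from eden.clj.
;; A vector of refs standing in for the simulation's per-individual state.
(def cells (vec (repeatedly 1000 #(ref 0))))

(defn touch-one
  "Short transaction: increment one randomly chosen ref."
  []
  (dosync (alter (rand-nth cells) inc)))

(defn scan-all
  "Long transaction: read every ref inside a single dosync and sum them,
  which is what a 'gather'-style scan is assumed to do."
  []
  (dosync (reduce + (map deref cells))))

(defn run-test
  "Starts n-writers threads that call touch-one in a loop, runs scan-all
  n-scans times on the calling thread, then stops the writers."
  [n-writers n-scans]
  (let [running (atom true)
        writers (doall (repeatedly n-writers
                                   #(future (while @running (touch-one)))))]
    (try
      (dotimes [_ n-scans] (scan-all))
      (finally
        (reset! running false)
        ;; Wait for every writer thread to finish its loop.
        (doseq [w writers] @w)))))

(comment
  ;; Example invocation; the numbers are arbitrary.
  (run-test 8 10000))
```

If the scan starts failing as the number of refs or writers grows, that
would at least be consistent with the larger population failing sooner.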


On Dec 22, 3:33 pm, bOR_ <boris.sch...@gmail.com> wrote:
> * So far it happened in both instances that I ran the simulation for
> more than 100k simulated years, so while this is reproducible, it does
> take a number of hours to get there. I can see if I can get the effect
> faster with a smaller population or something.
>
> * When I start the simulation, the memory usage is 2.4% of the
> available memory (16gb), and it is happily running on 8 Intel(R) Xeon
> (R) CPU X5482  @ 3.20GHz 's.
> (from 'top').
>
> * inc-year:
>
> (defn inc-year
>   [_]
>   (dosync (commute year inc)))
>
> * Whole source is here:
> http://clojure.googlegroups.com/web/eden.clj?gsc=rQ4WoRYAAAB68Q78LH5o...
>
> * 'gather' indeed scans all refs, but is only called once every 1000
> years, and right after an 'await', so I figured everything should have
> been free then.
>
> On Dec 22, 2:56 pm, Rich Hickey <richhic...@gmail.com> wrote:
>
> > On Dec 22, 7:41 am, bOR_ <boris.sch...@gmail.com> wrote:
>
> > > Hi all,
>
> > > > Long post, but it boils down to this: I'm running into a "transaction
> > > > failed after retry limit" error after running my simulation for a
> > > > couple of hours. I chatted briefly with fyuryu in #clojure, and am now
> > > > pasting some of the hopefully relevant information into this post. I
> > > > hope someone can shed some light. fyuryu's recommendation was to use
> > > > 'await-for' rather than 'await', but I'm a bit worried that that is
> > > > just a way to ignore some underlying problem.
>
> > > > I still have the simulation online and in limbo (long live
> > > > emacs --daemon), so I can answer additional questions.
>
> > > I'll paste part of the program, the output, the agent-errors and some
> > > additional things I tried below.
>
> > Generally, you can get retry limit failures when a long-running
> > transaction contends for the same refs as short-running transactions.
> > It is hard to see what is going on with your sim without all the
> > source.
>
> > How many cores?
> > What is the memory utilization?
> > Do you have any blocking calls anywhere?
> > What does inc-year do?
>
> > Calls like 'gather' in a dosync can cause congestion, as I presume it
> > does a scan of all refs?
>
> > > > I started mucking with it a bit more and found that I can't change a
> > > > single ref. Everything seems to be locked. If I make 'death' do a
> > > > println each time it is tried, I see that it is indeed trying to apply
> > > > itself to ref 1 several thousand times.
>
> > I don't like the sound of that. If you could create a reproducible
> > test case I'll chase it down.
>
> > Rich