Seems fairly reproducible. With a population of 1,000 people:

126288 living: 939 infected: 933 ave VL: 5.5225080385852285
pro alleles in population: 6 3.9313139960273147 (621 467 404 360 24 2)
tap alleles in population: 4 2.7024232960074537 (830 718 317 13)
mhc alleles in population: 5 3.8634863880746124 (706 490 288 280 114)
"Elapsed time: 28014.93 msecs"

With a population of 10,000 people:

12626 living: 9068 infected: 8980 ave VL: 5.234532293986656
pro alleles in population: 22 4.068465156597464 (8164 2554 1903 1143 1022 717 712 483 422 391 274 160 84 74 9 9 5 3 3 2 1 1)
tap alleles in population: 21 3.2913665666051255 (9429 2163 1495 1461 891 596 487 443 423 361 289 49 23 8 6 4 2 2 2 1 1)
mhc alleles in population: 17 11.51415636214343 (2636 2010 1914 1834 1400 1322 1260 1234 1162 998 894 506 454 292 198 20 2)
"Elapsed time: 984358.517 msecs"

So. It actually happens 10 times earlier with 10,000 people than with
1,000. Puzzling.

On Dec 22, 3:33 pm, bOR_ <boris.sch...@gmail.com> wrote:
> * So far it happened in both instances that I ran the simulation for
> more than 100k simulated years, so while this is reproducible, it does
> take a number of hours to get there. I can see if I can get the effect
> faster with a smaller population or something.
>
> * When I start the simulation, the memory usage is 2.4% of the
> available memory (16 GB), and it is happily running on 8 Intel(R)
> Xeon(R) CPU X5482 @ 3.20GHz 's.
> (from 'top').
>
> * inc-year:
>
> (defn inc-year
>   [_]
>   (dosync (commute year inc)))
>
> * Whole source is
> here: http://clojure.googlegroups.com/web/eden.clj?gsc=rQ4WoRYAAAB68Q78LH5o...
>
> * gather indeed scans all refs, but is only called once every 1000
> years, and right after an 'await', so I figured everything should have
> been free then.
>
> On Dec 22, 2:56 pm, Rich Hickey <richhic...@gmail.com> wrote:
>
> > On Dec 22, 7:41 am, bOR_ <boris.sch...@gmail.com> wrote:
>
> > > Hi all,
>
> > > Long post, but it boils down to this: I'm running into a "transaction
> > > failed after retry limit" after running my simulation for a couple of
> > > hours. I chatted briefly with fyuryu in #clojure, and am now pasting
> > > some of the hopefully relevant information into this post. Hope someone
> > > can shed some light. The recommendation of fyuryu was to use 'await-for'
> > > rather than await, but I'm a bit worried that that is just a way to
> > > ignore some underlying problem.
>
> > > I've the simulation still online and in limbo (long live
> > > emacs --daemon), so I can answer additional questions.
>
> > > I'll paste part of the program, the output, the agent-errors and some
> > > additional things I tried below.
>
> > Generally, you can get retry limit failures when a long-running
> > transaction contends for the same refs as short-running transactions.
> > It is hard to see what is going on with your sim without all the
> > source.
>
> > How many cores?
> > What is the memory utilization?
> > Do you have any blocking calls anywhere?
> > What does inc-year do?
>
> > Calls like 'gather' in a dosync can cause congestion, as I presume it
> > does a scan of all refs?
>
> > > I started mucking with it a bit more and find that I can't change a
> > > single ref. Everything seems to be locked. If I make 'death' do a
> > > println each time it is tried, I see that it is indeed trying to apply
> > > itself to ref 1 several thousand times.
>
> > I don't like the sound of that. If you could create a reproducible
> > test case I'll chase it down.
>
> > Rich
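
For readers following the thread, here is a minimal sketch of the pattern Rich describes: one long transaction that scans every ref, running alongside short transactions that each touch a single ref. Nothing here is taken from eden.clj. The names population, infect!, snapshot, summarize and attempts are hypothetical, and the real 'gather' may be structured quite differently. The idea is only to keep the scanning transaction as short as possible (read the refs and nothing else) and to do the statistics on plain data outside the dosync.

```clojure
;; Hypothetical sketch; none of these names come from eden.clj.

;; 100 hosts, each held in its own ref.
(def population
  (vec (repeatedly 100 #(ref {:infected false :vl 0.0}))))

;; Counts how often the snapshot body runs, retries included.
;; (A side effect inside dosync repeats on every retry, which is
;; exactly what we want to observe here.)
(def attempts (atom 0))

(defn infect!
  "Short transaction: touches exactly one ref."
  [host vl]
  (dosync (alter host assoc :infected true :vl vl)))

(defn snapshot
  "Long transaction: reads every ref at one consistent point in time.
  mapv is eager, so all derefs happen inside the transaction. If other
  transactions keep committing to these refs while this body runs, the
  whole body may be retried."
  []
  (dosync
    (swap! attempts inc)
    (mapv deref population)))

(defn summarize
  "Pure function over plain maps; runs outside any transaction."
  [hosts]
  (let [infected (filter :infected hosts)]
    {:living   (count hosts)
     :infected (count infected)
     :ave-vl   (if (seq infected)
                 (/ (reduce + (map :vl infected)) (count infected))
                 0.0)}))

;; Usage: a few short transactions, then one scan plus pure statistics.
(doseq [h (take 10 population)] (infect! h 5.0))
(println (summarize (snapshot)) "snapshot attempts:" @attempts)
```

Run single-threaded like this, the snapshot body executes once. Under load from many concurrent writers, watching the attempts counter climb would be one way to check whether the scan is the transaction that keeps retrying.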