Is this a bug? Python intermittently stops dead for seconds
Below is a simple program that will cause Python to intermittently stop executing for a few seconds. It's 100% reproducible on my machine. I'd be tempted to say this is a nasty garbage collection performance issue, except that there is no major memory to be garbage collected in this script. I'd be tempted to say it was a Unix virtual memory issue, except this is occurring at around 1/5th of my physical memory size. So something else unexplained is going on.

Class Foo instances create and hold a list of size nfoo of integers. (nfoo = 10)
Class Bar instances create and hold a list of size nbar of Foo objects. (nbar = 100)

When the code runs, it starts creating and storing Bar objects in a list while watching for real-time glitches in how long it takes to create the next Foo object. If this is longer than 1/2 of a second then it reports it.

On my computer, after creating 1500 Bar objects, the rate of creation of new Foos suddenly has a periodic glitch. This glitch re-occurs about every 300 Bar objects, and the duration of the glitch keeps getting longer, growing to many seconds.

Platform: 800 MHz PowerMac G4, 1 GB of memory
Python: 2.4.2

Note: since I am using an absolute time threshold for reporting the glitches, the first one may take more iterations before it occurs on a fast computer. You may need to increase nbar or nfoo.

import sys
from time import time

class Foo(object):
    def __init__(me, nfoo):
        me.memory = [1]*nfoo

class Bar(object):
    def __init__(me, nbar, nfoo):
        tgx.set_tag("Bar")          # timer
        b = [None]*nbar
        for i in xrange(nbar):
            b[i] = Foo(nfoo)        # make a Foo, add it to the list
            tgx.jump("Bar"+`i`)     # timer
        me.b = b                    # store the list in my instance memory

# just a utility class to time things
class gtime:
    def __init__(me, f=sys.stderr):
        me.last = time()
        me.f = f
        me.tag = ""
        me.ticks = 0

    def set_tag(me, tag):
        me.tag = tag
        me.ticks = 0

    def reset(me):
        me.ticks = 0

    def jump(me, tag="NONE", allowed_gap=0.5):
        me.t = time()
        me.ticks += 1
        if me.t - me.last > allowed_gap:
            print >>me.f, "Big Gap:", me.t - me.last, "seconds ", me.tag, tag, me.ticks
        me.last = time()

tgx = gtime()   # instance of the timer

# main loop
nbar = 100
nfoo = 10
ad_nauseum = 20000
final = [None]*ad_nauseum
for i in xrange(ad_nauseum):
    if i % 100 == 0: print >>sys.stderr, "Bars Made: ", i
    final[i] = Bar(nbar, nfoo)

Sample output:

Bars Made:  0
Bars Made:  100
Bars Made:  200
Bars Made:  300
Bars Made:  400
Bars Made:  500
Bars Made:  600
Bars Made:  700
Bars Made:  800
Bars Made:  900
Bars Made:  1000
Bars Made:  1100
Bars Made:  1200
Bars Made:  1300
Bars Made:  1400
Bars Made:  1500
Big Gap: 0.618862867355 seconds  Bar Bar5 6
Bars Made:  1600
Bars Made:  1700
Bars Made:  1800
Big Gap: 0.748915195465 seconds  Bar Bar76 77
Bars Made:  1900
Bars Made:  2000
Bars Made:  2100
Big Gap: 0.809149980545 seconds  Bar Bar45 46
Bars Made:  2200
Bars Made:  2300
Bars Made:  2400
Big Gap: 0.898494958878 seconds  Bar Bar15 16
Bars Made:  2500
Bars Made:  2600
Bars Made:  2700
Big Gap: 1.01110696793 seconds  Bar Bar86 87
Bars Made:  2800
Bars Made:  2900
Bars Made:  3000
Big Gap: 1.12396192551 seconds  Bar Bar55 56
Bars Made:  3100
Bars Made:  3200
Bars Made:  3300
Big Gap: 1.39006495476 seconds  Bar Bar25 26
Bars Made:  3400
Bars Made:  3500
Bars Made:  3600
Big Gap: 1.55699706078 seconds  Bar Bar96 97
Bars Made:  3700
Bars Made:  3800
Bars Made:  3900
Big Gap: 1.49929594994 seconds  Bar Bar65 66
Bars Made:  4000
Bars Made:  4100
Bars Made:  4200
Big Gap: 1.64840602875 seconds  Bar Bar35 36
Bars Made:  4300
Bars Made:  4400
Bars Made:  4500
Big Gap: 1.728484869 seconds  Bar Bar5 6
Bars Made:  4600
Bars Made:  4700
Bars Made:  4800
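One quick way to test whether the cyclic garbage collector is behind the pauses is to compare the worst-case iteration time with the collector enabled and disabled. A minimal sketch of that experiment (Python 2.x, matching the thread; the function name and sizes are just illustrative):

import gc
from time import time

def worst_iteration(n=3000, disable_gc=False):
    # Allocate roughly one "Bar" worth of objects per iteration and
    # record the longest single iteration.  Reference counting still
    # frees objects when gc is disabled; only cycle detection stops.
    if disable_gc:
        gc.disable()
    else:
        gc.enable()
    store = []
    worst = 0.0
    last = time()
    for i in xrange(n):
        store.append([[1]*10 for j in xrange(100)])
        now = time()
        worst = max(worst, now - last)
        last = now
    return worst

print "worst iteration, gc on :", worst_iteration()
print "worst iteration, gc off:", worst_iteration(disable_gc=True)

If the collector is responsible, the second number should stay small while the first grows with n.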
Re: Is this a bug? Python intermittently stops dead for seconds
Steve and other good folks who replied:

I want to clarify that, on my computer, the first instance of the gap occurs way before the memory is filled (at about 20% of physical RAM). Additionally, the process monitor shows no page faults.

Yes, if you let the as-written demo program run to completion (all 20,000 iterations), then on many computers it would not be surprising that your machine eventually goes into forced page swapping at some point. That would be expected, and is not the issue I am concerned with. In my case the glitching starts at around iteration 1000.

1000 (bars) x 100 (foos) x (10 integers in array)

is nominally 100,000 class objects and 1,000,000 array elements. (Note that the array is filled as [1]*10, so there is actually only one "integer", but 10 array elements referring to it, per Foo instance.)

However, Steve may have put his finger on the reason why the duration grows with time. Here is my current hypothesis. The design of the program does not have any points where significant amounts of memory are released: all objects have held references until the end. But perhaps there are some implicitly created objects of the same size created along the way??? For example, when I write

me.memory = [1]*nfoo

perhaps internally Python is allocating an array of size nfoo and then __copying__ it into me.memory??? Since there is no reference to the intermediate, it would then be marked for future garbage collection. If that were true, then memory would have interleaved entities: things to GC, and things with references held in me.memory. Removing the former would require the GC to scan the entire set of existing objects, which is growing. Turning off GC would prevent this.

In any case, I don't think what I'm doing is very unusual. The original program that triggered my investigation of the bug was doing this: foo was an election ballot holding 10 outcomes, and bar was a set of 100 ballots from 100 voting machines, and the total array was holding the ballot sets from a few thousand voting machines. Almost any inventory program is likely to have a similar set of nested arrays, so it hardly seems unusual.
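A way to check this hypothesis directly is to watch the collector's per-generation counters while allocating: if throwaway intermediates were being created alongside the held lists, the counters would churn much faster than the held-object count grows. A sketch, with the caveat that gc.get_count() was only added in Python 2.5, so it is not available on the 2.4.2 interpreter used above:

import gc

print "thresholds:", gc.get_threshold()   # defaults to (700, 10, 10)

data = []
for i in xrange(5000):
    data.append([1]*10)                   # same allocation as Foo
    if i % 1000 == 0:
        # count0: tracked allocations minus deallocations since the
        # last collection; count1/count2: collections of the younger
        # generation since the older one was last collected
        print i, "counts:", gc.get_count()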
Re: Is this a bug? Python intermittently stops dead for seconds
By the way, if you are on a fast computer, and on an OS whose time.time() function can resolve less than 0.5 seconds, then you can see this problem on your machine at lower memory utilizations by changing the default "allowed_gap" in the gtime class from 0.5 seconds down to, say, 0.1 seconds. This is the threshold at which the program flags the time it takes to create a "foo" object; on a fast computer it should take much less than 0.1 sec.

-----Original Message-----
>From: charlie strauss <[EMAIL PROTECTED]>
>Sent: Oct 1, 2006 10:33 AM
>To: Steve Holden <[EMAIL PROTECTED]>, python-list@python.org
>Subject: Re: Is this a bug? Python intermittently stops dead for seconds
>
>[...]
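One way to make that change without editing the method's default is to move the threshold into the constructor, so each run can pass its own value. A sketch of the timer class reworked that way (same behaviour otherwise):

import sys
from time import time

class gtime:
    # same utility timer as in the first post, but with the gap
    # threshold set once at construction instead of per jump() call
    def __init__(me, f=sys.stderr, allowed_gap=0.1):   # was 0.5
        me.last = time()
        me.f = f
        me.tag = ""
        me.ticks = 0
        me.allowed_gap = allowed_gap

    def set_tag(me, tag):
        me.tag = tag
        me.ticks = 0

    def reset(me):
        me.ticks = 0

    def jump(me, tag="NONE"):
        me.t = time()
        me.ticks += 1
        if me.t - me.last > me.allowed_gap:
            print >>me.f, "Big Gap:", me.t - me.last, "seconds ", me.tag, tag, me.ticks
        me.last = time()

tgx = gtime(allowed_gap=0.1)   # 0.1 s threshold for fast machines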
Re: Is this a bug? Python intermittently stops dead for seconds
>I think the point you are missing is that the garbage collector is
>triggered from time to time to ensure that no cyclical garbage remains
>uncollected, IIRC. The more data that's been allocated, the longer it
>takes the collector to scan all of memory to do its job.
>
>If you can find a way to avoid the behaviour I'm sure the development
>team would be interested to hear it :-)
>
>I think you'll find that most programs that eat through memory in this
>way will exhibit pretty much the same behaviour. If you *know* your
>program isn't creating data cycles, just turn the GC off and rely on
>reference counting. That won't save you from paging when you eventually
>exhaust physical memory. Nothing can.

Could you clarify that for me? GC really has three components to it: (1) finding and freeing unreferenced memory via reference counts, (2) cycle removal, and (3) defragmenting the storage stack. If I turn off GC, don't I lose all of these?

From a user perspective, turning off GC and then managing it yourself is unappealing. What would be preferable would be the ability to simply turn down its threshold. That is, what I really want is to tell the GC that it can hold off on checks other than reference counts until X% of memory is filled. At some point I want it to kick in, and I don't want to have to programmatically manage that, but simply specify a simple tolerance.

Even better, I'd like to keep (1) and (3) and turn off just (2), and just use weak references in the few cases I really need them.
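For what it's worth, CPython does expose a knob along these lines: gc.set_threshold() controls how much allocation activity is needed before each generation is scanned. It is denominated in object counts rather than a percentage of memory, but it can make the expensive older-generation scans arbitrarily rare. A sketch (the numbers are arbitrary illustrations, not recommendations):

import gc

# set_threshold(threshold0, threshold1, threshold2):
#   threshold0: net new tracked objects that trigger a gen-0 scan
#               (default 700)
#   threshold1: gen-0 scans before a gen-1 scan runs (default 10)
#   threshold2: gen-1 scans before a full gen-2 scan runs (default 10)
print "defaults:", gc.get_threshold()

gc.set_threshold(700, 100, 100)   # make older-generation scans 10x rarer

# Alternatively, keep reference counting (which can never be turned
# off) and drop only the cycle detector:
gc.disable()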
Re: Is this a bug? Python intermittently stops dead for seconds
Steve, digging into the gc docs a bit more, I think the behaviour I am seeing is still not expected. Namely, the program I offered has no obvious place where objects are deallocated. The way the GC is supposed to work is that there are three levels of objects:

level0: newly created objects
level1: objects that survived 1 round of garbage collection
level2: objects that survived 2+ rounds of garbage collection

Since all of my numerous objects are level2 objects, and none of them are ever deallocated, I should never trip the GC for these. Your explanation would require this to be tripped, so I can't explain it. For your explanation to be correct, there has to be some non-obvious step in the program that is deallocating level2 items in sufficient numbers to trip the GC.
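The piece this reading misses is that a collection must scan every tracked object in the generations being collected to prove nothing is garbage, whether or not anything gets deallocated, so its cost grows with the number of live objects. Timing explicit full collections while the heap grows shows the effect directly; a minimal sketch (Python 2.x):

import gc
from time import time

store = []
for step in xrange(5):
    for i in xrange(100000):
        store.append([1]*10)    # held forever; nothing is deallocated
    t0 = time()
    n = gc.collect()            # force a full (gen-2) collection
    print len(store), "live lists; collect() found", n, \
          "unreachable, took", time() - t0, "s"

The reported time should climb with each step even though collect() finds no garbage at all.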
Re: Is this a bug? Python intermittently stops dead for seconds
On Oct 1, 2006, at 9:48 AM, Fredrik Lundh wrote:

> charlie strauss wrote:
>
>> level0: newly created objects
>> level1: objects that survived 1 round of garbage collection
>> level2: objects that survived 2+ rounds of garbage collection
>>
>> Since all of my numerous objects are level2 objects, and none of
>> them are ever deallocated, I should never trip the GC for these.
>
> your understanding of generational GC is a bit fuzzy, it seems. that an
> object is promoted to a higher level means that it's not inspected quite
> as often as lower-level objects, not that it's never inspected at all.

As I understand it, level2 (and level1) objects only undergo GC when more than 10 of them are deallocated. Since I never deallocate, this should not be tripped, right?

In any case, regarding your other comments:

>> Could you clarify that for me? GC really has three components to it:
>> (1) finding and freeing unreferenced memory via reference counts,
>> (2) cycle removal, and (3) defragmenting the storage stack. If I turn
>> off GC, don't I lose all of these?
>
> CPython always does (1), only does (2) if cycle-breaking GC isn't
> disabled, and never does (3).

Never does (3)? Then how does it compact its memory allocation area? Surely it does not rely on the OS to manage every small object as a separate memory allocation. And just to be clear: are you saying that when I do a gc.disable this only turns off (2) and not (1)? The docs don't say that, as far as I can tell.

> in your case, it's (2) that takes more and more time, simply because
> you're creating tons of non-trivial objects. to see what's going on in
> there, add
>
>     import gc
>     gc.set_debug(gc.DEBUG_STATS)
>
> to the top of your program, and look at the status messages that appear
> just before each "Big Gap" message.

Could you be a bit more explicit? I'm new to the gc module. Sorry to be slow, but I don't know what "looking at the status messages" means.
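To make the suggestion concrete: the two lines go at the very top of the script, before any allocation, and each time a collection runs the interpreter prints which generation it collected and the per-generation object counts to stderr, so those messages can be lined up against the "Big Gap" reports. A minimal self-contained sketch:

import gc
gc.set_debug(gc.DEBUG_STATS)   # report every collection on stderr

# any allocation-heavy loop will now show when collections fire
data = []
for i in xrange(100000):
    data.append([1]*10)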