Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Can you distill the program into something reproducible? Maybe with something slightly less than 45Gb but still exhibiting some degradation of exit performance? I can try to point our commercial profiling tools at it and see what it is doing.

K

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mike Coleman
Sent: 19. desember 2008 23:30
To: [email protected]
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

I have a program that creates a huge (45GB) defaultdict. (The keys are short strings, the values are short lists of pairs (string, int).) Nothing but possibly the strings and ints is shared.

The program takes around 10 minutes to run, but longer than 20 minutes to exit (I gave up at that point). That is, after executing the final statement (a print), it is apparently spending a huge amount of time cleaning up before exiting. I haven't installed any exit handlers or anything like that, all files are already closed and stdout/stderr flushed, and there's nothing special going on. I have done 'gc.disable()' for performance (which is hideous without it)--I have no reason to think there are any loops.

Currently I am working around this by doing an os._exit(), which is immediate, but this seems like a bit of hack. Is this something that needs fixing, or that has already been fixed?

Mike

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/kristjan%40ccpgames.com
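A minimal sketch of the kind of structure Mike describes (all names here are illustrative, not taken from his program): a defaultdict mapping short string keys to short lists of (string, int) pairs.

```python
from collections import defaultdict

# A tiny model of the reported structure: short string keys, values
# that are lists of (string, int) pairs.
index = defaultdict(list)

def add_record(key, name, count):
    # defaultdict(list) creates the empty list on first access to a key.
    index[key].append((name, count))

add_record("abc", "spam", 1)
add_record("abc", "eggs", 2)
add_record("xyz", "ham", 3)
```

At 45GB this becomes hundreds of millions of small objects, which is exactly the population the exit-time teardown has to walk.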
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
On Sat, 20 Dec 2008 09:02:38 pm Kristján Valur Jónsson wrote:
> Can you distill the program into something reproducible?
> Maybe with something slightly less than 45Gb but still exhibiting
> some degradation of exit performance? I can try to point our
> commercial profiling tools at it and see what it is doing. K

In November 2007, a similar problem was reported on the comp.lang.python newsgroup. 370MB was large enough to demonstrate the problem. I don't know if a bug was ever reported. The thread starts here:

http://mail.python.org/pipermail/python-list/2007-November/465498.html

or if you prefer Google Groups:

http://preview.tinyurl.com/97xsso

and it describes extremely long times to populate and destroy large dicts, even with garbage collection turned off. My summary at the time was:

"On systems with multiple CPUs or 64-bit systems, or both, creating and/or deleting a multi-megabyte dictionary in recent versions of Python (2.3, 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes, compared to seconds if the system only has a single CPU. Turning garbage collection off doesn't help."

I make no guarantee that the above is a correct description of the problem, only that this is what I believed at the time. I'm afraid it is a very long thread, with multiple red herrings, lots of people unable to reproduce the problem, and the usual nonsense that happens on comp.lang.python. I was originally one of the skeptics until I reproduced the original poster's problem. I generated a sample file of 8 million key/value pairs as a 370MB text file. Reading it into a dict took two and a half minutes on my relatively slow computer. But deleting the dict took more than 30 minutes, even with garbage collection switched off.
Sample code reproducing the problem on my machine is here:

http://mail.python.org/pipermail/python-list/2007-November/465513.html

According to this post of mine:

http://mail.python.org/pipermail/python-list/2007-November/466209.html

deleting 8 million (key, value) pairs stored as a list of tuples was very fast. It was only if they were stored as a dict that deleting it was horribly slow. Please note that other people have tried and failed to replicate the problem. I suspect the fault (if it is one, and not human error) is specific to some combinations of Python version and hardware. Even if this is a Will Not Fix, I'd be curious if anyone else can reproduce the problem.

Hope this is helpful,

Steven.

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On
> Behalf Of Mike Coleman Sent: 19. desember 2008 23:30
> To: [email protected]
> Subject: [Python-Dev] extremely slow exit for program having huge
> (45G) dict (python 2.5.2)
>
> I have a program that creates a huge (45GB) defaultdict. (The keys
> are short strings, the values are short lists of pairs (string,
> int).) Nothing but possibly the strings and ints is shared.
>
> The program takes around 10 minutes to run, but longer than 20
> minutes to exit (I gave up at that point). That is, after executing
> the final statement (a print), it is apparently spending a huge
> amount of time cleaning up before exiting. I haven't installed any
> exit handlers or anything like that, all files are already closed and
> stdout/stderr flushed, and there's nothing special going on. I have
> done
> 'gc.disable()' for performance (which is hideous without it)--I have
> no reason to think there are any loops.
>
> Currently I am working around this by doing an os._exit(), which is
> immediate, but this seems like a bit of hack. Is this something that
> needs fixing, or that has already been fixed?
> Mike

--
Steven D'Aprano
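A scaled-down sketch of the kind of reproduction Steven describes (the size here is tiny so it runs quickly; his test used 8 million pairs and a 370MB file): build a dict with the cyclic collector disabled, as in the original report, then time its teardown separately.

```python
import gc
import time

def build_and_drop(n):
    """Build a dict of n string->int pairs, then time deleting it."""
    gc.disable()  # as in the original report; refcounting still runs
    try:
        t0 = time.time()
        d = dict(("key-%d" % i, i) for i in range(n))
        build_secs = time.time() - t0

        t0 = time.time()
        del d  # all reclamation happens here, driven by refcounts
        drop_secs = time.time() - t0
        return build_secs, drop_secs
    finally:
        gc.enable()

build_secs, drop_secs = build_and_drop(100000)
```

On an unaffected machine the drop time is a small fraction of the build time; the reports in this thread describe the opposite, by orders of magnitude.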
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Mike Coleman wrote:
> I have a program that creates a huge (45GB) defaultdict. (The keys
> are short strings, the values are short lists of pairs (string, int).)
> Nothing but possibly the strings and ints is shared.
>
> The program takes around 10 minutes to run, but longer than 20 minutes
> to exit (I gave up at that point). That is, after executing the final
> statement (a print), it is apparently spending a huge amount of time
> cleaning up before exiting. I haven't installed any exit handlers or
> anything like that, all files are already closed and stdout/stderr
> flushed, and there's nothing special going on. I have done
> 'gc.disable()' for performance (which is hideous without it)--I have
> no reason to think there are any loops.
>
> Currently I am working around this by doing an os._exit(), which is
> immediate, but this seems like a bit of hack. Is this something that
> needs fixing, or that has already been fixed?

You don't mention the platform, but...

This behaviour was not unknown in the distant past, with much smaller datasets. Most of the problems then related to the platform malloc() doing funny things as stuff was free()ed, like coalescing free space.

[I once sat and watched a Python script run in something like 30 seconds and then take nearly 10 minutes to terminate, as you describe (Python 2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of hundred MB of memory - the Solaris 2.5 malloc() had some undesirable properties from Python's point of view]

PyMalloc effectively removed this as an issue for most cases and platform malloc()s have also become considerably more sophisticated since then, but I wonder whether the sheer size of your dataset is unmasking related issues.

Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus accumulates (2.3 & 2.4 never free()ed arenas). Your platform malloc() might have odd behaviour with 45GB of arenas returned to it piecemeal. This is something that could be checked with a small C program.
Calling os._exit() circumvents the free()ing of the arenas.

Also consider that, with the exception of small integers (-1..256), no interning of integers is done. If your data contains large quantities of integers with non-unique values (that aren't in the small integer range) you may find it useful to do your own interning.

--
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: [email protected] (pref) | Snail: PO Box 370
[email protected] (alt) | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Andrew MacIntyre wrote:
> Mike Coleman wrote:
>> I have a program that creates a huge (45GB) defaultdict. (The keys
>> are short strings, the values are short lists of pairs (string, int).)
>> Nothing but possibly the strings and ints is shared.
>>
>> The program takes around 10 minutes to run, but longer than 20 minutes
>> to exit (I gave up at that point). That is, after executing the final
>> statement (a print), it is apparently spending a huge amount of time
>> cleaning up before exiting. I haven't installed any exit handlers or
>> anything like that, all files are already closed and stdout/stderr
>> flushed, and there's nothing special going on. I have done
>> 'gc.disable()' for performance (which is hideous without it)--I have
>> no reason to think there are any loops.
>>
>> Currently I am working around this by doing an os._exit(), which is
>> immediate, but this seems like a bit of hack. Is this something that
>> needs fixing, or that has already been fixed?
>
> You don't mention the platform, but...
>
> This behaviour was not unknown in the distant past, with much smaller
> datasets. Most of the problems then related to the platform malloc()
> doing funny things as stuff was free()ed, like coalescing free space.
>
> [I once sat and watched a Python script run in something like 30 seconds
> and then take nearly 10 minutes to terminate, as you describe (Python
> 2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of
> hundred MB of memory - the Solaris 2.5 malloc() had some undesirable
> properties from Python's point of view]
>
> PyMalloc effectively removed this as an issue for most cases and platform
> malloc()s have also become considerably more sophisticated since then,
> but I wonder whether the sheer size of your dataset is unmasking related
> issues.
>
> Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus
> accumulates (2.3 & 2.4 never free()ed arenas).
> Your platform malloc()
> might have odd behaviour with 45GB of arenas returned to it piecemeal.
> This is something that could be checked with a small C program.
> Calling os._exit() circumvents the free()ing of the arenas.
>
> Also consider that, with the exception of small integers (-1..256), no
> interning of integers is done. If your data contains large quantities
> of integers with non-unique values (that aren't in the small integer
> range) you may find it useful to do your own interning.

It's a pity a simplistic approach that redefines all space reclamation activities as null functions won't work. I hate to think of all the cycles that are being wasted reclaiming space just because a program has terminated, when in fact an os.exit() call would work just as well from the user's point of view.

Unfortunately there are doubtless programs out there that do rely on actions being taken at shutdown. Maybe os.exit() could be more widely advertised, though ...

regards
Steve

--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Re: [Python-Dev] Distutils maintenance
Benjamin Peterson schrieb:
> On Fri, Dec 19, 2008 at 12:55 PM, Tarek Ziadé wrote:
>> Hello
>>
>> I would like to request a commit access to work specifically on
>> distutils maintenance.
>
> +1
>
> We are currently without an active distutils maintainer, and many
> stale distutil tickets are in need of attention I'm sure Tarek could
> provide. Tarek has also been providing many useful patches of his own.

FWIW, +1.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
Re: [Python-Dev] Python 3.0.1
Barry Warsaw schrieb:
> I'd like to get Python 3.0.1 out before the end of the year. There
> are no showstoppers, but I haven't yet looked at the deferred blockers
> or the buildbots.
>
> Do you think we can get 3.0.1 out on December 24th? Or should we wait
> until after Christmas and get it out, say on the 29th? Do we need an
> rc?
>
> This question goes mostly to Martin and Georg. What would work for
> you guys?

Since the 24th is the most important Christmas day around here, I'll not be available then :) Either the 23rd or the 29th is fine with me.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Steve> Unfortunately there are doubtless programs out there that do rely
Steve> on actions being taken at shutdown.

Indeed. I believe any code which calls atexit.register qualifies.

Steve> Maybe os.exit() could be more widely advertised, though ...

That would be os._exit(). Calling it avoids calls to exit functions registered with atexit.register(). I believe it is both safe and reasonable programming practice for modules to register exit functions. Both the logging and multiprocessing modules call it. It's incumbent on the application programmer to know these details of the modules the app uses (perhaps indirectly) to know whether or not it's safe/wise to call os._exit().

--
Skip Montanaro - [email protected] - http://smontanaro.dyndns.org/
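The difference Skip describes is easy to demonstrate (POSIX-only sketch; the fork/pipe plumbing is just scaffolding for the demo): a handler registered with atexit.register runs on a normal interpreter exit, but os._exit() skips it.

```python
import atexit
import os

def run_child(use_os_exit):
    """Fork a child that registers an atexit handler, then exits one of
    two ways; return whatever the handler managed to write (POSIX only)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        atexit.register(lambda: os.write(w, b"atexit ran"))
        if use_os_exit:
            os._exit(0)   # immediate exit: atexit handlers are skipped
        raise SystemExit  # normal shutdown: atexit handlers run
    os.close(w)
    os.waitpid(pid, 0)
    data = os.read(r, 100)
    os.close(r)
    return data

normal = run_child(False)    # handler runs: b"atexit ran"
immediate = run_child(True)  # handler skipped: b""
```

This is the trade-off in the whole thread: os._exit() buys the instant exit precisely by skipping everything registered for shutdown.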
Re: [Python-Dev] Call PyType_Ready on builtin types during interpreter startup?
On Sat, Dec 20, 2008, Nick Coghlan wrote:
>
> It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of
> the builtin types - they're left to have it called implicitly when an
> operation using them needs tp_dict filled in.

This seems like a release blocker for 3.0.1 to me.

--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code." --Bill Harlan
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
On Sat, Dec 20, 2008 at 4:02 AM, Kristján Valur Jónsson wrote:
> Can you distill the program into something reproducible?
> Maybe with something slightly less than 45Gb but still exhibiting some
> degradation of exit performance?
> I can try to point our commercial profiling tools at it and see what it is
> doing.

I will try next week to see if I can come up with a smaller, submittable example. Thanks.
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Andrew, this is on an (Intel) x86_64 box with 64GB of RAM. I don't recall the maker or details of the architecture off the top of my head, but it would be something "off the rack" from Dell or maybe HP. There were other users on the box at the time, but nothing heavy, and nothing that gave me any reason to think it was affecting my program. It's running CentOS 5, I think, so that might make glibc several years old.

Your malloc idea sounds plausible to me. If it is a libc problem, it would be nice if there were some way we could tell malloc to "live for today because there is no tomorrow" in the terminal phase of the program.

I'm not sure exactly how to attack this. Callgrind is cool, but there's no way it will work on something this size. Timed ltrace output might be interesting. Or maybe a gprof'ed Python, though that's more work.

Regarding interning, I thought this only worked with strings. Is there some way to intern integers? I'm probably creating 300M integers more or less uniformly distributed across range(1).

Mike

On Sat, Dec 20, 2008 at 4:08 AM, Andrew MacIntyre wrote:
> Mike Coleman wrote:
>>
>> I have a program that creates a huge (45GB) defaultdict. (The keys
>> are short strings, the values are short lists of pairs (string, int).)
>> Nothing but possibly the strings and ints is shared.
>>
>> The program takes around 10 minutes to run, but longer than 20 minutes
>> to exit (I gave up at that point). That is, after executing the final
>> statement (a print), it is apparently spending a huge amount of time
>> cleaning up before exiting. I haven't installed any exit handlers or
>> anything like that, all files are already closed and stdout/stderr
>> flushed, and there's nothing special going on. I have done
>> 'gc.disable()' for performance (which is hideous without it)--I have
>> no reason to think there are any loops.
>>
>> Currently I am working around this by doing an os._exit(), which is
>> immediate, but this seems like a bit of hack.
>> Is this something that
>> needs fixing, or that has already been fixed?
>
> You don't mention the platform, but...
>
> This behaviour was not unknown in the distant past, with much smaller
> datasets. Most of the problems then related to the platform malloc()
> doing funny things as stuff was free()ed, like coalescing free space.
>
> [I once sat and watched a Python script run in something like 30 seconds
> and then take nearly 10 minutes to terminate, as you describe (Python
> 2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of
> hundred MB of memory - the Solaris 2.5 malloc() had some undesirable
> properties from Python's point of view]
>
> PyMalloc effectively removed this as an issue for most cases and platform
> malloc()s have also become considerably more sophisticated since then,
> but I wonder whether the sheer size of your dataset is unmasking related
> issues.
>
> Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus
> accumulates (2.3 & 2.4 never free()ed arenas). Your platform malloc()
> might have odd behaviour with 45GB of arenas returned to it piecemeal.
> This is something that could be checked with a small C program.
> Calling os._exit() circumvents the free()ing of the arenas.
>
> Also consider that, with the exception of small integers (-1..256), no
> interning of integers is done. If your data contains large quantities
> of integers with non-unique values (that aren't in the small integer
> range) you may find it useful to do your own interning.
>
> --
> Andrew I MacIntyre "These thoughts are mine alone..."
> E-mail: [email protected] (pref) | Snail: PO Box 370
> [email protected] (alt) | Belconnen ACT 2616
> Web: http://www.andymac.org/ | Australia
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Mike Coleman wrote:
> ...
> Regarding interning, I thought this only worked with strings. Is
> there some way to intern integers? I'm probably creating 300M
> integers more or less uniformly distributed across range(1)?

held = list(range(1))
...
troublesome_dict[string] = held[number_to_hold]
...

--Scott David Daniels
[email protected]
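Spelled out, Scott's idea looks like this (the range bound in the archived message is garbled, so the 10000 below is a made-up stand-in; size it to cover your actual values): pre-build one canonical object per possible value and always store the canonical one, so duplicate values share a single object.

```python
# One canonical int object per possible value; the bound is illustrative.
held = list(range(10000))

troublesome_dict = {}

def store(key, value):
    # Look up the canonical object rather than storing `value` itself,
    # so equal values share one allocation instead of one each.
    troublesome_dict[key] = held[value]

store("a", 4242)
store("b", 4242)
```

CPython already caches very small integers (Andrew quotes -1..256 for 2.5), so this pays off only for values outside that range; the pre-built table trades a fixed up-front cost for one object per distinct value instead of one per occurrence.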
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
You can always try poor-man's profiling, which is surprisingly useful in the face of massive performance problems. Just attach a debugger to the program and, while it is suffering from a performance problem, break the execution on a regular basis. You are statistically very likely to get a call stack representative of the problem you are having. Do this a few times and you will get a fair impression of what the program is spending its time on.

From the debugger, you can also examine the Python call stack of the program by examining the 'f' local variable in the frame evaluation function.

Have fun,
K

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mike Coleman
Sent: 20. desember 2008 17:09
To: Andrew MacIntyre
Cc: Python Dev
Subject: Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

I'm not sure exactly how to attack this. Callgrind is cool, but no way will work on something this size. Timed ltrace output might be interesting. Or maybe a gprof'ed Python, though that's more work.
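A rough in-process analogue of the poor-man's profiling Kristján describes (his version breaks into an external debugger; this sketch instead samples the interpreter's own call stacks with sys._current_frames(), and the busy() workload is made up for the demo):

```python
import collections
import sys
import threading
import time

stop = threading.Event()

def busy():
    # Stand-in for the code being investigated.
    total = 0
    while not stop.is_set():
        for i in range(10000):
            total += i * i

def sample_stacks(duration=0.25, interval=0.005):
    """Snapshot every other thread's current frame at regular intervals;
    functions that show up often are where the time is going."""
    counts = collections.Counter()
    me = threading.get_ident()
    deadline = time.time() + duration
    while time.time() < deadline:
        for ident, frame in sys._current_frames().items():
            if ident != me:
                counts[frame.f_code.co_name] += 1
        time.sleep(interval)
    return counts

worker = threading.Thread(target=busy)
worker.start()
samples = sample_stacks()
stop.set()
worker.join()
```

The same statistical idea applies either way: with enough random samples, the hot spot dominates the tally even though no single sample proves anything.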
[Python-Dev] VM imaging based launch optimizations for CPython?
Hi fellow snakemen and lizard ladies,

We have recently done lots of Python work on Nokia Series 60 phones and have even managed to roll out some commercial Python based applications. In the future we plan to create some iPhone Python apps also.

Python runs fine in phones - after it has been launched. Currently the biggest issue preventing the world dominance of Python based mobile applications is the start-up time. We cope with the issue by using fancy splash screens and progress indicators, but that doesn't cure the fact that it takes a minute to show the main user interface of the application. Most of the time is spent in import: executing opcodes and forming function and class structures in memory - something which cannot be easily boosted.

Now, we have been thinking. Maemo has a fork() based Python launcher (http://blogs.gnome.org/johan/2007/01/18/introducing-python-launcher/) which greatly speeds up the start-up time by holding Python in memory all the time. We cannot afford such luxury on Symbian and iPhone, since we do not control the operating system. So how about this:

1. A Python application is launched normally
2. After the VM has initialized module importing and reached a static launch state (meaning that the state is the same on every launch), the VM state is written to disk
3. The application continues execution and starts doing dynamic stuff
4. On the following launches, special init code is used which directly blits the VM image from disk back to memory, and we have reached the static state again without going through the whoops of executing import related opcodes
5. Also, I have heard a suggestion that the VM image could be defragmented and analyzed offline

Any opinions?

Cheers,
Mikko

--
Mikko Ohtamaa
Red Innovation Ltd.
Oulu, Finland
http://www.redinnovation.com
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Steven D'Aprano writes:
>
> In November 2007, a similar problem was reported on the comp.lang.python
> newsgroup. 370MB was large enough to demonstrate the problem. I don't
> know if a bug was ever reported.

Do you still reproduce it on trunk? I've tried your scripts on my machine and they work fine, even if I leave garbage collection enabled during the process. (dual core 64-bit machine, but in 32-bit mode)
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
On 2008-12-20 17:57, Mike Coleman wrote:
> On Sat, Dec 20, 2008 at 4:02 AM, Kristján Valur Jónsson
> wrote:
>> Can you distill the program into something reproducible?
>> Maybe with something slightly less than 45Gb but still exhibiting some
>> degradation of exit performance?
>> I can try to point our commercial profiling tools at it and see what it is
>> doing.
>
> I will try next week to see if I can come up with a smaller,
> submittable example. Thanks.

These long exit times are usually caused by the garbage collection of objects. This can be a very time consuming task.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Dec 20 2008)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free !

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg wrote:
> These long exit times are usually caused by the garbage collection
> of objects. This can be a very time consuming task.

In that case, the question would be "why is the interpreter collecting garbage when it knows we're trying to exit anyway?".

--
Cheers,
Leif
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Leif Walsh wrote:
> On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg wrote:
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.
>
> In that case, the question would be "why is the interpreter collecting
> garbage when it knows we're trying to exit anyway?".

Presumably because finalizers are only called when an object is destroyed.

Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Leif> In that case, the question would be "why is the interpreter
Leif> collecting garbage when it knows we're trying to exit anyway?".

Because useful side effects are sometimes performed as a result of this activity (flushing disk buffers, closing database connections, etc).

Skip
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
[M.-A. Lemburg]
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.

[Leif Walsh]
> In that case, the question would be "why is the interpreter collecting
> garbage when it knows we're trying to exit anyway?".

Because user-defined destructors (like __del__ methods and weakref callbacks) may be associated with garbage, and users presumably want those to execute. Doing so requires identifying garbage and releasing it, same as if the interpreter didn't happen to be exiting.

BTW, the original poster should try this: use whatever tools the OS supplies to look at CPU and disk usage during the long exit. What I /expect/ is that almost no CPU time is being used, while the disk is grinding itself to dust. That's what happens when a large number of objects have been swapped out to disk, and exit processing has to page them all back into memory again (in order to decrement their refcounts).

Python's cyclic gc (the `gc` module) has nothing to do with this -- it's typically the been-there-forever refcount-based non-cyclic gc that accounts for supernaturally long exit times. If that is the case here, there's no evident general solution. If you have millions of objects still alive at exit, refcount-based reclamation has to visit all of them, and if they've been swapped out to disk it can take a very long time to swap them all back into memory again.
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
On 2008-12-20 21:20, Leif Walsh wrote:
> On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg wrote:
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.
>
> In that case, the question would be "why is the interpreter collecting
> garbage when it knows we're trying to exit anyway?".

It cannot know until the very end, because there may still be some

try:
    ...
except SystemExit:
    ...

somewhere in the code waiting to trigger and stop the system exit.

If you want a really fast exit, try this:

import os
os.kill(os.getpid(), 9)

But you better know what you're doing if you take this approach...

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Dec 20 2008)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free !

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
(@Skip, Michael, Tim)

On Sat, Dec 20, 2008 at 3:26 PM, wrote:
> Because useful side effects are sometimes performed as a result of this
> activity (flushing disk buffers, closing database connections, etc).

Of course they are. But what about the case given above:

On Sat, Dec 20, 2008 at 5:55 AM, Steven D'Aprano wrote:
> I was originally one of the skeptics until I reproduced the original
> posters problem. I generated a sample file 8 million key/value pairs as
> a 370MB text file. Reading it into a dict took two and a half minutes
> on my relatively slow computer. But deleting the dict took more than 30
> minutes even with garbage collection switched off.

It might be a semantic change that I'm looking for here, but it seems to me that if you turn off the garbage collector, you should be able to expect that either it also won't run on exit, or it should have a way of letting you tell it not to run on exit. If I'm running without a garbage collector, that assumes I'm at least cocky enough to think I know when I'm done with my objects, so I should know to delete the objects that have __del__ functions I care about before I exit. Well, maybe; I'm sure one of you could drag out a programmer that would make that mistake, but turning off the garbage collector to me seems to send the experience message, at least a little.

Does the garbage collector run any differently when the process is exiting? It seems that it wouldn't need to do anything more than run through all objects in the heap and delete them, which doesn't require anything fancy, and should be able to sort by address to aid with caching. If it's already this fast, then I guess it really is the sheer number of function calls necessary that is causing such a slowdown in the cases we've seen, but I find this hard to believe.
-- Cheers, Leif
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
[Mike Coleman]
>> ... Regarding interning, I thought this only worked with strings.
Implementation details. Recent versions of CPython also, e.g.,
"intern" the empty tuple and very small integers.
>> Is there some way to intern integers? I'm probably creating 300M
>> integers more or less uniformly distributed across range(1)?
Interning would /vastly/ reduce memory use for ints in that case, from
gigabytes down to less than half a megabyte.
[Scott David Daniels]
> held = list(range(1))
> ...
> troublesome_dict[string] = held[number_to_hold]
> ...
More generally, but a bit slower, for objects usable as dict keys,
change code of the form:
x = whatever_you_do_to_get_a_new_object()
use(x)
to:
x = whatever_you_do_to_get_a_new_object()
x = intern_it(x, x)
use(x)
where `intern_it` is defined like so once at the start of the program:
intern_it = {}.setdefault
This snippet may make the mechanism clearer:
>>> intern_it = {}.setdefault
>>> x = 3000
>>> id(intern_it(x, x))
36166156
>>> x = 1000 + 2000
>>> id(intern_it(x, x))
36166156
>>> x = "works for computed strings too"
>>> id(intern_it(x, x))
27062696
>>> x = "works for computed strings t" + "o" * 2
>>> id(intern_it(x, x))
27062696
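Tim's `intern_it` trick is plain dict-based memoization. A minimal, self-contained sketch of the same mechanism (the variable names are illustrative):

```python
# {}.setdefault as a one-line interning function: the first time a value
# is seen it is stored; every later equal value returns the stored object.
intern_it = {}.setdefault

a = int("3000")        # a computed int (too big for CPython's small-int cache)
b = 1000 + 2000        # an equal value, produced a different way
a = intern_it(a, a)    # stores and returns the first object
b = intern_it(b, b)    # returns the object stored above

assert a is b          # both names now share one canonical object
```

After the first `intern_it(x, x)` call for a given value, every duplicate can be dropped, which is where the memory savings for 300M mostly-duplicate ints would come from.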
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
[Leif Walsh] > ... > It might be a semantic change that I'm looking for here, but it seems > to me that if you turn off the garbage collector, you should be able > to expect that either it also won't run on exit, It won't then, but "the garbage collector" is the gc module, and that only performs /cyclic/ garbage collection. There is no way to stop refcount-based garbage collection. Read my message again. > or it should have a > way of letting you tell it not to run on exit. If I'm running without > a garbage collector, that assumes I'm at least cocky enough to think I > know when I'm done with my objects, so I should know to delete the > objects that have __del__ functions I care about before I exit. Well, > maybe; I'm sure one of you could drag out a programmer that would make > that mistake, but turning off the garbage collector to me seems to > send the experience message, at least a little. This probably isn't a problem with cyclic gc (reread my msg). > Does the garbage collector run any differently when the process is > exiting? No. > It seems that it wouldn't need to do anything more that run > through all objects in the heap and delete them, which doesn't require > anything fancy, Reread my msg -- already explained the likely cause here (if "all the objects in the heap" have in fact been swapped out to disk, it can take an enormously long time to just "run through" them all). > and should be able to sort by address to aid with > caching. That one isn't possible. There is no list of "all objects" to /be/ sorted. The only way to find all the objects is to traverse the object graph from its roots, which is exactly what non-cyclic gc does anyway. > If it's already this fast, then I guess it really is the > sheer number of function calls necessary that are causing such a > slowdown in the cases we've seen, but I find this hard to believe. My guess remains that CPU usage is trivial here, and 99.99+% of the wall-clock time is consumed waiting for disk reads. 
Either that, or the platform malloc is going nuts. 
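Tim's distinction (gc.disable() pauses only the cyclic collector, while refcount-based reclamation always runs) can be seen directly; a small CPython-specific sketch:

```python
import gc

gc.disable()            # pauses the *cyclic* garbage collector only

finalized = []

class Noisy:
    def __del__(self):
        finalized.append("freed")

obj = Noisy()
del obj                 # refcount drops to zero: reclaimed immediately,
                        # with or without the cyclic collector running
assert finalized == ["freed"]

gc.enable()
```

This is why exit is still slow with gc disabled: the per-object DECREF traversal happens regardless.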
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Leif Walsh gmail.com> writes: > > It might be a semantic change that I'm looking for here, but it seems > to me that if you turn off the garbage collector, you should be able > to expect that either it also won't run on exit, or it should have a > way of letting you tell it not to run on exit. [...] I'm skeptical that it's a garbage collector problem. The script creates one dict containing lots of strings and ints. The thing is, strings and ints aren't tracked by the GC as they are simple atomic objects. Therefore, the /only/ object created by the script which is tracked by the GC is the dict. Moreover, since there is no cycle created, the dict should be directly destroyed when its last reference dies (the "del" statement), not go through the garbage collection process. Given that the problem is reproduced on certain systems and not others, it can be related to an interaction between allocation patterns of the dict implementation, the Python memory allocator, and the implementation of the C malloc() / free() functions. I'm not expert enough to find out more on the subject. 
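Antoine's point about which objects the cyclic GC tracks can be checked with `gc.is_tracked()` (note this introspection function postdates this thread; it was added in Python 2.7/3.1):

```python
import gc

# Atomic objects never participate in cyclic collection...
assert not gc.is_tracked("some key")
assert not gc.is_tracked(123456)

# ...while containers (and containers holding containers) are tracked.
assert gc.is_tracked([1, 2, 3])
assert gc.is_tracked({"key": []})
```

So in the 45GB-dict scenario, the cyclic collector has almost nothing to look at; the cost lies elsewhere.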
Re: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
It appears that this bug was already reported: http://bugs.python.org/issue4705 Any chance that it gets in the next 3.0.x bugfix release? Just as a note, if I do: sys.stdout._line_buffering = True, it also works, but doesn't seem right as it's accessing an internal attribute. Note 2: the solution that said to pass 'wb' does not work, because I need the output as text and not binary or text becomes garbled when it's not ascii. Thanks, Fabio On Fri, Dec 19, 2008 at 9:03 PM, Guido van Rossum wrote: > Fror truly unbuffered text output you'd have to make changes to the > io.TextIOWrapper class to flush after each write() call. That's an API > change -- the constructor currently has a line_buffering option but no > option for completely unbuffered mode. It would also require some > changes to io.open() which currently rejects buffering=0 in text mode. > All that suggests that it should wait until 3.1. > > However it might make sense to at least turn on line buffering when -u > or PYTHONUNBUFFERED is given; that doesn't require API changes and so > can be considered a bug fix. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > On Fri, Dec 19, 2008 at 2:47 PM, Antoine Pitrou wrote: >> >>> Well, ``python -h`` still lists it. >> >> Precisely, it says: >> >> -u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x >> see man page for details on internal buffering relating to '-u' >> >> Note the "binary". And indeed: >> >> ./python -u >> Python 3.1a0 (py3k:67839M, Dec 18 2008, 17:56:54) >> [GCC 4.3.2] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. > import sys > sys.stdout.buffer.write(b"y") >> y1 > >> >> I don't know what it would take to enable unbuffered text IO while keeping >> the >> current TextIOWrapper implementation... >> >> Regards >> >> Antoine. 
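The line-buffering workaround Fabio mentions can be had without poking the private `_line_buffering` attribute, by re-wrapping the binary buffer; a sketch (the helper name is made up):

```python
import io

def line_buffered(binary_stream, encoding="utf-8"):
    # line_buffering is a public TextIOWrapper constructor argument:
    # every write containing '\n' triggers an implicit flush.
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            line_buffering=True)

# In a real program: sys.stdout = line_buffered(sys.stdout.buffer)
# Demonstrated here against an in-memory buffer:
raw = io.BytesIO()
out = line_buffered(raw)
out.write("hello\n")                 # flushed by the newline, no flush() call
assert raw.getvalue() == b"hello\n"
```

This keeps the output as text (so non-ASCII survives), unlike the 'wb' workaround the thread rejects.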
Re: [Python-Dev] VM imaging based launch optimizations for CPython?
> Any opinions? I would use a different marshal implementation. Instead of defining a stream format for marshal, make marshal dump its graph of objects along with the actual memory layout. On load, copying can be avoided; just a few pointers need to be updated. The resulting marshal files would be platform-specific (wrt. endianness and pointer width). On marshaling, you copy all objects into a contiguous block of memory (8-aligned), and dump that. On unmarshaling, you just map that block. If the target supports true memory mapping with page boundaries, you might be able to store multiple .pyc files into a single page. This reformatting could be done offline also. A few things need to be considered: - compatibility. The original marshal code would probably need to be preserved for the "marshal" module. - relative pointers. Code objects, tuples, etc. contain pointers. Assuming the marshaled object cannot be loaded back into the same address, you need to adjust pointers. A common trick is to put a desired load address into the memory block, then try to load into that address. If the address is already taken, load into a different address, and walk through all objects, adjusting pointers. - type references. On loading, you will need to patch all ob_type fields. Put the marshal codes into the ob_type field on marshalling, then switch on unmarshalling. - references to interned strings. On loading, you can either intern them all, or you have a "fast interning" algorithm that assigns a fixed table of interned-string numbers. - reference counting. Make sure all these objects start out with a reference count of 1, so they will never become garbage. If you use a container file for multiple .pyc files, you can have additional savings by sharing strings across modules; this should help in particular for reference to builtin symbols, and for common method names. 
A fixed interning might become unnecessary as the unique single string object in the container will either become the interned string itself, or point to it after being interned once. With such a container system, unmarshalling should be lazy; e.g. for each object, the value of ob_type can be used to determine whether the object was unmarshalled. Of course, you still have the actual interpretation of the top-level module code - if it's not the marshalling but this part that actually costs performance, this efficient marshalling algorithm won't help. It would be interesting to find out which modules have a particularly high startup cost - perhaps they can be rewritten. Regards, Martin 
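For reference, the stream-oriented marshal that Martin proposes supplementing round-trips code objects through a byte stream today; a minimal sketch:

```python
import marshal

# Serialize a compiled code object to marshal's byte-stream format
# (the format used inside .pyc files), then load and execute it again.
code = compile("answer = 6 * 7", "<demo>", "exec")
blob = marshal.dumps(code)     # a platform-portable byte string

namespace = {}
exec(marshal.loads(blob), namespace)
assert namespace["answer"] == 42
```

The memory-layout approach sketched above would replace the byte-by-byte decode in `marshal.loads` with a map-and-fix-pointers step.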
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
On Sat, Dec 20, 2008 at 2:50 PM, M.-A. Lemburg wrote: > If you want a really fast exit, try this: > > import os > os.kill(os.getpid(), 9) > > But you better know what you're doing if you take this approach... This would work, but I think os._exit(EX_OK) is probably just as fast, and allows you to control the exit status... 
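The difference between os._exit() and a normal exit is easy to observe from a child process; a sketch (the child program here is made up for illustration):

```python
import subprocess, sys

# A child that registers an atexit handler and then calls os._exit():
# the handler is skipped, which is exactly why os._exit() is fast --
# and why it must be used with care.
child = (
    "import atexit, os, sys\n"
    "atexit.register(lambda: sys.stderr.write('cleanup ran\\n'))\n"
    "sys.stdout.write('work done\\n')\n"
    "sys.stdout.flush()\n"
    "os._exit(0)\n"
)
result = subprocess.run([sys.executable, "-c", child],
                        capture_output=True, text=True)
assert result.stdout == "work done\n"
assert "cleanup ran" not in result.stderr   # atexit handler never fired
assert result.returncode == 0
```

Note the child flushes stdout itself; os._exit() skips buffered-I/O flushing along with everything else.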
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
>> I will try next week to see if I can come up with a smaller, >> submittable example. Thanks. > > These long exit times are usually caused by the garbage collection > of objects. This can be a very time consuming task. I doubt that. The long exit times are usually caused by a bad malloc implementation. Regards, Martin 
[Python-Dev] 2.6.1 documentation not available for download
Python 2.6.1 documentation currently isn't available for download at: http://docs.python.org/ftp/python/doc/ Additionally please include version numbers in documentation archives (e.g. python-docs-html-2.6.1.tar.bz2). -- Arfrever Frehtes Taifersar Arahesis 
[Python-Dev] 2.6.1 license
It might be helpful if http://www.python.org/download/releases/2.6.1/license/ said it was also the official license for the 2.6.1 release (though I don't suppose it matters that it's still called the 2.5 license, since that's its origin). Another detail to go into the release manager PEP? regards Steve -- Steve Holden+1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ 
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Antoine Pitrou wrote: > Leif Walsh gmail.com> writes: >> It might be a semantic change that I'm looking for here, but it seems >> to me that if you turn off the garbage collector, you should be able >> to expect that either it also won't run on exit, or it should have a >> way of letting you tell it not to run on exit. > [...] > > I'm skeptical that it's a garbage collector problem. The script creates one > dict > containing lots of strings and ints. The thing is, strings and ints aren't > tracked by the GC as they are simple atomic objects. Therefore, the /only/ > object created by the script which is tracked by the GC is the dict. Moreover, > since there is no cycle created, the dict should be directly destroyed when > its > last reference dies (the "del" statement), not go through the garbage > collection > process. > > Given that the problem is reproduced on certain systems and not others, it can > be related to an interaction between allocation patterns of the dict > implementation, the Python memory allocator, and the implementation of the C > malloc() / free() functions. I'm not expert enough to find out more on the > subject. > I believe the OP engendered a certain amount of confusion by describing object deallocation as being performed by the garbage collector. So he perhaps didn't understand that even decref'ing all the objects only referenced by the dict will take a huge amount of time unless there's enough real memory to hold it. regards Steve -- Steve Holden+1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ 
Re: [Python-Dev] 2.6.1 documentation not available for download
On Sat, Dec 20, 2008 at 4:28 PM, Arfrever Frehtes Taifersar Arahesis wrote: > Python 2.6.1 documentation currently isn't available for download at: > http://docs.python.org/ftp/python/doc/ It is available here, though: http://www.python.org/ftp/python/doc/current/ > > Additionally please include version numbers in documentation > archives (e.g. python-docs-html-2.6.1.tar.bz2). > -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." 
Re: [Python-Dev] 2.6.1 license
On Sat, Dec 20, 2008 at 4:37 PM, Steve Holden wrote: > It might be helpful if > > http://www.python.org/download/releases/2.6.1/license/ > > said it was also the official license for the 2.6.1 release (though I > don't suppose it matters that it's still called the 2.5 license, since > that's its origin). I've updated the website and the PEP. -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." 
Re: [Python-Dev] 2.6.1 documentation not available for download
2008-12-20 23:46:15 Benjamin Peterson wrote: > On Sat, Dec 20, 2008 at 4:28 PM, Arfrever Frehtes Taifersar Arahesis > wrote: > > Python 2.6.1 documentation currently isn't available for download at: > > http://docs.python.org/ftp/python/doc/ > > It is available here, though: > > http://www.python.org/ftp/python/doc/current/ I need documentation created from the 'r261' tag, not from the HEAD of the 'release26-maint' branch. -- Arfrever Frehtes Taifersar Arahesis 
Re: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?
On Sat, Dec 20, 2008 at 13:45, Fabio Zadrozny wrote: > It appears that this bug was already reported: > http://bugs.python.org/issue4705 > > Any chance that it gets in the next 3.0.x bugfix release? > > Just as a note, if I do: sys.stdout._line_buffering = True, it also > works, but doesn't seem right as it's accessing an internal attribute. > > Note 2: the solution that said to pass 'wb' does not work, because I > need the output as text and not binary or text becomes garbled when > it's not ascii. > Can't you decode the bytes after you receive them? -Brett > Thanks, > > Fabio > > On Fri, Dec 19, 2008 at 9:03 PM, Guido van Rossum wrote: >> Fror truly unbuffered text output you'd have to make changes to the >> io.TextIOWrapper class to flush after each write() call. That's an API >> change -- the constructor currently has a line_buffering option but no >> option for completely unbuffered mode. It would also require some >> changes to io.open() which currently rejects buffering=0 in text mode. >> All that suggests that it should wait until 3.1. >> >> However it might make sense to at least turn on line buffering when -u >> or PYTHONUNBUFFERED is given; that doesn't require API changes and so >> can be considered a bug fix. >> >> --Guido van Rossum (home page: http://www.python.org/~guido/) >> >> >> >> On Fri, Dec 19, 2008 at 2:47 PM, Antoine Pitrou wrote: >>> Well, ``python -h`` still lists it. >>> >>> Precisely, it says: >>> >>> -u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x >>> see man page for details on internal buffering relating to '-u' >>> >>> Note the "binary". And indeed: >>> >>> ./python -u >>> Python 3.1a0 (py3k:67839M, Dec 18 2008, 17:56:54) >>> [GCC 4.3.2] on linux2 >>> Type "help", "copyright", "credits" or "license" for more information. >> import sys >> sys.stdout.buffer.write(b"y") >>> y1 >> >>> >>> I don't know what it would take to enable unbuffered text IO while keeping >>> the >>> current TextIOWrapper implementation... 
>>> Regards >>> Antoine. 
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Steve Holden holdenweb.com> writes: > I believe the OP engendered a certain amount of confusion by describing > object deallocation as being performed by the garbage collector. So he > perhaps didn't understand that even decref'ing all the objects only > referenced by the dict will take a huge amount of time unless there's > enough real memory to hold it. He said he has 64GB RAM so I assume all his working set was in memory, not swapped out. 
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
On Fri, Dec 19, 2008 at 6:29 PM, Mike Coleman wrote: > I have a program that creates a huge (45GB) defaultdict. (The keys > are short strings, the values are short lists of pairs (string, int).) > Nothing but possibly the strings and ints is shared. > > That is, after executing the final statement (a print), it is apparently > spending a > huge amount of time cleaning up before exiting. > I have done 'gc.disable()' for performance (which is hideous without it)--I > have > no reason to think there are any loops. 
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
[Sorry for the previous garbage post.] > On Fri, Dec 19, 2008 at 6:29 PM, Mike Coleman wrote: > I have a program that creates a huge (45GB) defaultdict. (The keys > are short strings, the values are short lists of pairs (string, int).) > Nothing but possibly the strings and ints is shared. Could you give us more information about the dictionary? For example, how many objects does it contain? Is 45GB the actual size of the dictionary or of the Python process? > That is, after executing the final statement (a print), it is apparently > spending a huge amount of time cleaning up before exiting. Most of this time is probably spent on DECREF'ing objects in the dictionary. As others mentioned, it would be useful to have a self-contained example to examine the behavior more closely. > I have done 'gc.disable()' for performance (which is hideous without it)--I > have no reason to think there are any loops. Have you seen any significant difference in the exit time when the cyclic GC is disabled or enabled? -- Alexandre 
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Tim Peters wrote: > If that is the case here, there's no evident general solution. If you > have millions of objects still alive at exit, refcount-based > reclamation has to visit all of them, and if they've been swapped out > to disk it can take a very long time to swap them all back into memory > again. In that case, it sounds like using os._exit() to get out of the program without visiting all that memory *is* the right answer (or as right an answer as is available at least). Cheers, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia --- 
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
[email protected] wrote: > > Steve> Unfortunately there are doubtless programs out there that do rely > Steve> on actions being taken at shutdown. > > Indeed. I believe any code which calls atexit.register. > > Steve> Maybe os.exit() could be more widely advertised, though ... > > That would be os._exit(). Calling it avoids calls to exit functions > registered with atexit.register(). I believe it is both safe, and > reasonable programming practice for modules to register exit functions. > Both the logging and multiprocessing modules call it. It's incumbent on the > application programmer to know these details of the modules the app uses > (perhaps indirectly) to know whether or not it's safe/wise to call > os._exit(). You could call sys.exitfunc() just before os._exit(). -Andrew. 
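Andrew's run-the-exit-functions-then-leave idea, sketched out. sys.exitfunc is Python 2 only; the closest Python 3 equivalent is the private atexit._run_exitfuncs, so treat this as an implementation-detail-dependent sketch:

```python
import atexit, os

ran = []
atexit.register(lambda: ran.append("handler"))

def fast_exit(status=0):
    """Run registered exit functions, then leave without tearing down
    the heap. NOTE: atexit._run_exitfuncs is private API; sys.exitfunc
    played this role in Python 2."""
    atexit._run_exitfuncs()
    os._exit(status)

# Exercising only the first half here, so this snippet keeps running:
atexit._run_exitfuncs()
assert ran == ["handler"]
```

This gives the useful side effects (flushing, registered cleanups) without the long refcount-driven teardown of a 45GB object graph.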
Re: [Python-Dev] Call PyType_Ready on builtin types during interpreter startup?
Aahz wrote: > On Sat, Dec 20, 2008, Nick Coghlan wrote: >> It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of >> the builtin types - they're left to have it called implicitly when an >> operation using them needs tp_dict filled in. > > This seems like a release blocker for 3.0.1 to me The problem isn't actually as bad as I first thought (it turns out most of the builtin types *are* fully initialised in _Py_ReadyTypes, which is called from Py_InitializeEx). However, xrange/range are definitely missing from that function (which is the actual proximate cause of the strange range() hashing behaviour in Py3k), and I'm still hoping someone knows why the numeric types aren't being readied there when certain parts of the core need additional handling to cope with the possibility that those types aren't fully initialised (e.g. PyObject_Format has a lazy call to PyType_Ready with a comment noting that it may be asked to format floating point numbers before PyType_Ready has otherwise been called for the float type). That said, I have still added the range() hashing problem to the list of release blockers. Cheers, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia --- ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Tim, I left out some details that I believe probably rule out the "swapped out" theory. The machine in question has 64GB RAM, but only 16GB swap. I'd prefer more swap, but in any case only around ~400MB of the swap was actually in use during my program's entire run. Furthermore, during my program's exit, it was using 100% CPU, and I'm 95% sure there was no significant "system" or "wait" CPU time for the system. (All observations via 'top'.) So, I think that the problem is entirely a computational one within this process. The system does have 8 CPUs. I'm not sure about its memory architecture, but if it's some kind of NUMA box, I guess access to memory could be slower than what we'd normally expect. I'm skeptical about that being a significant factor here, though. Just to clarify, I didn't gc.disable() to address this problem, but rather because it destroys performance during the creation of the huge dict. I don't have a specific number, but I think disabling gc reduced construction from something like 70 minutes to 5 (or maybe 10). Quite dramatic. Mike From Tim Peters: BTW, the original poster should try this: use whatever tools the OS supplies to look at CPU and disk usage during the long exit. What I /expect/ is that almost no CPU time is being used, while the disk is grinding itself to dust. That's what happens when a large number of objects have been swapped out to disk, and exit processing has to page them all back into memory again (in order to decrement their refcounts). 
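The pattern Mike describes (pausing the cyclic collector while a large container is populated, then restoring it) looks like this; the data and function name are illustrative, and the real payoff only shows at millions of entries:

```python
import gc
from collections import defaultdict

def build_index(records):
    """Bulk-load a defaultdict with the cyclic collector paused, so the
    collector does not repeatedly rescan an ever-growing object set."""
    gc.disable()
    try:
        index = defaultdict(list)
        for key, value in records:
            index[key].append(value)
    finally:
        gc.enable()          # always restore normal collection
    return index

index = build_index([("ABCDEFG", ("seq1", 10)),
                     ("ABCDEFG", ("seq2", 20)),
                     ("HIJKLMN", ("seq3", 30))])
assert index["ABCDEFG"] == [("seq1", 10), ("seq2", 20)]
```

The try/finally guarantees the collector comes back on even if loading raises.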
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Re "held" and "intern_it": Haha! That's evil and extremely evil, respectively. :-) I will add these to the Python wiki if they're not already there... Mike ___ Python-Dev mailing list [email protected] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
On Sat, Dec 20, 2008 at 4:11 PM, Tim Peters wrote: > [Lots of answers] Thanks. Wish I could have offered something useful. -- Cheers, Leif 
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Mike Coleman gmail.com> writes: > > Just to clarify, I didn't gc.disable() to address this problem, but > rather because it destroys performance during the creation of the huge > dict. I don't have a specific number, but I think disabling gc > reduced construction from something like 70 minutes to 5 (or maybe > 10). Quite dramatic. There's a pending patch which should fix that problem: http://bugs.python.org/issue4074 Regards Antoine. 
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti wrote: > Could you give us more information about the dictionary? For example, > how many objects does it contain? Is 45GB the actual size of the > dictionary or of the Python process? The 45G was the VM size of the process (resident size was similar). The dict keys were all uppercase alpha strings of length 7. I don't have access at the moment, but maybe something like 10-100M of them (not sure how redundant the set is). The values are all lists of pairs, where each pair is a (string, int). The pair strings are of length around 30, and drawn from a "small" fixed set of around 60K strings (). As mentioned previously, I think the ints are drawn pretty uniformly from something like range(1). The length of the lists depends on the redundancy of the key set, but I think there are around 100-200M pairs total, for the entire dict. (If you're curious about the application domain, see 'http://greylag.org'.) > Have you seen any significant difference in the exit time when the > cyclic GC is disabled or enabled? Unfortunately, with GC enabled, the application is too slow to be useful, because of the greatly increased time for dict creation. I suppose it's theoretically possible that with this increased time, the long time for exit will look less bad by comparison, but I'd be surprised if it makes any difference at all. I'm confident that there are no loops in this dict, and nothing for cyclic gc to collect. Mike 
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Mike Coleman gmail.com> writes: > > The 45G was the VM size of the process (resident size was similar). Can you reproduce it with a smaller working set? Something between 1 and 2GB, possibly randomly-generated, and post both the generation script and the problematic script on the bug tracker? 
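A generation script along the lines Antoine asks for might look like this; the shape follows Mike's description (7-char uppercase keys mapping to roughly 30-char string/int pairs), but every size below is a small placeholder to be scaled up:

```python
import os, random, string, tempfile

def generate_sample(path, n_keys=1000, pairs_per_key=3):
    """Write 'key value int' lines shaped like the problematic data set:
    7-char uppercase keys mapping to (30-char string, int) pairs.
    Scale n_keys up toward tens of millions to grow the resulting dict."""
    rng = random.Random(0)                      # reproducible output
    with open(path, "w") as f:
        for _ in range(n_keys):
            key = "".join(rng.choice(string.ascii_uppercase) for _ in range(7))
            for _ in range(pairs_per_key):
                val = "".join(rng.choice(string.ascii_lowercase)
                              for _ in range(30))
                f.write("%s %s %d\n" % (key, val, rng.randrange(10000)))

demo_path = os.path.join(tempfile.gettempdir(), "dict_exit_sample.txt")
generate_sample(demo_path, n_keys=10, pairs_per_key=2)
with open(demo_path) as f:
    demo_lines = f.read().splitlines()
assert len(demo_lines) == 20
assert all(len(line.split()[0]) == 7 for line in demo_lines)
```

The reader half of a reproducer would parse each line back into a defaultdict(list) and time both construction and teardown.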
Re: [Python-Dev] 2.6.1 documentation not available for download
On Sat, Dec 20, 2008 at 5:02 PM, Arfrever Frehtes Taifersar Arahesis wrote: > 2008-12-20 23:46:15 Benjamin Peterson wrote: >> On Sat, Dec 20, 2008 at 4:28 PM, Arfrever Frehtes Taifersar Arahesis >> wrote: >> > Python 2.6.1 documentation currently isn't available for download at: >> > http://docs.python.org/ftp/python/doc/ >> >> It is available here, though: >> >> http://www.python.org/ftp/python/doc/current/ > > I need documentation created from the 'r261' tag, not from the HEAD of > the 'release26-maint' branch. I've made documentation for 2.6.1 now. It's at http://www.python.org/ftp/python/doc/2.6.1 > -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." 
Re: [Python-Dev] Python 3.0.1
4631 should be a release blocker. I'll have a bit of time on Monday and
Tuesday to wrap it up.

Jeremy

On Fri, Dec 19, 2008 at 5:28 PM, Barry Warsaw wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I'd like to get Python 3.0.1 out before the end of the year. There are
> no showstoppers, but I haven't yet looked at the deferred blockers or
> the buildbots.
>
> Do you think we can get 3.0.1 out on December 24th? Or should we wait
> until after Christmas and get it out, say on the 29th? Do we need an rc?
>
> This question goes mostly to Martin and Georg. What would work for you
> guys?
>
> - -Barry
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Darwin)
>
> iQCVAwUBSUwgEXEjvBPtnXfVAQIthgP7BDS6xfBHhADKc50ANvZ5aAfWhGSU9GH/
> DR+IRduVmvosu9gm92hupCOaLCN4IbtyFx27A8LQuPNVc4BVrhWfDKDSzpxO2MJu
> xLJntkF2BRWODSbdrLGdZ6H6WDT0ZAhn6ZjlWXwxhGxQ5FwEJb7moMuY7jAIEeor
> 5n6Ag5zT+e8=
> =oU/g
> -----END PGP SIGNATURE-----
Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Mike Coleman wrote:
> Andrew, this is on an (intel) x86_64 box with 64GB of RAM. I don't
> recall the maker or details of the architecture off the top of my
> head, but it would be something "off the rack" from Dell or maybe HP.
> There were other users on the box at the time, but nothing heavy or
> that gave me any reason to think it was affecting my program. It's
> running CentOS 5 I think, so that might make glibc several years old.
>
> Your malloc idea sounds plausible to me. If it is a libc problem, it
> would be nice if there were some way we could tell malloc to "live for
> today because there is no tomorrow" in the terminal phase of the
> program.
>
> I'm not sure exactly how to attack this. Callgrind is cool, but there
> is no way it will work on something this size. Timed ltrace output
> might be interesting. Or maybe a gprof'ed Python, though that's more
> work.

Some malloc()s (notably FreeBSD's) can be externally tuned at runtime
via options in environment variables or other mechanisms - the malloc
man page on your system might be helpful if your platform has something
like this.

It is likely that PyMalloc would be better with a way to disable the
free()ing of empty arenas, or move to an arrangement where (like the
various type free-lists in 2.6+) explicit action can force pruning of
empty arenas - there are other usage patterns than yours which would
benefit (performance wise) from not freeing arenas automatically.

--
- Andrew I MacIntyre               "These thoughts are mine alone..."
E-mail: [email protected]  (pref) | Snail: PO Box 370
        [email protected] (alt)  |        Belconnen ACT 2616
Web:    http://www.andymac.org/   |        Australia
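[Archive note: for the glibc case mentioned above, malloc does expose a few runtime knobs through environment variables (documented in mallopt(3)); whether any of them helps this exit-time pattern is untested here, and the values and script name below are illustrative only.]

```shell
# glibc malloc tunables; the trailing underscore is part of each
# variable name.  "myscript.py" is a placeholder for the real program.

# Disable trimming entirely, so malloc never spends time returning
# freed memory to the OS (mallopt(M_TRIM_THRESHOLD, -1) equivalent):
MALLOC_TRIM_THRESHOLD_=-1 python myscript.py

# Alternatively, push large allocations through mmap so free() can
# hand them straight back without touching the main heap:
MALLOC_MMAP_THRESHOLD_=65536 python myscript.py
```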
