Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Kristján Valur Jónsson
Can you distill the program into something reproducible?
Maybe with something slightly less than 45Gb but still exhibiting some 
degradation of exit performance?
I can try to point our commercial profiling tools at it and see what it is 
doing.
K

-Original Message-
From: [email protected] 
[mailto:[email protected]] On Behalf Of Mike 
Coleman
Sent: 19. desember 2008 23:30
To: [email protected]
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict 
(python 2.5.2)

I have a program that creates a huge (45GB) defaultdict.  (The keys
are short strings, the values are short lists of pairs (string, int).)
 Nothing but possibly the strings and ints is shared.

The program takes around 10 minutes to run, but longer than 20 minutes
to exit (I gave up at that point).  That is, after executing the final
statement (a print), it is apparently spending a huge amount of time
cleaning up before exiting.  I haven't installed any exit handlers or
anything like that, all files are already closed and stdout/stderr
flushed, and there's nothing special going on.  I have done
'gc.disable()' for performance (which is hideous without it)--I have
no reason to think there are any loops.

Currently I am working around this by doing an os._exit(), which is
immediate, but this seems like a bit of a hack.  Is this something that
needs fixing, or that has already been fixed?

Mike
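
A minimal sketch of the pattern described above (scaled down, with made-up
keys and values), for anyone who wants to experiment:

    import gc, os
    from collections import defaultdict

    gc.disable()                      # cyclic GC off, as in the original program

    d = defaultdict(list)
    for i in xrange(5000000):         # scale this up toward the 45 GB case
        d["key-%d" % i].append(("val-%d" % (i % 1000), i))

    print len(d)                      # the final statement
    os._exit(0)                       # exits immediately; falling off the
                                      # end would first tear down 'd'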



Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Steven D'Aprano
On Sat, 20 Dec 2008 09:02:38 pm Kristján Valur Jónsson wrote:

> Can you distill the program into something reproducible?
> Maybe with something slightly less than 45Gb but still exhibiting
> some degradation of exit performance? I can try to point our
> commercial profiling tools at it and see what it is doing. K

In November 2007, a similar problem was reported on the comp.lang.python 
newsgroup. 370MB was large enough to demonstrate the problem. I don't 
know if a bug was ever reported.

The thread starts here:
http://mail.python.org/pipermail/python-list/2007-November/465498.html

or if you prefer Google Groups:
http://preview.tinyurl.com/97xsso

and it describes extremely long times to populate and destroy large 
dicts even with garbage collection turned off.

My summary at the time was:

"On systems with multiple CPUs or 64-bit systems, or both, creating 
and/or deleting a multi-megabyte dictionary in recent versions of 
Python (2.3, 2.4, 2.5 at least) takes a LONG time, of the order of 30+ 
minutes, compared to seconds if the system only has a single CPU. 
Turning garbage collection off doesn't help."

I make no guarantee that the above is a correct description of the 
problem, only that this is what I believed at the time.

I'm afraid it is a very long thread, with multiple red herrings, lots of 
people unable to reproduce the problem, and the usual nonsense that 
happens on comp.lang.python.

I was originally one of the skeptics until I reproduced the original 
poster's problem. I generated a sample file of 8 million key/value pairs as 
a 370MB text file. Reading it into a dict took two and a half minutes 
on my relatively slow computer. But deleting the dict took more than 30 
minutes even with garbage collection switched off. Sample code 
reproducing the problem on my machine is here:

http://mail.python.org/pipermail/python-list/2007-November/465513.html

According to this post of mine:

http://mail.python.org/pipermail/python-list/2007-November/466209.html

deleting 8 million (key, value) pairs stored as a list of tuples was 
very fast. It was only if they were stored as a dict that deleting it 
was horribly slow.
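
For anyone who wants to try reproducing this, a rough self-contained timing
sketch along those lines (8 million pairs; the key format is made up):

    import gc, time

    gc.disable()

    pairs = [("key%d" % i, i) for i in xrange(8000000)]
    t = time.time()
    del pairs
    print "del list of tuples: %.1f s" % (time.time() - t)

    d = dict(("key%d" % i, i) for i in xrange(8000000))
    t = time.time()
    del d
    print "del dict: %.1f s" % (time.time() - t)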

Please note that other people have tried and failed to replicate the 
problem. I suspect the fault (if it is one, and not human error) is 
specific to some combinations of Python version and hardware.

Even if this is a Will Not Fix, I'd be curious if anyone else can 
reproduce the problem.

Hope this is helpful,

Steven.



> -Original Message-
> From: [email protected]
> [mailto:[email protected]] On
> Behalf Of Mike Coleman Sent: 19. desember 2008 23:30
> To: [email protected]
> Subject: [Python-Dev] extremely slow exit for program having huge
> (45G) dict (python 2.5.2)
>
> I have a program that creates a huge (45GB) defaultdict.  (The keys
> are short strings, the values are short lists of pairs (string,
> int).) Nothing but possibly the strings and ints is shared.
>
> The program takes around 10 minutes to run, but longer than 20
> minutes to exit (I gave up at that point).  That is, after executing
> the final statement (a print), it is apparently spending a huge
> amount of time cleaning up before exiting.  I haven't installed any
> exit handlers or anything like that, all files are already closed and
> stdout/stderr flushed, and there's nothing special going on.  I have
> done
> 'gc.disable()' for performance (which is hideous without it)--I have
> no reason to think there are any loops.
>
> Currently I am working around this by doing an os._exit(), which is
> immediate, but this seems like a bit of a hack.  Is this something that
> needs fixing, or that has already been fixed?
>
> Mike




-- 
Steven D'Aprano


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Andrew MacIntyre

Mike Coleman wrote:

I have a program that creates a huge (45GB) defaultdict.  (The keys
are short strings, the values are short lists of pairs (string, int).)
 Nothing but possibly the strings and ints is shared.

The program takes around 10 minutes to run, but longer than 20 minutes
to exit (I gave up at that point).  That is, after executing the final
statement (a print), it is apparently spending a huge amount of time
cleaning up before exiting.  I haven't installed any exit handlers or
anything like that, all files are already closed and stdout/stderr
flushed, and there's nothing special going on.  I have done
'gc.disable()' for performance (which is hideous without it)--I have
no reason to think there are any loops.

Currently I am working around this by doing an os._exit(), which is
immediate, but this seems like a bit of a hack.  Is this something that
needs fixing, or that has already been fixed?


You don't mention the platform, but...

This behaviour was not unknown in the distant past, with much smaller
datasets.  Most of the problems then related to the platform malloc()
doing funny things as stuff was free()ed, like coalescing free space.

[I once sat and watched a Python script run in something like 30 seconds
 and then take nearly 10 minutes to terminate, as you describe (Python
 2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of
 hundred MB of memory - the Solaris 2.5 malloc() had some undesirable
 properties from Python's point of view]

PyMalloc effectively removed this as an issue for most cases and platform
malloc()s have also become considerably more sophisticated since then,
but I wonder whether the sheer size of your dataset is unmasking related
issues.

Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus
accumulates (2.3 & 2.4 never free()ed arenas).  Your platform malloc()
might have odd behaviour with 45GB of arenas returned to it piecemeal.
This is something that could be checked with a small C program.
Calling os._exit() circumvents the free()ing of the arenas.

Also consider that, with the exception of small integers (-1..256), no
interning of integers is done.  If your data contains large quantities
of integers with non-unique values (that aren't in the small integer
range) you may find it useful to do your own interning.
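
A hand-rolled interning helper for that case might look something like this
(a sketch only):

    _interned = {}

    def intern_int(n):
        # return one canonical object per distinct value, so duplicates
        # share a single int object instead of each allocating their own
        return _interned.setdefault(n, n)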

--
-
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: [email protected]  (pref) | Snail: PO Box 370
   [email protected] (alt) |Belconnen ACT 2616
Web:http://www.andymac.org/   |Australia


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Steve Holden
Andrew MacIntyre wrote:
> Mike Coleman wrote:
>> I have a program that creates a huge (45GB) defaultdict.  (The keys
>> are short strings, the values are short lists of pairs (string, int).)
>>  Nothing but possibly the strings and ints is shared.
>>
>> The program takes around 10 minutes to run, but longer than 20 minutes
>> to exit (I gave up at that point).  That is, after executing the final
>> statement (a print), it is apparently spending a huge amount of time
>> cleaning up before exiting.  I haven't installed any exit handlers or
>> anything like that, all files are already closed and stdout/stderr
>> flushed, and there's nothing special going on.  I have done
>> 'gc.disable()' for performance (which is hideous without it)--I have
>> no reason to think there are any loops.
>>
>> Currently I am working around this by doing an os._exit(), which is
>> immediate, but this seems like a bit of a hack.  Is this something that
>> needs fixing, or that has already been fixed?
> 
> You don't mention the platform, but...
> 
> This behaviour was not unknown in the distant past, with much smaller
> datasets.  Most of the problems then related to the platform malloc()
> doing funny things as stuff was free()ed, like coalescing free space.
> 
> [I once sat and watched a Python script run in something like 30 seconds
>  and then take nearly 10 minutes to terminate, as you describe (Python
>  2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of
>  hundred MB of memory - the Solaris 2.5 malloc() had some undesirable
>  properties from Python's point of view]
> 
> PyMalloc effectively removed this as an issue for most cases and platform
> malloc()s have also become considerably more sophisticated since then,
> but I wonder whether the sheer size of your dataset is unmasking related
> issues.
> 
> Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus
> accumulates (2.3 & 2.4 never free()ed arenas).  Your platform malloc()
> might have odd behaviour with 45GB of arenas returned to it piecemeal.
> This is something that could be checked with a small C program.
> Calling os._exit() circumvents the free()ing of the arenas.
> 
> Also consider that, with the exception of small integers (-1..256), no
> interning of integers is done.  If your data contains large quantities
> of integers with non-unique values (that aren't in the small integer
> range) you may find it useful to do your own interning.
> 
It's a pity a simplistic approach that redefines all space reclamation
activities as null functions won't work. I hate to think of all the
cycles that are being wasted reclaiming space just because a program has
terminated, when in fact an os.exit() call would work just as well from
the user's point of view.

Unfortunately there are doubtless programs out there that do rely on
actions being taken at shutdown.

Maybe os.exit() could be more widely advertised, though ...

regards
 Steve
-- 
Steve Holden+1 571 484 6266   +1 800 494 3119
Holden Web LLC  http://www.holdenweb.com/



Re: [Python-Dev] Distutils maintenance

2008-12-20 Thread Georg Brandl
Benjamin Peterson schrieb:
> On Fri, Dec 19, 2008 at 12:55 PM, Tarek Ziadé  wrote:
>> Hello
>>
>> I would like to request a commit access to work specifically on
>> distutils maintenance.
> 
> +1
> 
> We are currently without an active distutils maintainer, and many
> stale distutils tickets are in need of the attention I'm sure Tarek could
> provide. Tarek has also been providing many useful patches of his own.

FWIW, +1.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



Re: [Python-Dev] Python 3.0.1

2008-12-20 Thread Georg Brandl
Barry Warsaw schrieb:
> I'd like to get Python 3.0.1 out before the end of the year.  There
> are no showstoppers, but I haven't yet looked at the deferred blockers
> or the buildbots.
> 
> Do you think we can get 3.0.1 out on December 24th?  Or should we wait
> until after Christmas and get it out, say on the 29th?  Do we need an
> rc?
> 
> This question goes mostly to Martin and Georg.  What would work for
> you guys?

Since the 24th is the most important Christmas day around here, I'll not
be available then :)

Either 23rd or 29th is fine with me.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.



Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread skip

Steve> Unfortunately there are doubtless programs out there that do rely
Steve> on actions being taken at shutdown.

Indeed.  I believe any code which calls atexit.register.

Steve> Maybe os.exit() could be more widely advertised, though ...

That would be os._exit().  Calling it avoids calls to exit functions
registered with atexit.register().  I believe it is both safe, and
reasonable programming practice for modules to register exit functions.
Both the logging and multiprocessing modules call it.  It's incumbent on the
application programmer to know these details of the modules the app uses
(perhaps indirectly) to know whether or not it's safe/wise to call
os._exit().

-- 
Skip Montanaro - [email protected] - http://smontanaro.dyndns.org/


Re: [Python-Dev] Call PyType_Ready on builtin types during interpreter startup?

2008-12-20 Thread Aahz
On Sat, Dec 20, 2008, Nick Coghlan wrote:
> 
> It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of
> the builtin types - they're left to have it called implicitly when an
> operation using them needs tp_dict filled in.

This seems like a release blocker for 3.0.1 to me.
-- 
Aahz ([email protected])   <*> http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code."
--Bill Harlan


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Mike Coleman
On Sat, Dec 20, 2008 at 4:02 AM, Kristján Valur Jónsson
 wrote:
> Can you distill the program into something reproducible?
> Maybe with something slightly less than 45Gb but still exhibiting some 
> degradation of exit performance?
> I can try to point our commercial profiling tools at it and see what it is 
> doing.

I will try next week to see if I can come up with a smaller,
submittable example.  Thanks.


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Mike Coleman
Andrew, this is on an (intel) x86_64 box with 64GB of RAM.  I don't
recall the maker or details of the architecture off the top of my
head, but it would be something "off the rack" from Dell or maybe HP.
There were other users on the box at the time, but nothing heavy or
anything that gave me reason to think it was affecting my program.

It's running CentOS 5 I think, so that might make glibc several years
old.  Your malloc idea sounds plausible to me.  If it is a libc
problem, it would be nice if there was some way we could tell malloc
to "live for today because there is no tomorrow" in the terminal phase
of the program.

I'm not sure exactly how to attack this.  Callgrind is cool, but no
way will work on something this size.  Timed ltrace output might be
interesting.  Or maybe a gprof'ed Python, though that's more work.

Regarding interning, I thought this only worked with strings.  Is
there some way to intern integers?  I'm probably creating 300M
integers more or less uniformly distributed across range(1).

Mike





On Sat, Dec 20, 2008 at 4:08 AM, Andrew MacIntyre
 wrote:
> Mike Coleman wrote:
>>
>> I have a program that creates a huge (45GB) defaultdict.  (The keys
>> are short strings, the values are short lists of pairs (string, int).)
>>  Nothing but possibly the strings and ints is shared.
>>
>> The program takes around 10 minutes to run, but longer than 20 minutes
>> to exit (I gave up at that point).  That is, after executing the final
>> statement (a print), it is apparently spending a huge amount of time
>> cleaning up before exiting.  I haven't installed any exit handlers or
>> anything like that, all files are already closed and stdout/stderr
>> flushed, and there's nothing special going on.  I have done
>> 'gc.disable()' for performance (which is hideous without it)--I have
>> no reason to think there are any loops.
>>
>> Currently I am working around this by doing an os._exit(), which is
>> immediate, but this seems like a bit of a hack.  Is this something that
>> needs fixing, or that has already been fixed?
>
> You don't mention the platform, but...
>
> This behaviour was not unknown in the distant past, with much smaller
> datasets.  Most of the problems then related to the platform malloc()
> doing funny things as stuff was free()ed, like coalescing free space.
>
> [I once sat and watched a Python script run in something like 30 seconds
>  and then take nearly 10 minutes to terminate, as you describe (Python
>  2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of
>  hundred MB of memory - the Solaris 2.5 malloc() had some undesirable
>  properties from Python's point of view]
>
> PyMalloc effectively removed this as an issue for most cases and platform
> malloc()s have also become considerably more sophisticated since then,
> but I wonder whether the sheer size of your dataset is unmasking related
> issues.
>
> Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus
> accumulates (2.3 & 2.4 never free()ed arenas).  Your platform malloc()
> might have odd behaviour with 45GB of arenas returned to it piecemeal.
> This is something that could be checked with a small C program.
> Calling os._exit() circumvents the free()ing of the arenas.
>
> Also consider that, with the exception of small integers (-1..256), no
> interning of integers is done.  If your data contains large quantities
> of integers with non-unique values (that aren't in the small integer
> range) you may find it useful to do your own interning.
>
> --
> -
> Andrew I MacIntyre "These thoughts are mine alone..."
> E-mail: [email protected]  (pref) | Snail: PO Box 370
>   [email protected] (alt) |Belconnen ACT 2616
> Web:http://www.andymac.org/   |Australia
>


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Scott David Daniels

Mike Coleman wrote:
... Regarding interning, I thought this only worked with strings.
Is there some way to intern integers?  I'm probably creating 300M
integers more or less uniformly distributed across range(1).


held = list(range(1))
...
troublesome_dict[string] = held[number_to_hold]
...

--Scott David Daniels
[email protected]



Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Kristján Valur Jónsson
You can always try poor-man's profiling, which is surprisingly useful in the 
face of massive performance problems.
Just attach a debugger to the program, and while it is suffering from a
performance problem, break the execution on a regular basis. You are
statistically very likely to get a callstack representative of the problem
you are having.
Do this a few times and you will get a fair impression of what the program is
spending its time on.
From the debugger, you can also examine the Python callstack of the program by
examining the 'f' local variable in the frame evaluation function.
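
For the Python-level part, a rough in-process approximation of this sampling
idea (assumes Python 2.5+ for sys._current_frames(); note it only sees
bytecode execution, not time spent inside C-level teardown):

    import sys, thread, threading, time, traceback

    MAIN_THREAD_ID = thread.get_ident()     # capture this in the main thread

    def sample_main_stack(samples=10, interval=1.0):
        # dump the main thread's Python stack at regular intervals; frames
        # that keep showing up are where the time is going
        for _ in range(samples):
            time.sleep(interval)
            frame = sys._current_frames()[MAIN_THREAD_ID]
            traceback.print_stack(frame)

    t = threading.Thread(target=sample_main_stack)
    t.setDaemon(True)
    t.start()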

Have fun,

K

-Original Message-
From: [email protected] 
[mailto:[email protected]] On Behalf Of Mike 
Coleman
Sent: 20. desember 2008 17:09
To: Andrew MacIntyre
Cc: Python Dev
Subject: Re: [Python-Dev] extremely slow exit for program having huge (45G) 
dict (python 2.5.2)


I'm not sure exactly how to attack this.  Callgrind is cool, but no
way will work on something this size.  Timed ltrace output might be
interesting.  Or maybe a gprof'ed Python, though that's more work.




[Python-Dev] VM imaging based launch optimizations for CPython?

2008-12-20 Thread Mikko Ohtamaa
Hi fellow snakemen and lizard ladies,

We have recently done lots of Python work on Nokia Series 60 phones and
even managed to roll out some commercial Python based applications. In the
future we plan to create some iPhone Python apps also.

Python runs fine on phones - after it has been launched. Currently the
biggest issue preventing the world dominance of Python based mobile
applications is the start-up time. We cope with the issue by using fancy
splash screens and progress indicators, but it doesn't cure the fact that it
takes a minute to show the main user interface of the application. Most of
that time is spent in imports, executing opcodes and building function and
class structures in memory - something which cannot easily be speeded up.

Now, we have been thinking. Maemo has a fork() based Python launcher
(http://blogs.gnome.org/johan/2007/01/18/introducing-python-launcher/) which
greatly speeds up the start-up time by holding Python in memory all the
time. We cannot afford such luxury on Symbian and iPhone, since we do not
control the operating system. So how about this:

1. A Python application is launched normally

2. After the VM has initialized module importing and reached a static launch
state (meaning that the state is the same on every launch), the VM state is
written out to disk

3. The application continues execution and starts doing dynamic stuff

4. On subsequent launches, special init code is used which directly blits the
VM image from disk back into memory, and we have reached the static state again
without going through the hoops of executing import-related opcodes

5. Also, I have heard a suggestion that the VM image could be defragmented and
analyzed offline

Any opinions?

Cheers,
Mikko


-- 
Mikko Ohtamaa
Red Innovation Ltd.
Oulu, Finland
http://www.redinnovation.com


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Antoine Pitrou
Steven D'Aprano  pearwood.info> writes:
> 
> In November 2007, a similar problem was reported on the comp.lang.python 
> newsgroup. 370MB was large enough to demonstrate the problem. I don't 
> know if a bug was ever reported.

Do you still reproduce it on trunk?
I've tried your scripts on my machine and they work fine, even if I leave
garbage collection enabled during the process.
(dual core 64-bit machine but in 32-bit mode)





Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread M.-A. Lemburg
On 2008-12-20 17:57, Mike Coleman wrote:
> On Sat, Dec 20, 2008 at 4:02 AM, Kristján Valur Jónsson
>  wrote:
>> Can you distill the program into something reproducible?
>> Maybe with something slightly less than 45Gb but still exhibiting some 
>> degradation of exit performance?
>> I can try to point our commercial profiling tools at it and see what it is 
>> doing.
> 
> I will try next week to see if I can come up with a smaller,
> submittable example.  Thanks.

These long exit times are usually caused by the garbage collection
of objects. This can be a very time consuming task.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 20 2008)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2008-12-02: Released mxODBC.Connect 1.0.0  http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Leif Walsh
On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg  wrote:
> These long exit times are usually caused by the garbage collection
> of objects. This can be a very time consuming task.

In that case, the question would be "why is the interpreter collecting
garbage when it knows we're trying to exit anyway?".

-- 
Cheers,
Leif


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Michael Foord

Leif Walsh wrote:
> On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg  wrote:
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.
>
> In that case, the question would be "why is the interpreter collecting
> garbage when it knows we're trying to exit anyway?".

Because finalizers are only called when an object is destroyed, presumably.

Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog




Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread skip

Leif> In that case, the question would be "why is the interpreter
Leif> collecting garbage when it knows we're trying to exit anyway?".

Because useful side effects are sometimes performed as a result of this
activity (flushing disk buffers, closing database connections, etc).

Skip


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Tim Peters
[M.-A. Lemburg]
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.

[Leif Walsh]
> In that case, the question would be "why is the interpreter collecting
> garbage when it knows we're trying to exit anyway?".

Because user-defined destructors (like __del__ methods and weakref
callbacks) may be associated with garbage, and users presumably want
those to execute.  Doing so requires identifying garbage
and releasing it, same as if the interpreter didn't happen to be
exiting.

BTW, the original poster should try this:  use whatever tools the OS
supplies to look at CPU and disk usage during the long exit.  What I
/expect/ is that almost no CPU time is being used, while the disk is
grinding itself to dust.  That's what happens when a large number of
objects have been swapped out to disk, and exit processing has to page
them all back into memory again (in order to decrement their
refcounts).  Python's cyclic gc (the `gc` module) has nothing to do
with this -- it's typically the been-there-forever refcount-based
non-cyclic gc that accounts for supernaturally long exit times.

If that is the case here, there's no evident general solution.  If you
have millions of objects still alive at exit, refcount-based
reclamation has to visit all of them, and if they've been swapped out
to disk it can take a very long time to swap them all back into memory
again.
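
One crude way to test this hypothesis from inside the process, assuming a
Unix platform, is to compare major page fault counts across the deallocation
(a scaled-down sketch):

    import resource

    def major_faults():
        # ru_majflt counts page faults that required disk I/O (swap-ins)
        return resource.getrusage(resource.RUSAGE_SELF).ru_majflt

    d = dict((i, str(i)) for i in xrange(1000000))   # stand-in for the big dict
    before = major_faults()
    del d
    print "major page faults during del: %d" % (major_faults() - before)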


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread M.-A. Lemburg
On 2008-12-20 21:20, Leif Walsh wrote:
> On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg  wrote:
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.
> 
> In that case, the question would be "why is the interpreter collecting
> garbage when it knows we're trying to exit anyway?".

It cannot know until the very end, because there may still be
some try: ... except SystemExit: ... somewhere in the code
waiting to trigger and stop the system exit.

If you want a really fast exit, try this:

import os
os.kill(os.getpid(), 9)

But you better know what you're doing if you take this approach...

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Dec 20 2008)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2008-12-02: Released mxODBC.Connect 1.0.0  http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Leif Walsh
(@Skip, Michael, Tim)

On Sat, Dec 20, 2008 at 3:26 PM,   wrote:
> Because useful side effects are sometimes performed as a result of this
> activity (flushing disk buffers, closing database connections, etc).

Of course they are.  But what about the case given above:

On Sat, Dec 20, 2008 at 5:55 AM, Steven D'Aprano  wrote:
> I was originally one of the skeptics until I reproduced the original
> poster's problem. I generated a sample file of 8 million key/value pairs as
> a 370MB text file. Reading it into a dict took two and a half minutes
> on my relatively slow computer. But deleting the dict took more than 30
> minutes even with garbage collection switched off.

It might be a semantic change that I'm looking for here, but it seems
to me that if you turn off the garbage collector, you should be able
to expect that either it also won't run on exit, or it should have a
way of letting you tell it not to run on exit.  If I'm running without
a garbage collector, that assumes I'm at least cocky enough to think I
know when I'm done with my objects, so I should know to delete the
objects that have __del__ functions I care about before I exit.  Well,
maybe; I'm sure one of you could drag out a programmer that would make
that mistake, but turning off the garbage collector seems, to me, to
signal at least a little experience.

Does the garbage collector run any differently when the process is
exiting?  It seems that it wouldn't need to do anything more than run
through all objects in the heap and delete them, which doesn't require
anything fancy, and should be able to sort by address to aid with
caching.  If it's already this fast, then I guess it really is the
sheer number of function calls necessary that are causing such a
slowdown in the cases we've seen, but I find this hard to believe.

-- 
Cheers,
Leif


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Tim Peters
[Mike Coleman]
>> ... Regarding interning, I thought this only worked with strings.

Implementation details.  Recent versions of CPython also, e.g.,
"intern" the empty tuple, and very small integers.

>> Is there some way to intern integers?  I'm probably creating 300M
>> integers more or less uniformly distributed across range(1)?

Interning would /vastly/ reduce memory use for ints in that case, from
gigabytes down to less than half a megabyte.


[Scott David Daniels]
> held = list(range(1))
> ...
>troublesome_dict[string] = held[number_to_hold]
> ...

More generally, but a bit slower, for objects usable as dict keys,
change code of the form:

x = whatever_you_do_to_get_a_new_object()
use(x)

to:

x = whatever_you_do_to_get_a_new_object()
x = intern_it(x, x)
use(x)

where `intern_it` is defined like so once at the start of the program:

intern_it = {}.setdefault

This snippet may make the mechanism clearer:

>>> intern_it = {}.setdefault
>>> x = 3000
>>> id(intern_it(x, x))
36166156
>>> x = 1000 + 2000
>>> id(intern_it(x, x))
36166156
>>> x = "works for computed strings too"
>>> id(intern_it(x, x))
27062696
>>> x = "works for computed strings t" + "o" * 2
>>> id(intern_it(x, x))
27062696


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Tim Peters
[Leif Walsh]
> ...
> It might be a semantic change that I'm looking for here, but it seems
> to me that if you turn off the garbage collector, you should be able
> to expect that either it also won't run on exit,

It won't then, but "the garbage collector" is the gc module, and that
only performs /cyclic/ garbage collection.  There is no way to stop
refcount-based garbage collection.  Read my message again.


> or it should have a
> way of letting you tell it not to run on exit.  If I'm running without
> a garbage collector, that assumes I'm at least cocky enough to think I
> know when I'm done with my objects, so I should know to delete the
> objects that have __del__ functions I care about before I exit.  Well,
> maybe; I'm sure one of you could drag out a programmer that would make
> that mistake, but turning off the garbage collector to me seems to
> send the experience message, at least a little.

This probably isn't a problem with cyclic gc (reread my msg).


> Does the garbage collector run any differently when the process is
> exiting?

No.


> It seems that it wouldn't need to do anything more that run
> through all objects in the heap and delete them, which doesn't require
> anything fancy,

Reread my msg -- already explained the likely cause here (if "all the
objects in the heap" have in fact been swapped out to disk, it can
take an enormously long time to just "run through" them all).


> and should be able to sort by address to aid with
> caching.

That one isn't possible.  There is no list of "all objects" to /be/
sorted.  The only way to find all the objects is to traverse the
object graph from its roots, which is exactly what non-cyclic gc does
anyway.


>  If it's already this fast, then I guess it really is the
> sheer number of function calls necessary that are causing such a
> slowdown in the cases we've seen, but I find this hard to believe.

My guess remains that CPU usage is trivial here, and 99.99+% of the
wall-clock time is consumed waiting for disk reads.  Either that, or
that platform malloc is going nuts.


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Antoine Pitrou
Leif Walsh  gmail.com> writes:
> 
> It might be a semantic change that I'm looking for here, but it seems
> to me that if you turn off the garbage collector, you should be able
> to expect that either it also won't run on exit, or it should have a
> way of letting you tell it not to run on exit. 
[...]

I'm skeptical that it's a garbage collector problem. The script creates one dict
containing lots of strings and ints. The thing is, strings and ints aren't
tracked by the GC as they are simple atomic objects. Therefore, the /only/
object created by the script which is tracked by the GC is the dict. Moreover,
since there is no cycle created, the dict should be directly destroyed when its
last reference dies (the "del" statement), not go through the garbage collection
process.

Given that the problem is reproduced on certain systems and not others, it may
be related to an interaction between allocation patterns of the dict
implementation, the Python memory allocator, and the implementation of the C
malloc() / free() functions. I'm not expert enough to find out more on the
subject.




Re: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?

2008-12-20 Thread Fabio Zadrozny
It appears that this bug was already reported: http://bugs.python.org/issue4705

Any chance that it gets in the next 3.0.x bugfix release?

Just as a note, if I do: sys.stdout._line_buffering = True, it also
works, but doesn't seem right as it's accessing an internal attribute.

Note 2: the solution that said to pass 'wb' does not work, because I
need the output as text, not binary; otherwise the text becomes garbled
when it's not ASCII.
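
For reference, a sketch of one way to get a line-buffered text stdout
without touching private attributes, by rewrapping the underlying binary
buffer (assumes the Python 3.0 io module):

    import io
    import sys

    sys.stdout = io.TextIOWrapper(sys.stdout.buffer,
                                  encoding=sys.stdout.encoding,
                                  line_buffering=True)
    print("flushed on every newline")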

Thanks,

Fabio

On Fri, Dec 19, 2008 at 9:03 PM, Guido van Rossum  wrote:
> For truly unbuffered text output you'd have to make changes to the
> io.TextIOWrapper class to flush after each write() call. That's an API
> change -- the constructor currently has a line_buffering option but no
> option for completely unbuffered mode. It would also require some
> changes to io.open() which currently rejects buffering=0 in text mode.
> All that suggests that it should wait until 3.1.
>
> However it might make sense to at least turn on line buffering when -u
> or PYTHONUNBUFFERED is given; that doesn't require API changes and so
> can be considered a bug fix.
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
>
>
> On Fri, Dec 19, 2008 at 2:47 PM, Antoine Pitrou  wrote:
>>
>>> Well, ``python -h`` still lists it.
>>
>> Precisely, it says:
>>
>> -u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x
>> see man page for details on internal buffering relating to '-u'
>>
>> Note the "binary". And indeed:
>>
>> ./python -u
>> Python 3.1a0 (py3k:67839M, Dec 18 2008, 17:56:54)
>> [GCC 4.3.2] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> import sys
>> >>> sys.stdout.buffer.write(b"y")
>> y1
>>
>>
>> I don't know what it would take to enable unbuffered text IO while keeping 
>> the
>> current TextIOWrapper implementation...
>>
>> Regards
>>
>> Antoine.
>>
>>


Re: [Python-Dev] VM imaging based launch optimizations for CPython?

2008-12-20 Thread Martin v. Löwis
> Any opinions?

I would use a different marshal implementation. Instead of defining
a stream format for marshal, make marshal dump its graph of objects
along with the actual memory layout. On load, copying can
be avoided; just a few pointers need to be updated. The resulting
marshal files would be platform-specific (wrt. endianness and pointer
width).

On marshaling, you copy all objects into a contiguous block
of memory (8-aligned), and dump that. On unmarshaling, you just
map that block. If the target supports true memory mapping with
page boundaries, you might be able to store multiple .pyc files
into a single page. This reformatting could be done offline
also.

A few things need to be considered:
- compatibility. The original marshal code would probably
  need to be preserved for the "marshal" module.
- relative pointers. Code objects, tuples, etc. contain
  pointers. Assuming the marshaled object cannot be loaded
  back into the same address, you need to adjust pointers.
  A common trick is to put a desired load address into the
  memory block, then try to load into that address. If the
  address is already taken, load into a different address,
  and walk through all objects, adjusting pointers.
- type references. On loading, you will need to patch all
  ob_type fields. Put the marshal codes into the ob_type
  field on marshalling, then switch on unmarshalling.
- references to interned strings. On loading, you can
  either intern them all, or you have a "fast interning"
  algorithm that assigns a fixed table of interned-string
  numbers.
- reference counting. Make sure all these objects start
  out with a reference count of 1, so they will never
  become garbage.

If you use a container file for multiple .pyc files,
you can have additional savings by sharing strings
across modules; this should help in particular for
references to builtin symbols, and for common method
names. A fixed interning might become unnecessary as
the unique single string object in the container will
either become the interned string itself, or point to
it after being interned once.
With such a container system, unmarshalling should be
lazy; e.g. for each object, the value of ob_type can
be used to determine whether the object was
unmarshalled.

Of course, you still have the actual interpretation of
the top-level module code - if it's not the marshalling
but this part that actually costs performance, this
efficient marshalling algorithm won't help. It would be
interesting to find out which modules have a particularly
high startup cost - perhaps they can be rewritten.
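
On that last point, a quick first-order way to look at per-module import
cost (the module list here is just an example):

    import time

    for name in ("os", "re", "urllib", "decimal", "logging"):
        t0 = time.time()
        __import__(name)            # only the first import pays the real cost
        print "%-10s %6.1f ms" % (name, (time.time() - t0) * 1000.0)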

Regards,
Martin


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Mike Coleman
On Sat, Dec 20, 2008 at 2:50 PM, M.-A. Lemburg  wrote:
> If you want a really fast exit, try this:
>
> import os
> os.kill(os.getpid(), 9)
>
> But you better know what you're doing if you take this approach...

This would work, but I think os._exit(EX_OK) is probably just as fast,
and allows you to control the exit status...
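
That is (note os.EX_OK is only defined on Unix; any small integer works as
the status):

    import os
    os._exit(os.EX_OK)    # exit immediately with status 0, skipping all cleanup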


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Martin v. Löwis
>> I will try next week to see if I can come up with a smaller,
>> submittable example.  Thanks.
> 
> These long exit times are usually caused by the garbage collection
> of objects. This can be a very time consuming task.

I doubt that. The long exit times are usually caused by a bad
malloc implementation.

Regards,
Martin



[Python-Dev] 2.6.1 documentation not available for download

2008-12-20 Thread Arfrever Frehtes Taifersar Arahesis
Python 2.6.1 documentation currently isn't available for download at:
http://docs.python.org/ftp/python/doc/

Additionally please include version numbers in documentation
archives (e.g. python-docs-html-2.6.1.tar.bz2).

-- 
Arfrever Frehtes Taifersar Arahesis




[Python-Dev] 2.6.1 license

2008-12-20 Thread Steve Holden
It might be helpful if

  http://www.python.org/download/releases/2.6.1/license/

said it was also the official license for the 2.6.1 release (though I
don't suppose it matters that it's still called the 2.5 license, since
that's its origin).

Another detail to go into the release manager PEP?

regards
 Steve
-- 
Steve Holden+1 571 484 6266   +1 800 494 3119
Holden Web LLC  http://www.holdenweb.com/



Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Steve Holden
Antoine Pitrou wrote:
> Leif Walsh  gmail.com> writes:
>> It might be a semantic change that I'm looking for here, but it seems
>> to me that if you turn off the garbage collector, you should be able
>> to expect that either it also won't run on exit, or it should have a
>> way of letting you tell it not to run on exit. 
> [...]
> 
> I'm skeptical that it's a garbage collector problem. The script creates one 
> dict
> containing lots of strings and ints. The thing is, strings and ints aren't
> tracked by the GC as they are simple atomic objects. Therefore, the /only/
> object created by the script which is tracked by the GC is the dict. Moreover,
> since there is no cycle created, the dict should be directly destroyed when 
> its
> last reference dies (the "del" statement), not go through the garbage 
> collection
> process.
> 
> Given that the problem is reproduced on certain systems and not others, it can
> be related to an interaction between allocation patterns of the dict
> implementation, the Python memory allocator, and the implementation of the C
> malloc() / free() functions. I'm no expert enough to find out more on the
> subject.
> 
I believe the OP engendered a certain amount of confusion by describing
object deallocation as being performed by the garbage collector. So he
perhaps didn't understand that even decref'ing all the objects referenced
only by the dict will take a huge amount of time unless there's
enough real memory to hold them.

regards
 Steve
-- 
Steve Holden+1 571 484 6266   +1 800 494 3119
Holden Web LLC  http://www.holdenweb.com/



Re: [Python-Dev] 2.6.1 documentation not available for download

2008-12-20 Thread Benjamin Peterson
On Sat, Dec 20, 2008 at 4:28 PM, Arfrever Frehtes Taifersar Arahesis
 wrote:
> Python 2.6.1 documentation currently isn't available for download at:
> http://docs.python.org/ftp/python/doc/

It is available here, though:

http://www.python.org/ftp/python/doc/current/

>
> Additionally please include version numbers in documentation
> archives (e.g. python-docs-html-2.6.1.tar.bz2).




-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."


Re: [Python-Dev] 2.6.1 license

2008-12-20 Thread Benjamin Peterson
On Sat, Dec 20, 2008 at 4:37 PM, Steve Holden  wrote:
> It might be helpful if
>
>  http://www.python.org/download/releases/2.6.1/license/
>
> said it was also the official license for the 2.6.1 release (though I
> don't suppose it matters that it's still called the 2.5 license, since
> that's its origin).

I've updated the website and the PEP.



-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."


Re: [Python-Dev] 2.6.1 documentation not available for download

2008-12-20 Thread Arfrever Frehtes Taifersar Arahesis
On 2008-12-20 23:46:15, Benjamin Peterson wrote:
> On Sat, Dec 20, 2008 at 4:28 PM, Arfrever Frehtes Taifersar Arahesis
>  wrote:
> > Python 2.6.1 documentation currently isn't available for download at:
> > http://docs.python.org/ftp/python/doc/
> 
> It is avaiable here, though:
> 
> http://www.python.org/ftp/python/doc/current/

I need documentation created from the 'r261' tag, not from the HEAD of
the 'release26-maint' branch.

-- 
Arfrever Frehtes Taifersar Arahesis




Re: [Python-Dev] Can't have unbuffered text I/O in Python 3.0?

2008-12-20 Thread Brett Cannon
On Sat, Dec 20, 2008 at 13:45, Fabio Zadrozny  wrote:
> It appears that this bug was already reported: 
> http://bugs.python.org/issue4705
>
> Any chance that it gets in the next 3.0.x bugfix release?
>
> Just as a note, if I do: sys.stdout._line_buffering = True, it also
> works, but doesn't seem right as it's accessing an internal attribute.
>
> Note 2: the solution that said to pass 'wb' does not work, because I
> need the output as text and not binary or text becomes garbled when
> it's not ascii.
>

Can't you decode the bytes after you receive them?

-Brett

> Thanks,
>
> Fabio
>
> On Fri, Dec 19, 2008 at 9:03 PM, Guido van Rossum  wrote:
>> For truly unbuffered text output you'd have to make changes to the
>> io.TextIOWrapper class to flush after each write() call. That's an API
>> change -- the constructor currently has a line_buffering option but no
>> option for completely unbuffered mode. It would also require some
>> changes to io.open() which currently rejects buffering=0 in text mode.
>> All that suggests that it should wait until 3.1.
>>
>> However it might make sense to at least turn on line buffering when -u
>> or PYTHONUNBUFFERED is given; that doesn't require API changes and so
>> can be considered a bug fix.
>>
>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>>
>>
>>
>> On Fri, Dec 19, 2008 at 2:47 PM, Antoine Pitrou  wrote:
>>>
>>>> Well, ``python -h`` still lists it.
>>>
>>> Precisely, it says:
>>>
>>> -u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x
>>> see man page for details on internal buffering relating to '-u'
>>>
>>> Note the "binary". And indeed:
>>>
>>> ./python -u
>>> Python 3.1a0 (py3k:67839M, Dec 18 2008, 17:56:54)
>>> [GCC 4.3.2] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> import sys
>>> >>> sys.stdout.buffer.write(b"y")
>>> y1
>>>
>>>
>>> I don't know what it would take to enable unbuffered text IO while keeping 
>>> the
>>> current TextIOWrapper implementation...
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>>


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Antoine Pitrou
Steve Holden  holdenweb.com> writes:
> I believe the OP engendered a certain amount of confusion by describing
> object deallocation as being performed by the garbage collector. So he
> perhaps didn't understand that even decref'ing all the objects only
> referenced by the dict will take a huge amount of time unless there's
> enough real memory to hold it.

He said he has 64GB of RAM, so I assume all of his working set was in memory, not
swapped out.





Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Alexandre Vassalotti
On Fri, Dec 19, 2008 at 6:29 PM, Mike Coleman  wrote:
> I have a program that creates a huge (45GB) defaultdict.  (The keys
> are short strings, the values are short lists of pairs (string, int).)
>  Nothing but possibly the strings and ints is shared.
>



> That is, after executing the final statement (a print), it is apparently 
> spending a
> huge amount of time cleaning up before exiting.


> I have done 'gc.disable()' for performance (which is hideous without it)--I 
> have
> no reason to think there are any loops.


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Alexandre Vassalotti
On Fri, Dec 19, 2008 at 6:29 PM, Mike Coleman  wrote:
> I have a program that creates a huge (45GB) defaultdict.  (The keys
> are short strings, the values are short lists of pairs (string, int).)
> Nothing but possibly the strings and ints is shared.

Could you give us more information about the dictionary? For example,
how many objects does it contain? Is 45GB the actual size of the
dictionary or of the Python process?

> That is, after executing the final statement (a print), it is apparently
> spending a huge amount of time cleaning up before exiting.

Most of this time is probably spent on DECREF'ing objects in the
dictionary. As others mentioned, it would be useful to have a
self-contained example to examine the behavior more closely.
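
For instance, a self-contained test might look like this (a sketch with
made-up sizes, far smaller than the OP's 45GB):

    import time
    from collections import defaultdict

    d = defaultdict(list)
    for i in xrange(2000000):            # made-up size
        d["K%07d" % i].append(("x" * 30, i))

    t0 = time.time()
    del d                                # the mass DECREF happens here
    print "teardown took %.1f s" % (time.time() - t0)

Scaling the loop up should show whether teardown time grows worse than
linearly with the size of the dict.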

> I have done 'gc.disable()' for performance (which is hideous without it)--I
> have no reason to think there are any loops.

Have you seen any significant difference in the exit time when the
cyclic GC is disabled or enabled?

-- Alexandre


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Nick Coghlan
Tim Peters wrote:
> If that is the case here, there's no evident general solution.  If you
> have millions of objects still alive at exit, refcount-based
> reclamation has to visit all of them, and if they've been swapped out
> to disk it can take a very long time to swap them all back into memory
> again.

In that case, it sounds like using os._exit() to get out of the program
without visiting all that memory *is* the right answer (or as right an
answer as is available at least).
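
Concretely, the escape hatch looks something like this (a sketch; the
explicit flushes matter because os._exit() skips all of the interpreter's
normal shutdown work):

    import os
    import sys

    # ... the program has produced all of its output ...
    sys.stdout.flush()   # os._exit() will not flush buffered streams,
    sys.stderr.flush()   # so flush them by hand before bailing out
    os._exit(0)          # exit immediately, skipping the mass DECREF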

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia
---


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Andrew Bennetts
[email protected] wrote:
> 
> Steve> Unfortunately there are doubtless programs out there that do rely
> Steve> on actions being taken at shutdown.
> 
> Indeed.  I believe any code which calls atexit.register.
> 
> Steve> Maybe os.exit() could be more widely advertised, though ...
> 
> That would be os._exit().  Calling it avoids calls to exit functions
> registered with atexit.register().  I believe it is both safe, and
> reasonable programming practice for modules to register exit functions.
> Both the logging and multiprocessing modules call it.  It's incumbent on the
> application programmer to know these details of the modules the app uses
> (perhaps indirectly) to know whether or not it's safe/wise to call
> os._exit().

You could call sys.exitfunc() just before os._exit().
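
Roughly like this (a sketch for CPython 2.x, where importing atexit installs
sys.exitfunc; the hasattr guard covers the case where nothing ever registered
a handler):

    import os
    import sys

    if hasattr(sys, "exitfunc"):
        sys.exitfunc()   # run any atexit-registered handlers manually
    os._exit(0)          # then skip the expensive interpreter teardown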

-Andrew.



Re: [Python-Dev] Call PyType_Ready on builtin types during interpreter startup?

2008-12-20 Thread Nick Coghlan
Aahz wrote:
> On Sat, Dec 20, 2008, Nick Coghlan wrote:
>> It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of
>> the builtin types - they're left to have it called implicitly when an
>> operation using them needs tp_dict filled in.
> 
> This seems like a release blocker for 3.0.1 to me

The problem isn't actually as bad as I first thought (it turns out most
of the builtin types *are* fully initialised in _Py_ReadyTypes, which is
called from Py_InitializeEx). However, xrange/range are definitely
missing from that function (which is the actual proximate cause of the
strange range() hashing behaviour in Py3k). I'm still hoping someone
knows why the numeric types aren't being readied there, given that
certain parts of the core need additional handling to cope with the
possibility that those types aren't fully initialised (e.g.
PyObject_Format has a lazy call to PyType_Ready, with a comment noting
that it may be asked to format floating point numbers before
PyType_Ready has otherwise been called for the float type).

That said, I have still added the range() hashing problem to the list of
release blockers.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia
---


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Mike Coleman
Tim, I left out some details that I believe probably rule out the
"swapped out" theory.  The machine in question has 64GB RAM, but only
16GB swap.  I'd prefer more swap, but in any case only around 400MB
of the swap was actually in use during my program's entire run.
Furthermore, during my program's exit, it was using 100% CPU, and I'm
95% sure there was no significant "system" or "wait" CPU time for the
system.  (All observations via 'top'.)  So, I think that the problem
is entirely a computational one within this process.

The system does have 8 CPUs.  I'm not sure about its memory
architecture, but if it's some kind of NUMA box, I guess access to
memory could be slower than what we'd normally expect.  I'm skeptical
about that being a significant factor here, though.

Just to clarify, I didn't gc.disable() to address this problem, but
rather because it destroys performance during the creation of the huge
dict.  I don't have a specific number, but I think disabling gc
reduced construction from something like 70 minutes to 5 (or maybe
10).  Quite dramatic.
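
The construction pattern being described is roughly this (a sketch;
read_pairs() here is a made-up stand-in for the real input):

    import gc
    from collections import defaultdict

    def read_pairs():
        # stand-in for the real input; yields (key, (string, int)) tuples
        for i in xrange(1000000):
            yield "K%06d" % (i % 1000), ("x" * 30, i)

    gc.disable()                 # cyclic GC off during bulk construction
    try:
        d = defaultdict(list)
        for key, pair in read_pairs():
            d[key].append(pair)
    finally:
        gc.enable()              # turn it back on once the dict is built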

Mike


From Tim Peters:
BTW, the original poster should try this:  use whatever tools the OS
supplies to look at CPU and disk usage during the long exit.  What I
/expect/ is that almost no CPU time is being used, while the disk is
grinding itself to dust.  That's what happens when a large number of
objects have been swapped out to disk, and exit processing has to page
them all back into memory again (in order to decrement their
refcounts).


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Mike Coleman
Re "held" and "intern_it":  Haha!  That's evil and extremely evil,
respectively.  :-)

I will add these to the Python wiki if they're not already there...

Mike
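
For readers without the earlier messages: the tricks referred to appear to
boil down to memoizing values so that equal objects are stored only once.
A reconstruction of the idea, not the original code:

    def intern_it(obj, memo={}):
        # Return a canonical shared copy of obj; works for any hashable
        # value, unlike the builtin intern(), which accepts only strings.
        return memo.setdefault(obj, obj)

    # e.g. share the (string, int) pairs that recur across dict values
    pair = intern_it(("ACDEFGH", 42))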


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Leif Walsh
On Sat, Dec 20, 2008 at 4:11 PM, Tim Peters  wrote:
> [Lots of answers]

Thanks.  Wish I could have offered something useful.

-- 
Cheers,
Leif


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Antoine Pitrou
Mike Coleman writes:
> 
> Just to clarify, I didn't gc.disable() to address this problem, but
> rather because it destroys performance during the creation of the huge
> dict.  I don't have a specific number, but I think disabling gc
> reduced construction from something like 70 minutes to 5 (or maybe
> 10).  Quite dramatic.

There's a pending patch which should fix that problem:
http://bugs.python.org/issue4074

Regards

Antoine.




Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Mike Coleman
On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti wrote:
> Could you give us more information about the dictionary. For example,
> how many objects does it contain? Is 45GB the actual size of the
> dictionary or of the Python process?

The 45G was the VM size of the process (resident size was similar).

The dict keys were all uppercase alpha strings of length 7.  I don't
have access at the moment, but maybe something like 10-100M of them
(not sure how redundant the set is).  The values are all lists of
pairs, where each pair is a (string, int).  The pair strings are of
length around 30, and drawn from a "small" fixed set of around 60K
strings.  As mentioned previously, I think the ints are drawn
pretty uniformly from something like range(1).  The length of the
lists depends on the redundancy of the key set, but I think there are
around 100-200M pairs total, for the entire dict.
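
One way to sanity-check figures like these empirically (requires Python 2.6+
for sys.getsizeof; the sample values below are stand-ins for the real data):

    import sys

    key = "ABCDEFG"                 # a 7-char uppercase key
    pair = ("x" * 30, 1234)         # one (string, int) value pair
    print sys.getsizeof(key), sys.getsizeof(pair), sys.getsizeof(pair[0])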

(If you're curious about the application domain, see 'http://greylag.org'.)

> Have you seen any significant difference in the exit time when the
> cyclic GC is disabled or enabled?

Unfortunately, with GC enabled, the application is too slow to be
useful, because of the greatly increased time for dict creation.  I
suppose it's theoretically possible that with this increased time, the
long time for exit will look less bad by comparison, but I'd be
surprised if it makes any difference at all.  I'm confident that there
are no loops in this dict, and nothing for cyclic gc to collect.

Mike


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Antoine Pitrou
Mike Coleman writes:
> 
> The 45G was the VM size of the process (resident size was similar).

Can you reproduce it with a smaller working set (something between 1 and
2GB, possibly randomly generated), and post both the generation script and
the problematic script on the bug tracker?
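
A possible shape for such a generation script (all counts and sizes below are
guesses aimed at the 1-2GB range, not the OP's actual data):

    import random
    import string
    from collections import defaultdict

    random.seed(42)
    POOL = ["".join(random.choice(string.ascii_lowercase) for _ in range(30))
            for _ in range(60000)]           # fixed pool of ~60K pair-strings

    d = defaultdict(list)
    for _ in xrange(10000000):               # tune the count toward 1-2GB
        key = "".join(random.choice(string.ascii_uppercase) for _ in range(7))
        d[key].append((random.choice(POOL), random.randint(0, 9999)))
    print "done; now watch how long interpreter exit takes"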





Re: [Python-Dev] 2.6.1 documentation not available for download

2008-12-20 Thread Benjamin Peterson
On Sat, Dec 20, 2008 at 5:02 PM, Arfrever Frehtes Taifersar Arahesis wrote:
> 2008-12-20 23:46:15 Benjamin Peterson wrote:
>> On Sat, Dec 20, 2008 at 4:28 PM, Arfrever Frehtes Taifersar Arahesis wrote:
>> > Python 2.6.1 documentation currently isn't available for download at:
>> > http://docs.python.org/ftp/python/doc/
>>
>> It is available here, though:
>>
>> http://www.python.org/ftp/python/doc/current/
>
> I need documentation created from the 'r261' tag, not from the HEAD of
> the 'release26-maint' branch.

I've made documentation for 2.6.1 now. It's at
http://www.python.org/ftp/python/doc/2.6.1



-- 
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."


Re: [Python-Dev] Python 3.0.1

2008-12-20 Thread Jeremy Hylton
Issue 4631 should be a release blocker.  I'll have a bit of time on Monday
and Tuesday to wrap it up.

Jeremy

On Fri, Dec 19, 2008 at 5:28 PM, Barry Warsaw  wrote:
>
> I'd like to get Python 3.0.1 out before the end of the year.  There are no
> showstoppers, but I haven't yet looked at the deferred blockers or the
> buildbots.
>
> Do you think we can get 3.0.1 out on December 24th?  Or should we wait until
> after Christmas and get it out, say on the 29th?  Do we need an rc?
>
> This question goes mostly to Martin and Georg.  What would work for you
> guys?
>
> -Barry
>


Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

2008-12-20 Thread Andrew MacIntyre

Mike Coleman wrote:
> Andrew, this is on an (intel) x86_64 box with 64GB of RAM.  I don't
> recall the maker or details of the architecture off the top of my
> head, but it would be something "off the rack" from Dell or maybe HP.
> There were other users on the box at the time, but nothing heavy, and
> nothing that gave me any reason to think it was affecting my program.
>
> It's running CentOS 5 I think, so that might make glibc several years
> old.  Your malloc idea sounds plausible to me.  If it is a libc
> problem, it would be nice if there were some way we could tell malloc
> to "live for today because there is no tomorrow" in the terminal phase
> of the program.
>
> I'm not sure exactly how to attack this.  Callgrind is cool, but
> there's no way it will work on something this size.  Timed ltrace
> output might be interesting.  Or maybe a gprof'ed Python, though
> that's more work.


Some malloc()s (notably FreeBSD's) can be externally tuned at runtime
via options in environment variables or other mechanisms - the malloc
man page on your system might be helpful if your platform has something
like this.
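
For example, glibc's allocator honours a few environment variables (see
mallopt(3)); the settings below are illustrative guesses rather than tested
recommendations, and huge_dict_job.py is a hypothetical stand-in:

    import os
    import subprocess

    env = dict(os.environ)
    env["MALLOC_TRIM_THRESHOLD_"] = "-1"   # never trim the heap back to the OS
    env["MALLOC_MMAP_MAX_"] = "0"          # keep large allocations off mmap
    subprocess.call(["python", "huge_dict_job.py"], env=env)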

It is likely that PyMalloc would be better with a way to disable the
free()ing of empty arenas, or move to an arrangement where (like the
various type free-lists in 2.6+) explicit action can force pruning of
empty arenas - there are other usage patterns than yours which would
benefit (performance wise) from not freeing arenas automatically.

--
-
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: [email protected]  (pref) | Snail: PO Box 370
   [email protected] (alt) |Belconnen ACT 2616
Web:http://www.andymac.org/   |Australia