Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Bob Ippolito
On Jan 3, 2005, at 2:16 AM, Tim Peters wrote:
[Bob Ippolito]
...
Your expectation is not correct for Darwin's memory allocation scheme.
It seems that Darwin creates allocations of immutable size.  The only
way ANY part of an allocation will ever be used by ANYTHING else is if
free() is called with that allocation.
Ya, I understood that.  My conclusion was that Darwin's realloc()
implementation isn't production-quality.  So it goes.
Whatever that means.
free() can be called either explicitly, or implicitly by calling
realloc() with a size larger than the size of the allocation.  In that
case, it will create a new allocation of at least the requested size,
copy the contents of the original allocation into the new allocation
(probably with copy-on-write pages if it's large enough, so it might be
cheap), and free() the allocation.
Really?  Another near-universal "quality of implementation"
expectation is that a growing realloc() will strive to extend
in-place.  Like realloc(malloc(100), 101).  For example, the
theoretical guarantee that one-at-a-time list.append() has amortized
linear time doesn't depend on that, but pragmatically it's greatly
helped by a reasonable growing realloc() implementation.
I said that it created allocations of fixed size, not that it created 
allocations of exactly the size you asked it to.  Yes, it will extend 
in-place for many cases, including the given.

In the case where realloc() specifies a size that is not greater than
the allocation's size, it will simply return the given allocation and
cause no side-effects whatsoever.
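A toy model may help pin down the behaviour described above: each block's reserved size is fixed when it is allocated, a realloc() that still fits just returns the same block, and only a larger request moves to a new block. This is a sketch under assumptions, not Darwin's allocator: the 16-byte `QUANTUM`, the `Block` and `NeverShrinkAllocator` names, and the omission of the actual data copy are all hypothetical simplifications.

```python
from dataclasses import dataclass

QUANTUM = 16  # hypothetical size-class granularity, not Darwin's real value


def round_up(n: int, quantum: int = QUANTUM) -> int:
    """Round a request up to the next multiple of the size-class quantum."""
    return max(quantum, -(-n // quantum) * quantum)


@dataclass
class Block:
    requested: int  # what the caller last asked for
    reserved: int   # what the allocator set aside; fixed for the block's lifetime


class NeverShrinkAllocator:
    """Toy model: a block's reserved size never changes until free()."""

    def __init__(self) -> None:
        self.live_reserved = 0  # bytes currently reserved by live blocks

    def malloc(self, n: int) -> Block:
        block = Block(requested=n, reserved=round_up(n))
        self.live_reserved += block.reserved
        return block

    def free(self, block: Block) -> None:
        self.live_reserved -= block.reserved

    def realloc(self, block: Block, n: int) -> Block:
        if n <= block.reserved:
            # Fits in what is already reserved: same block, nothing given back.
            block.requested = n
            return block
        # Growing past the reservation: new block, (copy omitted), free the old.
        new_block = self.malloc(n)
        self.free(block)
        return new_block
```

In this model, `realloc(malloc(1024**2 + 1), 1)` returns the same block with all 1,048,592 reserved bytes still held until `free()`, while `realloc(malloc(100), 101)` extends in place because 101 still fits in the 112-byte reservation.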

Was this a good decision?  Probably not!
Sounds more like a bug (or two) to me than "a decision", but I don't know.
You said yourself that it is standards compliant ;)  I have filed it as 
a bug, but it is probably unlikely to be backported to current versions 
of Mac OS X unless a case can be made that it is indeed a security 
flaw.

However, it is our (in the "I know you use Windows but I am not the only
one that uses Mac OS X" sense) problem so long as Darwin is a supported
platform, because it is highly unlikely that Apple will backport any "fix"
to the allocator unless we can prove it has some security implications in
software shipped with their OS. ...
Is there any known case where Python performs poorly on this OS, for
this reason, other than the "pass giant numbers to recv() and then
shrink the string because we didn't get anywhere near that many bytes"
case?  Claiming rampant performance problems should require evidence
too.
Known case?  No.  Do I want to search Python application-space to find 
one?  No.

Presumably this can happen at other places (including third party
extensions), so a better place to do this might be _PyString_Resize().
list_resize() is another reasonable place to put this.  I'm sure there
are other places that use realloc() too, and the majority of them do
this through obmalloc.  So maybe instead of trying to track down all
the places where this can manifest, we should just "gunk up" Python and
patch PyObject_Realloc()?
There is no "choke point" for allocations in Python -- some places
call the system realloc() directly.  Maybe the latter matter on Darwin
too, but maybe they don't.  The scope of this hack spreads if they do.
 I have no idea how often realloc() is called directly by 3rd-party
extension modules.  It's called directly a lot in Zope's C code, but
AFAICT only to grow vectors, never to shrink them.
In the case of Python, "some places" means "nowhere relevant".  Four 
standard library extension modules relevant to the platform use realloc 
directly:

_sre
    Uses realloc only to grow buffers.
cPickle
    Uses realloc only to grow buffers.
cStringIO
    Uses realloc only to grow buffers.
regexpr
    Uses realloc only to grow buffers.
If Zope doesn't use the allocator that Python gives it, then it can 
deal with its own problems.  I would expect most extensions to use 
Python's allocator.

Since we are both pretty confident that other allocators aren't like
Darwin, this "gunk" can be #ifdef'ed to the __APPLE__ case.
#ifdef's are a last resort:  they almost never go away, so they
complicate the code forever after, and typically stick around for
years even after the platform problems they intended to address have
been fixed.  For obvious reasons, they're also an endless source of
platform-specific bugs.
They're also the only good way to deal with platform-specific 
inconsistencies.  In this specific case, it's not even possible to 
determine if a particular allocator implementation is stupid or not 
without at least using a platform-allocator-specific function to query 
the size reserved by a given allocation.
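The point that you can only tell whether an allocator gives memory back by asking it how much it actually reserved can be checked from Python with `ctypes`. This is a hedged sketch: it assumes `malloc_size()` on Darwin and `malloc_usable_size()` (glibc, musl) elsewhere, and it will fail on platforms that provide neither, such as Windows. The helper names `_libc` and `shrink_probe` are hypothetical.

```python
import ctypes
import ctypes.util
import sys


def _libc():
    """Load the C library and declare the prototypes we call."""
    libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
    libc.malloc.restype = ctypes.c_void_p
    libc.malloc.argtypes = [ctypes.c_size_t]
    libc.realloc.restype = ctypes.c_void_p
    libc.realloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.free.restype = None
    libc.free.argtypes = [ctypes.c_void_p]
    # Darwin spells the "how much did you really reserve?" query malloc_size();
    # glibc and musl spell it malloc_usable_size().
    name = "malloc_size" if sys.platform == "darwin" else "malloc_usable_size"
    size_of = getattr(libc, name)
    size_of.restype = ctypes.c_size_t
    size_of.argtypes = [ctypes.c_void_p]
    return libc, size_of


def shrink_probe(big: int = 1024**2 + 1, small: int = 1):
    """Return (reserved before, reserved after) a shrinking realloc()."""
    libc, size_of = _libc()
    p = libc.malloc(big)
    if not p:
        raise MemoryError
    before = size_of(p)
    q = libc.realloc(p, small)
    if not q:
        libc.free(p)
        raise MemoryError
    after = size_of(q)
    libc.free(q)
    return before, after
```

On an allocator that behaves as described in this thread, `after` would stay equal to `before`; an allocator that returns the space reports a much smaller `after`.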

Note that pymalloc already does a memcpy+free when, in
PyObject_Realloc(p, n), p was obtained from the system malloc or
realloc but n is small enough to meet the "small object" threshold
(pymalloc "takes over" small blocks that result from a
PyObject_Realloc()).  That's a reasonable strategy *b

Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Guido van Rossum
Coming late to this thread.

I don't see the point of lying awake at night worrying about potential
memory losses unless you've heard someone complain about it. As Tim
has been trying to explain, there are plenty of other things in Python
that we *could* speed up if there was a need; since every speedup
uglifies the code somewhat, we'd end up with very ugly code if we did
them all. Remember, don't optimize prematurely.

Here's one theoretical reason why even with socket.recv() it probably
doesn't matter in practice: the overallocated string will usually be
freed as soon as the data has been parsed from it, and this will free
the overallocation as well!

OTOH, if you want to do more research, checking the usage patterns for
StringRealloc and TupleRealloc would be useful. I could imagine code
in either that makes a copy if the new size is less than some fraction
of the old size. Most code that I recall writing using these tends to
start with a guaranteed-to-fit overallocation, and a single resize at
the end.
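The pattern Guido describes (allocate a size that is guaranteed to fit, fill it, then resize once at the end) looks like this in Python terms. A minimal sketch: `backslash_escape` is a hypothetical helper, and a `bytearray` stands in for the C-level string buffer.

```python
def backslash_escape(data: bytes, special: bytes = b'"\\') -> bytes:
    """Prefix each byte found in `special` with a backslash."""
    # Worst case every byte is special, so 2 * len(data) is guaranteed to fit.
    buf = bytearray(2 * len(data))
    pos = 0
    for byte in data:
        if byte in special:
            buf[pos] = 0x5C  # ord("\\")
            pos += 1
        buf[pos] = byte
        pos += 1
    # The single resize at the end: drop the unused tail of the overallocation.
    del buf[pos:]
    return bytes(buf)
```

Because the overallocation here is bounded at 2x and trimmed exactly once, it is far less exposed to a shrinking realloc() that keeps the full reservation than asking for megabytes and keeping a few bytes.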

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Bob Ippolito
On Jan 3, 2005, at 11:15 AM, Guido van Rossum wrote:
Coming late to this thread.
I don't see the point of lying awake at night worrying about potential
memory losses unless you've heard someone complain about it. As Tim
has been trying to explain, there are plenty of other things in Python
that we *could* speed up if there was a need; since every speedup
uglifies the code somewhat, we'd end up with very ugly code if we did
them all. Remember, don't optimize prematurely.
We *have* had someone complain about it: http://python.org/sf/1092502
Here's one theoretical reason why even with socket.recv() it probably
doesn't matter in practice: the overallocated string will usually be
freed as soon as the data has been parsed from it, and this will free
the overallocation as well!
That depends on how socket.recv is used.  Sometimes, a list of strings 
is used rather than a cStringIO (or equivalent), which can cause 
problems (see above referenced bug).

-bob


[Python-Dev] Re: Zipfile needs?

2005-01-03 Thread Scott David Daniels
Brett C. wrote:
Scott David Daniels wrote:
I'm hoping to add BZIP2 compression to zipfile for 2.5.  My primary
motivation is that Project Gutenberg seems to be starting to use BZIP2
compression for some of its zips.  What other wish list things do
people around here have for zipfile?  I thought I'd collect input here
and make a PEP.
Encryption/decryption support.  Will most likely require a C extension 
since the algorithm relies on ints (or longs, don't remember) wrapping 
around when the value becomes too large.
I'm trying to use byte-block streams (iterators taking iterables) as
the basic structure of getting data in and out.  I think the encryption/
decryption can then be plugged in at the right point.  If it can be set
up properly, you can import the encryption separately and connect it to
zipfiles with a call.  Would this address what you want?  I believe
there is an issue actually building in the encryption/decryption in
terms of redistribution.
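One way to read the "byte-block streams (iterators taking iterables)" design: each stage is a generator that consumes an iterable of byte blocks and yields byte blocks, so BZIP2 compression (here via the standard `bz2` module) or a separately imported encryption stage can be slotted in by composition. The stage and helper names below are hypothetical and are not part of `zipfile`.

```python
import bz2
from typing import Callable, Iterable, Iterator

Stage = Callable[[Iterable[bytes]], Iterator[bytes]]


def bz2_compress(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Stage: compress a stream of byte blocks with BZIP2."""
    compressor = bz2.BZ2Compressor()
    for block in blocks:
        out = compressor.compress(block)
        if out:  # the compressor buffers internally and often returns b""
            yield out
    yield compressor.flush()


def bz2_decompress(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Stage: undo bz2_compress."""
    decompressor = bz2.BZ2Decompressor()
    for block in blocks:
        out = decompressor.decompress(block)
        if out:
            yield out


def pipeline(blocks: Iterable[bytes], *stages: Stage) -> Iterator[bytes]:
    """Chain stages left to right; an encryption stage would plug in the same way."""
    stream: Iterable[bytes] = blocks
    for stage in stages:
        stream = stage(stream)
    return iter(stream)
```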
--
-- Scott David Daniels
[EMAIL PROTECTED]


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread bacchusrx
On Thu, Jan 01, 1970 at 12:00:00AM +, Tim Peters wrote:
> Is there any known case where Python performs poorly on this OS, for
> this reason, other than the "pass giant numbers to recv() and then
> shrink the string because we didn't get anywhere near that many bytes"
> case?
> 
> [...]
> 
> I agree the socket-abuse case should be fiddled, and for more reasons
> than just Darwin's realloc() quirks. [...] Yes, in the socket-abuse
> case, where the program routinely malloc()s strings millions of bytes
> larger than the socket can deliver, it would obviously help.  That's
> not typically program behavior (however typical it may be of that
> specific app).

Note that, with respect to http://python.org/sf/1092502, the author of
the (original) program was using the documented interface to a file
object.  It's _fileobject.read() that decides to ask for huge numbers of
bytes from recv() (specifically, in the max(self._rbufsize, left)
condition). Patched to use a fixed recv_size, you of course sidestep the
realloc() nastiness in this particular case.
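The difference between the two policies shows up as soon as you count what each one asks recv() for. This is a hedged sketch, not the real `socket._fileobject` code: the 8192-byte `RBUFSIZE` and a peer delivering a fixed number of bytes per call are assumptions chosen for illustration. Under a realloc() that never shrinks, each request size is roughly what every received string keeps reserved.

```python
from typing import Callable, List

RBUFSIZE = 8192  # assumed buffer size, for illustration only


def unpatched(left: int) -> int:
    """Ask for everything still outstanding (the max(rbufsize, left) rule)."""
    return max(RBUFSIZE, left)


def patched(left: int) -> int:
    """Never ask for more than one fixed-size buffer."""
    return min(RBUFSIZE, left)


def requested_sizes(total: int, delivered: int,
                    policy: Callable[[int], int]) -> List[int]:
    """Sizes passed to recv() while reading `total` bytes from a peer that
    delivers at most `delivered` bytes per call."""
    sizes = []
    left = total
    while left > 0:
        request = policy(left)
        sizes.append(request)
        left -= min(request, delivered, left)
    return sizes
```

For a 1,000,000-byte read arriving 1000 bytes at a time, both policies make 1000 calls, but the unpatched rule requests about 500 MB in total and the patched one about 8 MB.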

bacchusrx.


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Bob Ippolito
On Jan 3, 2005, at 3:23 PM, bacchusrx wrote:
On Thu, Jan 01, 1970 at 12:00:00AM +, Tim Peters wrote:
Is there any known case where Python performs poorly on this OS, for
this reason, other than the "pass giant numbers to recv() and then
shrink the string because we didn't get anywhere near that many bytes"
case?
[...]
I agree the socket-abuse case should be fiddled, and for more reasons
than just Darwin's realloc() quirks. [...] Yes, in the socket-abuse
case, where the program routinely malloc()s strings millions of bytes
larger than the socket can deliver, it would obviously help.  That's
not typically program behavior (however typical it may be of that
specific app).
Note that, with respect to http://python.org/sf/1092502, the author of
the (original) program was using the documented interface to a file
object.  It's _fileobject.read() that decides to ask for huge numbers of
bytes from recv() (specifically, in the max(self._rbufsize, left)
condition). Patched to use a fixed recv_size, you of course sidestep the
realloc() nastiness in this particular case.
While using a reasonably sized recv_size is a good idea, using a 
smaller request size simply means that it's less likely that the 
strings will be significantly resized.  It is still highly likely they 
*will* be resized and that doesn't solve the problem that 
over-allocated strings will persist until the entire request is 
fulfilled.

For example, receiving 1-byte chunks (if that's even possible) would
exacerbate the issue even for a small request size.  If you asked for
8 MB with a request size of 1024 bytes, and received it in 1-byte chunks,
you would need a minimum of an impossible ~16 GB to satisfy that
request (minimum ~8 GB to collect the strings, minimum ~8 GB to
concatenate them) as opposed to the Python-optimal case of ~16 MB when
always using compact representations.
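The collection figure can be checked directly. This quick calculation assumes, as the example does, that every 1-byte string keeps its full 1024-byte reservation, and it ignores per-object headers; the ~8 GB concatenation figure is Bob's estimate and is not recomputed here.

```python
MiB = 2**20
GiB = 2**30

wanted = 8 * MiB        # total bytes requested from the socket
request_size = 1024     # bytes each recv() string is allocated with
chunk = 1               # bytes each recv() actually delivers

strings = wanted // chunk            # 8,388,608 separate strings
reserved = strings * request_size    # never shrunk, so all of it stays reserved
compact = wanted + wanted            # compact strings plus the joined result

print(reserved // GiB, "GiB vs", compact // MiB, "MiB")  # 8 GiB vs 16 MiB
```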

Using cStringIO instead of a list of potentially over-allocated strings 
would actually have such Python-optimal memory usage characteristics on 
all platforms.
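A sketch of that accumulation pattern, using `io.BytesIO` (the modern counterpart of `cStringIO`) so each received chunk is copied into one growing buffer instead of being kept around as a possibly over-allocated string. The `read_exactly` name and the `recv` callable parameter are assumptions; a real socket's `recv` method has the same shape.

```python
import io
from typing import Callable


def read_exactly(recv: Callable[[int], bytes], size: int,
                 recv_size: int = 8192) -> bytes:
    """Read up to `size` bytes, stopping early if `recv` signals EOF with b""."""
    buf = io.BytesIO()
    left = size
    while left > 0:
        data = recv(min(left, recv_size))  # bounded request, never all of `left`
        if not data:
            break  # peer closed the connection
        buf.write(data)  # copied into one buffer; `data` can be freed now
        left -= len(data)
    return buf.getvalue()
```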

-bob


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Andrew P. Lentvorski, Jr.
On Jan 2, 2005, at 11:16 PM, Tim Peters wrote:
[Bob Ippolito]
However, it is our (in the "I know you use Windows but I am not the only
one that uses Mac OS X" sense) problem so long as Darwin is a supported
platform, because it is highly unlikely that Apple will backport any "fix"
to the allocator unless we can prove it has some security implications in
software shipped with their OS. ...
Is there any known case where Python performs poorly on this OS, for
this reason, other than the "pass giant numbers to recv() and then
shrink the string because we didn't get anywhere near that many bytes"
case?  Claiming rampant performance problems should require evidence
too.
Possibly.  When using the stock btdownloadcurses.py from bitconjurer.org,
I occasionally see a memory thrash on OS X.

Normally I have to be in a mode where I am aggregating lots of small
connections (10Kbps or less uploads) into a large download (10Mbps
transfer rate on a >500MB file).  When the file completes, Python sends
OS X into a long-lasting spinning ball of death.  It will emerge after
about 10 minutes or so.
I do not see this same behavior on Linux or FreeBSD.  I never filed a bug
because I can't reliably reproduce it (it is dependent upon the upload
characteristics of the torrent swarm).  However, it seems to fit the
bug and diagnosis.

-a


Re: [Python-Dev] Re: Zipfile needs?

2005-01-03 Thread Brett C.
Scott David Daniels wrote:
Brett C. wrote:
Scott David Daniels wrote:
I'm hoping to add BZIP2 compression to zipfile for 2.5.  My primary
motivation is that Project Gutenberg seems to be starting to use BZIP2
compression for some of its zips.  What other wish list things do
people around here have for zipfile?  I thought I'd collect input here
and make a PEP.
Encryption/decryption support.  Will most likely require a C extension 
since the algorithm relies on ints (or longs, don't remember) wrapping 
around when the value becomes too large.

I'm trying to use byte-block streams (iterators taking iterables) as
the basic structure of getting data in and out.  I think the encryption/
decryption can then be plugged in at the right point.  If it can be set
up properly, you can import the encryption separately and connect it to
zipfiles with a call.  Would this address what you want?  I believe
there is an issue actually building in the encryption/decryption in
terms of redistribution.
Possibly.  Encryption is part of the PKZIP spec, so I was just thinking of
covering that, not adding external encryption support.  It really is not
overly complex stuff; I will probably just want to do it in C for speed, as
Guido suggested (but, as always, I would profile that first to see if
performance is really that bad).
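The wrapping Brett mentions is visible in the key schedule of the PKZIP "traditional" encryption described in PKWARE's APPNOTE: three 32-bit keys updated with a raw CRC-32 step and a multiply that must wrap modulo 2**32. In pure Python the wrap has to be spelled out as `& 0xFFFFFFFF`. This is an illustrative sketch; `crc32_step`, `ZipCryptoKeys`, `encrypt` and `decrypt` are hypothetical names, and the raw CRC step is derived from `zlib.crc32` by undoing its pre- and post-inversion.

```python
import zlib


def crc32_step(crc: int, byte: int) -> int:
    """One raw CRC-32 table step, as the key schedule uses it.

    zlib.crc32 inverts the register before and after, so undo both inversions.
    """
    return zlib.crc32(bytes([byte]), crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF


class ZipCryptoKeys:
    """Key schedule of PKZIP traditional encryption."""

    def __init__(self, password: bytes) -> None:
        self.k0, self.k1, self.k2 = 0x12345678, 0x23456789, 0x34567890
        for byte in password:
            self.update(byte)

    def update(self, byte: int) -> None:
        self.k0 = crc32_step(self.k0, byte)
        # C gets these wraparounds for free from 32-bit unsigned arithmetic.
        self.k1 = (self.k1 + (self.k0 & 0xFF)) & 0xFFFFFFFF
        self.k1 = (self.k1 * 134775813 + 1) & 0xFFFFFFFF
        self.k2 = crc32_step(self.k2, self.k1 >> 24)

    def keystream_byte(self) -> int:
        temp = (self.k2 | 2) & 0xFFFF  # a 16-bit unsigned value in C
        return ((temp * (temp ^ 1)) >> 8) & 0xFF


def encrypt(password: bytes, plain: bytes) -> bytes:
    keys = ZipCryptoKeys(password)
    out = bytearray()
    for p in plain:
        out.append(p ^ keys.keystream_byte())
        keys.update(p)  # keys advance on the plaintext byte
    return bytes(out)


def decrypt(password: bytes, cipher: bytes) -> bytes:
    keys = ZipCryptoKeys(password)
    out = bytearray()
    for c in cipher:
        p = c ^ keys.keystream_byte()
        out.append(p)
        keys.update(p)
    return bytes(out)
```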

-Brett


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Tim Peters
[Tim Peters]
>> Ya, I understood that.  My conclusion was that Darwin's realloc()
>> implementation isn't production-quality.  So it goes.

[Bob Ippolito]
> Whatever that means.

Well, it means what it said.  The C standard says nothing about
performance metrics of any kind, and a production-quality
implementation of C requires very much more than just meeting what the
standard requires.  The phrase "quality of implementation" is used in
the C Rationale (but not in the standard proper) to cover all such
issues.  realloc() pragmatics are quality-of-implementation issues;
the accuracy of fp arithmetic is another (e.g., if you get back -666.0
from the C 1.0 + 2.0, there's nothing in the standard to justify a
complaint).

>>>  free() can be called either explicitly, or implicitly by calling
>>> realloc() with a size larger than the size of the allocation.

From later comments feigning outrage, I take it that "the size
of the allocation" here does not mean the specific number the user
passed to the previous malloc/realloc call, but means whatever amount
of address space the implementation decided to use internally.  Sorry,
but I assumed it meant the former at first.

...

>>> Was this a good decision?  Probably not!

>> Sounds more like a bug (or two) to me than "a decision", but I don't
>> know.

> You said yourself that it is standards compliant ;)  I have filed it as
> a bug, but it is probably unlikely to be backported to current versions
> of Mac OS X unless a case can be made that it is indeed a security
> flaw.

That's plausible.  If you showed me a case where Python's list.sort()
took cubic time, I'd certainly consider that to be "a bug", despite
that nothing promises better behavior.  If I wrote a malloc subsystem
and somebody pointed out "did you know that when I malloc 1024**2+1
bytes, and then realloc(1), I lose the other megabyte forever?", I'd
consider that to be "a bug" too (because, docs be damned, I wouldn't
intentionally design a malloc subsystem with such behavior; and
pymalloc does in fact copy bytes on a shrinking realloc in blocks it
controls, whenever at least a quarter of the space is given back --
and it didn't at the start, and I considered that to be "a bug" when
it was pointed out).
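The rule Tim describes for blocks pymalloc controls (copy on a shrinking realloc only when at least a quarter of the space would be given back) reduces to a one-line predicate. A hedged sketch of the stated policy, not pymalloc's source; `keep_in_place` is a hypothetical name.

```python
def keep_in_place(old_size: int, new_size: int) -> bool:
    """Decide whether a shrinking realloc can just return the same block.

    Copying pays off only when at least a quarter of the block comes back,
    i.e. when old_size - new_size >= old_size / 4, which is 4*new <= 3*old.
    """
    if new_size >= old_size:
        raise ValueError("not a shrinking realloc")
    return 4 * new_size > 3 * old_size
```

With integer arithmetic on both sides there is no rounding question at the boundary: shrinking a 256-byte block to 192 bytes gives back exactly a quarter, so it is copied.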

> ...
> Known case?  No.  Do I want to search Python application-space to find
> one?  No.

Serious problems on a platform are usually well-known to users on that
platform.  For example, it was well-known that Python's list-growing
strategy as of a few years ago fragmented address space horribly on
Win9X.  This was a C quality-of-implementation issue specific to that
platform.  It was eventually resolved by improving the list-growing
strategy on all platforms -- although it's still the case that Win9X
does worse on list-growing than other platforms, it's no longer a
disaster for most list-growing apps on Win9X.
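For comparison, the improved list-growing strategy amounts to proportional over-allocation, so a loop of single appends triggers only a logarithmic number of realloc() calls. The formula below is modeled on the one in CPython's `listobject.c` from around this time (about newsize/8 extra plus a small constant); treat the exact constants as an assumption, and `resize_points` as a hypothetical helper.

```python
def resize_points(n_appends: int) -> list:
    """List sizes at which a one-at-a-time append loop must call realloc()."""
    allocated = 0
    points = []
    for newsize in range(1, n_appends + 1):
        if newsize > allocated:
            # Over-allocate by about 1/8 plus a little, as listobject.c did.
            allocated = newsize + (newsize >> 3) + (3 if newsize < 9 else 6)
            points.append(newsize)
    return points
```

For 1000 appends this gives 27 resizes (at sizes 1, 5, 9, 17, 26, ...), rather than 1000.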

If there's a problem with "overallocate then realloc() to cut back" on
Darwin that affects many apps, then I'd expect Darwin users to know
about that already -- lots of people have used Python on Macs since
Python's beginning, "mysterious slowdowns" and "mysterious bloat" get
noticed, and Darwin has been around for a while.

..

>> There is no "choke point" for allocations in Python -- some places
>> call the system realloc() directly.  Maybe the latter matter on Darwin
>> too, but maybe they don't.  The scope of this hack spreads if they do.

...

> In the case of Python, "some places" means "nowhere relevant".  Four
> standard library extension modules relevant to the platform use realloc
> directly:
> 
> _sre
> Uses realloc only to grow buffers.
> cPickle
> Uses realloc only to grow buffers.
> cStringIO
> Uses realloc only to grow buffers.
> regexpr:
> Uses realloc only to grow buffers.

Good!

> If Zope doesn't use the allocator that Python gives it, then it can
> deal with its own problems.  I would expect most extensions to use
> Python's allocator.

I don't know.

...
 
> They're [#ifdef's] also the only good way to deal with platform-specific
> inconsistencies.  In this specific case, it's not even possible to
> determine if a particular allocator implementation is stupid or not
> without at least using a platform-allocator-specific function to query
> the size reserved by a given allocation.

We've had bad experience on several platforms when passing large
numbers to recv().  If that were addressed, it's unclear that Darwin
realloc() behavior would remain a real issue.  OTOH, it is clear that
*just* worming around Darwin realloc() behavior won't help other
platforms with problems in the same *immediate* area of bug 1092502. 
Gross over-allocation followed by a shrinking realloc() just isn't
common in Python.  sock_recv() is an exceptionally bad case.  More
typical is, e.g., fileobject.c's get_line(), where if "a line" exceeds
100 characters the buffer keeps growing by 25% until there's enough
room, then it's cut back once at the end.  That typical use for
shrinking reall
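The `get_line()` pattern Tim describes (start at 100 bytes, grow by 25% until the line fits, then cut back once) can be traced with a small simulation. The function name is hypothetical and the 25% step uses integer arithmetic; this illustrates the growth pattern, not the C code in fileobject.c.

```python
def buffer_sizes(line_length: int, initial: int = 100) -> list:
    """Successive buffer sizes while reading one line, then the final trim."""
    size = initial
    sizes = [size]
    while size < line_length:
        size += size >> 2  # grow by 25%
        sizes.append(size)
    sizes.append(line_length)  # the single shrinking realloc() at the end
    return sizes
```

Since the last growth step adds at most a quarter of the previous size, the final cut-back always returns less than a quarter of the buffer; even a realloc() that never shrinks wastes little here, in contrast with the sock_recv() case.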

Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread bacchusrx
On Mon, Jan 03, 2005 at 03:55:19PM -0500, Bob Ippolito wrote:
> >Note that, with respect to http://python.org/sf/1092502, the author
> >of the (original) program was using the documented interface to a
> >file object.  It's _fileobject.read() that decides to ask for huge
> >numbers of bytes from recv() (specifically, in the
> >max(self._rbufsize, left) condition). Patched to use a fixed
> >recv_size, you of course sidestep the realloc() nastiness in this
> >particular case.
> 
> While using a reasonably sized recv_size is a good idea, using a
> smaller request size simply means that it's less likely that the
> strings will be significantly resized.  It is still highly likely they
> *will* be resized and that doesn't solve the problem that
> over-allocated strings will persist until the entire request is
> fulfilled.

You're right. I should have said, "you're more likely to get away with
it." The underlying issue still exists. My point is that the problem is
not analogous to the guy who tried to read 2GB directly from a socket
(as in http://python.org/sf/756104). 

Googling for MemoryError exceptions, you can find a number of spurious
problems on Darwin that are probably due to this bug: SpamBayes for
instance, or the thread at

http://mail.python.org/pipermail/python-list/2004-November/250625.html

bacchusrx.


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Bob Ippolito
On Jan 3, 2005, at 4:49 PM, Tim Peters wrote:
[Tim Peters]
Ya, I understood that.  My conclusion was that Darwin's realloc()
implementation isn't production-quality.  So it goes.
[Bob Ippolito]
Whatever that means.
Well, it means what it said.  The C standard says nothing about
performance metrics of any kind, and a production-quality
implementation of C requires very much more than just meeting what the
standard requires.  The phrase "quality of implementation" is used in
the C Rationale (but not in the standard proper) to cover all such
issues.  realloc() pragmatics are quality-of-implementation issues;
the accuracy of fp arithmetic is another (e.g., if you get back -666.0
from the C 1.0 + 2.0, there's nothing in the standard to justify a
complaint).
 free() can be called either explicitly, or implicitly by calling
realloc() with a size larger than the size of the allocation.
From later comments feigning outrage, I take it that "the size
of the allocation" here does not mean the specific number the user
passed to the previous malloc/realloc call, but means whatever amount
of address space the implementation decided to use internally.  Sorry,
but I assumed it meant the former at first.
Sorry for the confusion.
Was this a good decision?  Probably not!

Sounds more like a bug (or two) to me than "a decision", but I don't
know.

You said yourself that it is standards compliant ;)  I have filed it as
a bug, but it is probably unlikely to be backported to current versions
of Mac OS X unless a case can be made that it is indeed a security
flaw.
That's plausible.  If you showed me a case where Python's list.sort()
took cubic time, I'd certainly consider that to be "a bug", despite
that nothing promises better behavior.  If I wrote a malloc subsystem
and somebody pointed out "did you know that when I malloc 1024**2+1
bytes, and then realloc(1), I lose the other megabyte forever?", I'd
consider that to be "a bug" too (because, docs be damned, I wouldn't
intentionally design a malloc subsystem with such behavior; and
pymalloc does in fact copy bytes on a shrinking realloc in blocks it
controls, whenever at least a quarter of the space is given back --
and it didn't at the start, and I considered that to be "a bug" when
it was pointed out).
I wouldn't equate "until free() is called" with "forever".  But yes, I 
consider it a bug just as you do, and have reported it appropriately.  
Practically, since it exists in Mac OS X 10.2 and Mac OS X 10.3, and 
may not ever be fixed, we should at least consider it.

...
Known case?  No.  Do I want to search Python application-space to find
one?  No.
Serious problems on a platform are usually well-known to users on that
platform.  For example, it was well-known that Python's list-growing
strategy as of a few years ago fragmented address space horribly on
Win9X.  This was a C quality-of-implementation issue specific to that
platform.  It was eventually resolved by improving the list-growing
strategy on all platforms -- although it's still the case that Win9X
does worse on list-growing than other platforms, it's no longer a
disaster for most list-growing apps on Win9X.
It does take a long time to figure such weird behavior out, though.  I
would have to guess that most Python users on Darwin have been at it
for less than 3 years.

The number of people using Python on Darwin who have written or used
code that exercises this scenario, and who are determined enough to
track this sort of thing down, is probably very small.

If there's a problem with "overallocate then realloc() to cut back" on
Darwin that affects many apps, then I'd expect Darwin users to know
about that already -- lots of people have used Python on Macs since
Python's beginning, "mysterious slowdowns" and "mysterious bloat" get
noticed, and Darwin has been around for a while.
Most people on Mac OS X have a lot of memory, and Mac OS X generally
does a good job of swapping in and out without causing much of a
problem, so I'm personally not very surprised that it could go
unnoticed this long.

Google says:
Results 1 - 10 of about 1,150 for (darwin OR Mac OR "OS X") AND MemoryError AND Python.
Results 1 - 10 of about 942 for malloc vm_allocate failed. (0.73 seconds)

Of course, in both cases, not all of these can be attributed to 
realloc()'s implementation, but I'm sure some of them can, especially 
the Python ones!

They're [#ifdef's] also the only good way to deal with platform-specific
inconsistencies.  In this specific case, it's not even possible to
determine if a particular allocator implementation is stupid or not
without at least using a platform-allocator-specific function to query
the size reserved by a given allocation.
We've had bad experience on several platforms when passing large
numbers to recv().  If that were addressed, it's unclear that Darwin
realloc() behavior would remain a real issue.  OTOH, it is clear that
*just* worming around Darwin realloc() behavior won't help other
platfor

Re: [Python-Dev] Re: Zipfile needs?

2005-01-03 Thread "Martin v. Löwis"
Scott David Daniels wrote:
I believe
there is an issue actually building in the encryption/decryption in
terms of redistribution.
Submitters should not worry about this too much. The issue primarily
exists in the U.S., and there are now (U.S.) official procedures to
deal with them, and the PSF can and does follow these procedures.
Regards,
Martin


[Python-Dev] Out-of-date FAQs

2005-01-03 Thread Delaney, Timothy C (Timothy)
While grabbing the link to the copyright restrictions FAQ (for someone
on python-list) I noticed a few out-of-date FAQ entries - specifically,
"most stable version" and "Why doesn't list.sort() return the sorted
list?". Bug reports have been submitted (and acted on - Raymond, you
work too fast ;)

I think it's important that the FAQs be up-to-date with the latest
idioms, etc, so as I have the time available I intend to review all the
existing FAQs that I'm qualified for.

As a general rule, when an idiom has changed, do we want to state both
the 2.4 idiom as well as the 2.3 idiom? In the case of list.sort(), that
would mean having both:

    for key in sorted(dict.iterkeys()):
        ...do whatever with dict[key]...

and

    keys = dict.keys()
    keys.sort()
    for key in keys:
        ...do whatever with dict[key]...

Tim Delaney


[Python-Dev] Small fix for windows.tex

2005-01-03 Thread Irmen de Jong
The current cvs docs failed to build for me, because of a small
misspelling in the windows.tex file. Here is a patch:
Index: Doc/ext/windows.tex
===
RCS file: /cvsroot/python/python/dist/src/Doc/ext/windows.tex,v
retrieving revision 1.10
diff -u -r1.10 windows.tex
--- Doc/ext/windows.tex 30 Dec 2004 10:44:32 -  1.10
+++ Doc/ext/windows.tex 3 Jan 2005 23:28:20 -
@@ -163,8 +163,8 @@
 click OK.  (Inserting them one by one is fine too.)
 Now open the \menuselection{Project \sub spam properties} dialog.
-You only need to change a few settings.  Make sure \guilable{All
-Configurations} is selected from the \guilable{Settings for:}
+You only need to change a few settings.  Make sure \guilabel{All
+Configurations} is selected from the \guilabel{Settings for:}
 dropdown list.  Select the C/\Cpp{} tab.  Choose the General
 category in the popup menu at the top.  Type the following text in
 the entry box labeled \guilabel{Additional Include Directories}:

--Irmen


Re: [Python-Dev] Zipfile needs?

2005-01-03 Thread Shane Holloway (IEEE)
Scott David Daniels wrote:
What other wish list things do people around here have for zipfile?  I
thought I'd collect input here and make a PEP.
I was working on a project based around modifying zip files, and found 
that python just doesn't implement that part.  I'd like to see the 
ability to remove a file in the archive, as well as "write over" a file 
already in the archive.
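Until zipfile grows a delete operation, the usual workaround is to copy every member except the unwanted one into a new archive; "write over" is the same copy followed by writing the new member. A sketch using only existing `zipfile` calls; `without_member` is a hypothetical helper, and it rewrites the whole archive rather than editing it in place.

```python
import io
import zipfile


def without_member(archive: bytes, name: str) -> bytes:
    """Return a copy of `archive` with the member `name` left out."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src, \
         zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            if info.filename != name:
                # Reusing the ZipInfo keeps each member's name, date and
                # compression method.
                dst.writestr(info, src.read(info.filename))
    # ZipFile.close() leaves a caller-supplied file object open.
    return out.getvalue()
```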

It's a tall order, but you asked.  ;)
Thanks,
-Shane Holloway


Re: [Python-Dev] Small fix for windows.tex

2005-01-03 Thread "Martin v. Löwis"
Irmen de Jong wrote:
The current cvs docs failed to build for me, because of a small
misspelling in the windows.tex file. Here is a patch:
Thanks, fixed.
Martin


Re: [Python-Dev] Out-of-date FAQs

2005-01-03 Thread Aahz
On Tue, Jan 04, 2005, Delaney, Timothy C (Timothy) wrote:
>
> As a general rule, when an idiom has changed, do we want to state both
> the 2.4 idiom as well as the 2.3 idiom? In the case of list.sort(), that
> would mean having both:
> 
> for key in sorted(dict.iterkeys()):
> ...do whatever with dict[key]...
> 
> and
> 
> keys = dict.keys()
> keys.sort()
> for key in keys:
> ...do whatever with dict[key]...

Yes.  Until last July, the company I work for was still using 1.5.2.
Our current version is 2.2.  I think that the FAQ should be usable for
anyone with a "reasonably current" version of Python, say at least two
major versions.  IOW, answers should continue to work with 2.2 during
the lifetime of 2.4.
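The sketch below runs both idioms side by side and checks that they visit keys in the same order. It assumes a current Python 3, where `dict.iterkeys()` no longer exists, so `sorted(d)` stands in for `sorted(dict.iterkeys())`. The function names are illustrative only.

```python
def items_sorted_24(d):
    # 2.4+ idiom: sorted() accepts any iterable, and iterating a dict yields
    # its keys (the FAQ text spells this sorted(dict.iterkeys()) for 2.4).
    return [(key, d[key]) for key in sorted(d)]


def items_sorted_23(d):
    # Pre-2.4 idiom: materialize the keys as a list and sort it in place.
    # In Python 2, d.keys() already returned a list; list() makes that explicit.
    keys = list(d.keys())
    keys.sort()
    result = []
    for key in keys:
        result.append((key, d[key]))
    return result
```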
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming,
is not worth knowing."  --Alan Perlis


RE: [Python-Dev] Out-of-date FAQs

2005-01-03 Thread Delaney, Timothy C (Timothy)
Aahz wrote:

> Yes.  Until last July, the company I work for was still using 1.5.2.
> Our current version is 2.2.  I think that the FAQ should be usable for
> anyone with a "reasonably current" version of Python, say at least two
> major versions.  IOW, answers should continue to work with 2.2 during
> the lifetime of 2.4.

That seems reasonable to me.

Tim Delaney


Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

2005-01-03 Thread Tim Peters
[Bob Ippolito]
> ...
> What about for list objects that are big at some point, then
> progressively shrink, but happen to stick around for a while?  An
> "event queue" that got clogged for some reason and then became stable?

It's less plausible that we're going to see a lot of these
simultaneously alive.  It's possible, of course.  Note that if we do,
fiddling PyObject_Realloc() won't help:  list resizing goes thru the
PyMem_RESIZE() macro, which calls the platform realloc() directly in a
release build (BTW, I suspect that when you were looking for realloc()
calls, you were looking for the string "realloc(" -- but that's not
the only spelling; we don't even have alphabetical choke points).

The list object itself goes thru Python's small-object allocator,
which makes sense because a list object has a small fixed size
independent of list length.  Space for list elements is allocated
separately from the list object, and talks to the platform
malloc/free/realloc directly (in release builds, that's how the
PyMem_XYZ macros resolve).

> Dictionaries?

They're not a potential problem here -- dict resizing (whether growing
or shrinking) always proceeds by allocating new space for the dict
guts, copying over elements from the original space, then freeing the
original space.  This is because the hash slot assigned to a key can
change when the table size changes, and keeping collision chains
straight is a real bitch if you try to do it in-place.  IOW, there are
implementation reasons for why CPython dicts will probably never use
realloc().
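The toy model below shows why a key's slot can move when the table size changes. It is a hypothetical, simplified open-addressing table. The initial slot is `hash & (size - 1)`, as in CPython, but collisions fall back to plain linear probing rather than CPython's perturbed probe sequence, and hash values are supplied directly as integers. The resize follows Tim's description: build fresh storage, reinsert every live entry, then drop the old table.

```python
def build_table(hashes, size):
    """Insert integer hashes into a fresh table of `size` slots (a power of two)."""
    if len(hashes) > size:
        raise ValueError("table too small")
    mask = size - 1
    table = [None] * size
    for h in hashes:
        i = h & mask                      # initial slot depends on the table size
        while table[i] is not None:       # simplified linear probing on collision
            i = (i + 1) & mask
        table[i] = h
    return table


def resize(old_table, new_size):
    """Allocate new storage and reinsert everything; the old table is discarded."""
    live = [h for h in old_table if h is not None]
    return build_table(live, new_size)
```

With hashes 1 and 9 in 8 slots, 9 collides with 1 and lands in slot 2. After growing to 16 slots it belongs in slot 9. An in-place realloc() would have to untangle exactly that kind of move.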

> Of course these potential problems are a lot less likely to happen.

I think so.

Guido's suggestion to look at PyString_Resize (etc) instead could be a
good one, since those methods know both the number of thingies (bytes,
list elements, tuple elements, ...) currently allocated and the number
of thingies being asked for.  That could be exploited by a portable
heuristic (like malloc+memcpy+free if the new number of thingies is at
least a quarter less than the old number of thingies, else let realloc
(however spelled) exercise its own judgment).  Since list_resize()
doesn't go thru pymalloc, that's the only clear way to worm around
realloc() quirks for lists.
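The suggested heuristic is only described in prose above. The following is a rough Python model of the decision rule, not CPython code; the real change would be C inside the resize functions. `should_copy` and `resize_buffer` are hypothetical names, and a `bytearray` stands in for a raw memory block.

```python
def should_copy(old_n, new_n):
    """True when the new count is at least a quarter less than the old one.

    "At least a quarter less" means new_n <= 3/4 * old_n; integer math avoids floats.
    In that case: malloc + memcpy + free. Otherwise: leave it to realloc.
    """
    return 4 * new_n <= 3 * old_n


def resize_buffer(buf, new_n):
    """Toy model of the heuristic applied to a bytearray."""
    if should_copy(len(buf), new_n):
        # "malloc" a fresh, smaller block and "memcpy" the surviving prefix
        # (slicing a bytearray returns a new copy); the caller dropping `buf`
        # plays the role of free().
        return buf[:new_n]
    # Otherwise resize in place, which is what realloc is free to do.
    if new_n < len(buf):
        del buf[new_n:]
    else:
        buf.extend(bytes(new_n - len(buf)))
    return buf
```

The design choice is that large shrinks always get a new, right-sized block, whatever the platform realloc() would have done. Small shrinks and all growth still defer to realloc().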


[Python-Dev] Please help complete the AST branch

2005-01-03 Thread Guido van Rossum
The AST branch has been "nearly complete" for several Python versions
now. The last time a serious effort was made was in May I believe, but
it wasn't enough to merge the code back into 2.4, alas.

It would be a real shame if this code was abandoned. If we're going to
make progress with things like type inferencing, integrating
PyChecker, or optional static type checking (see my blog on Artima --
I just finished rambling part II), the AST branch would be a much
better starting point than the current mainline bytecode compiler.
(Arguably, the compiler package, written in Python, would make an even
better start for prototyping, but I don't expect that it will ever be
fast enough to be Python's only bytecode compiler.)

So, I'm pleading. Please, someone, either from the established crew of
developers or a new volunteer (or more than one!), try to help out to
complete the work on the AST branch and merge it into 2.5.

I wish I could do this myself, and I *am* committed to more time for
Python than last year, but I think I should try to focus on language
design issues more than implementation issues. (Although I  haven't
heard what Larry Wall has been told -- apparently the Perl developers
don't want Larry writing code any more. :-)

Please, anyone? Raymond? Neil? Facundo? Brett?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Please help complete the AST branch

2005-01-03 Thread Brett C.
Guido van Rossum wrote:
> The AST branch has been "nearly complete" for several Python versions
> now. The last time a serious effort was made was in May I believe, but
> it wasn't enough to merge the code back into 2.4, alas.
> It would be a real shame if this code was abandoned.
> [SNIP]
> So, I'm pleading. Please, someone, either from the established crew of
> developers or a new volunteer (or more than one!), try to help out to
> complete the work on the AST branch and merge it into 2.5.
> [SNIP]
> Please, anyone? Raymond? Neil? Facundo? Brett?
Funny you should send this out today.  I just did some jiggling with my 
schedule so I could take the undergrad language back-end course this quarter. 
This led to me needing to take a grad-level projects class in Spring.  And what 
was the first suggestion my professor had for that course credit in Spring?

Finish the AST branch.  I am dedicated to finishing the AST branch as soon as 
my thesis is finished, class credit or no.  I just can't delve into that large 
of a project until I get my school stuff in order.  But if I get to do it for 
my class credit I will be able to dedicate 4 units of work to it a week (about 
8 hours minimum).

Plus there is the running tradition of sprinting on the AST branch at PyCon.  I 
was planning on shedding my bug fixing drive at PyCon this year and sprinting 
with (hopefully) Jeremy, Neal, Tim, and Neil on the AST branch as a prep for 
working on it afterwards for my class credit.

Although if someone can start sooner, then by all means, go for it!  I can find 
something else to get credit for (such as finishing my monster of a paper 
comparing Python to Java; 34 single-spaced pages just covering paradigm support 
and the standard libraries so far).  And obviously help would be great since it 
isn't a puny codebase (4,000 lines so far for the CST->AST and AST->bytecode code).

If anyone would like to see the current code, check out ast-branch from CVS 
(read the dev FAQ on how to check out a branch from CVS).  Read 
Python/compile.txt for an overview of how the thing works and such.
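For readers new to the two stages, the sketch below shows the AST-to-bytecode half of the pipeline as later Python releases expose it. It assumes a current Python 3 and its `ast` module, which postdates this thread. The concrete-syntax-tree step is not shown, and `compile_via_ast` is a hypothetical helper name.

```python
import ast


def compile_via_ast(source, filename="<demo>"):
    """Two explicit stages: source -> AST, then AST -> code object (bytecode)."""
    tree = ast.parse(source, filename=filename, mode="exec")
    return compile(tree, filename, "exec")
```

Passing the result to `dis.dis()` shows the generated bytecode. The exact listing varies between Python versions.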

It will get done, just don't push for a 2.5 release within a month.  =)
-Brett


Re: [Python-Dev] Please help complete the AST branch

2005-01-03 Thread Jeff Epler
On Mon, Jan 03, 2005 at 06:02:52PM -0800, Brett C. wrote:
> Although if someone can start sooner than by all means, go for it!
> And obviously help would be great since it isn't a puny codebase
> (4,000 lines so far for the CST->AST and AST->bytecode code).

And obviously knowing a little more about the AST branch would be
helpful for those considering helping.

Is there any relatively up-to-date document about ast-branch?  googling
about it turned up some pypy stuff from 2003, and I didn't look much
further.

I just built the ast-branch for fun, and "make test" mostly worked.
8 tests failed:
test_builtin test_dis test_generators test_inspect test_pep263
test_scope test_symtable test_trace
6 skips unexpected on linux2:
test_csv test_hotshot test_bsddb test_parser test_logging
test_email
I haven't looked at any of the failures in detail, but at least
test_bsddb is due to missing development libs on this system.

One more thing:  The software I work on by day has python scripting.
One part of that functionality is a tree display of a script.  I'm not
actively involved with this part of the software (yet).  Any comments on
whether ast-branch could be construed as helping make this kind of
functionality work better, faster, or easier?  The code we use currently
is based on a modified version of the parser which includes comment
information, so we need to be aware of changes in this area anyhow.

(On the other hand, I won't hold my breath for permission to do this
on the clock: because of our own release scheduling, I have other
projects on my plate now, and a version of our software that uses a
post-2.3 Python is years away.)
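As a rough illustration of the tree-display question, here is a sketch that uses the `ast` module shipped by later Python releases. The AST work was eventually merged for Python 2.5, and the module postdates this thread. It assumes a current Python 3, and `outline` is a hypothetical helper that lists class and function definitions with their nesting depth. The AST does not keep comments, so on its own it would not replace a parser modified to preserve comment information.

```python
import ast

DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def outline(source):
    """Return (depth, node type, name) for each def/class in `source`, in order."""
    result = []

    def visit(node, depth):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, DEFS):
                result.append((depth, type(child).__name__, child.name))
                visit(child, depth + 1)   # nested defs are one level deeper
            else:
                visit(child, depth)       # e.g. an if-block: same display level

    visit(ast.parse(source), 0)
    return result
```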

Jeff




Re: [Python-Dev] Please help complete the AST branch

2005-01-03 Thread Jeremy Hylton
On Mon, 03 Jan 2005 18:02:52 -0800, Brett C. <[EMAIL PROTECTED]> wrote:
> Plus there is the running tradition of sprinting on the AST branch at PyCon.  I
> was planning on shedding my bug fixing drive at PyCon this year and sprinting
> with (hopefully) Jeremy, Neal, Tim, and Neil on the AST branch as a prep for
> working on it afterwards for my class credit.

I'd like to sprint on it before PyCon; we'll have to see what my
schedule allows.

> If anyone would like to see the current code, check out ast-branch from CVS
> (read the dev FAQ on how to check out a branch from CVS).  Read
> Python/compile.txt for an overview of how the thing works and such.
> 
> It will get done, just don't push for a 2.5 release within a month.  =)

I think the branch is in an awkward state, because of the new features
added to Python 2.4 after the AST branch work ceased.  The ast branch
doesn't handle generator expressions or decorators; extending the ast
to support them would be a good first step.
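For a sense of what "support" means here, the sketch below uses today's `ast` module, assuming a current Python 3. There, decorators appear in a `decorator_list` field on function and class definitions, and generator expressions appear as `GeneratorExp` nodes. `constructs` and the `memoize` decorator name are hypothetical; the source is only parsed, never executed.

```python
import ast

SOURCE = '''
@memoize
def squares(n):
    return (i * i for i in range(n))
'''


def constructs(source):
    """Return (simple decorator names, number of generator expressions) in `source`."""
    decorators = []
    genexps = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Only bare-name decorators (@name) are collected in this sketch.
            decorators.extend(d.id for d in node.decorator_list
                              if isinstance(d, ast.Name))
        elif isinstance(node, ast.GeneratorExp):
            genexps += 1
    return decorators, genexps
```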

There are also the simple logistical questions of integrating changes.
Since most of the AST branch changes are confined to a few files, I
suspect the best thing to do is to merge all the changes from the head
except for compile.c.  I haven't done a major CVS branch integrate in
at least nine months; if someone feels more comfortable with that, it
would also be a good step.

Perhaps interested parties should take up the discussion on the
compiler-sig.  I think we can recover the state of last May's effort
pretty quickly, and I can help outline the remaining work even if I
can't help much.  (Although I hope I can help, too.)

Jeremy


RE: [Python-Dev] Please help complete the AST branch

2005-01-03 Thread Tony Meyer
> Perhaps interested parties should take up the discussion on 
> the compiler-sig.

This isn't listed in the 'currently active' SIGs list on
 - is it still active, or will it now be?  If so,
perhaps it should be added to the list?

By 'discussion on', do you mean via the wiki at
?

=Tony.Meyer
