Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
> I don't think expecting people to tweak gc parameters when they witness
> performance problems is reasonable.

What follows from that? To me, the natural conclusion is "people who
witness performance problems just need to despair, or accept them, as
they can't do anything about it", however, I don't think this is the
conclusion that you had in mind.

Regards,
Martin


Re: [Python-Dev] Status of Issue 2331 - Backport parameter annotations

2008-06-21 Thread [EMAIL PROTECTED]
> Does it bother you that you need ()s to make instances elsewhere?
> That you have to type int('123') instead of int, '123'?

Not at all...Python never supported this.

Cheers,
David


[Python-Dev] Issue 2722

2008-06-21 Thread Neil Muller
Could someone have a look at the suggested fix for issue 2722? While
not a particularly common problem, it can be an annoyance, and the
patch looks reasonable.

-- 
Neil Muller


[Python-Dev] csv module and Unicode in 2.6 and 3.0

2008-06-21 Thread skip

If I understand things correctly, the csv module in 3.0 supports Unicode,
kind of for free because all strings are Unicode.  I suspect the Unicode
examples at the bottom of the 3.0 version of the module doc need to be
removed.

Does the 2.6 version support Unicode out of the box as well or are the
Unicode examples still required there?

Skip


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Antoine Pitrou
On Saturday 21 June 2008 at 10:33 +0200, "Martin v. Löwis" wrote:
> > I don't think expecting people to tweak gc parameters when they witness
> > performance problems is reasonable.
> 
> What follows from that? To me, the natural conclusion is "people who
> witness performance problems just need to despair, or accept them, as
> they can't do anything about it",

To me, Amaury's answer implies that most people can't do anything about
it, indeed.

Well, they could hang themselves or switch to another language (which
some people might view as equivalent :-)), but perhaps, optimistically,
the various proposals that were sketched out in this thread (by Adam
Olsen and Greg Ewing) could bring an improvement. I don't know how
realistic they are; perhaps an expert would have an answer.

Regards

Antoine.




Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Kevin Jacobs <[EMAIL PROTECTED]>
On Sat, Jun 21, 2008 at 4:33 AM, "Martin v. Löwis" <[EMAIL PROTECTED]>
wrote:

> > I don't think expecting people to tweak gc parameters when they witness
> > performance problems is reasonable.
>
> What follows from that? To me, the natural conclusion is "people who
> witness performance problems just need to despair, or accept them, as
> they can't do anything about it", however, I don't think this is the
> conclusion that you had in mind.
>

I can say with complete certainty that of the 20+ programmers I've had
working for me, many of whom have used Python for 3+ years, not a single one
would think to question the garbage collector if they observed the kind of
quadratic time complexity I've demonstrated.  This is not because they are
stupid, but because they have only a vague idea that Python even has a
garbage collector, never mind that it could be behaving badly for such
innocuous-looking code.

Maybe we should consider more carefully before declaring the status quo
sufficient.  Average developers do allocate millions of objects in bursts
and super-linear time complexity for such operations is not acceptable.
Thankfully I am around to help my programmers work around such issues or
else they'd be pushing to switch to Java, Ruby, C#, or whatever since Python
was inexplicably "too slow" for "real work".  This being open source, I'm
certainly willing to help in the effort to do so, but not if potential
solutions will be ruled out as being unnecessary.

-Kevin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
> I can say with complete certainty that of the 20+ programmers I've had
> working for me, many who have used Python for 3+ years, not a single one
> would think to question the garbage collector if they observed the kind
> of quadratic time complexity I've demonstrated.  This is not because
> they are stupid, but because they have only a vague idea that Python
> even has a garbage collector, never mind that it could be behaving badly
> for such innocuous looking code.
> 
> Maybe we should consider more carefully before declaring the status quo
> sufficient.

This was precisely my question: What follows from the above observation?

I personally didn't declare the status quo sufficient - I merely
declared it as being the status quo.

> Average developers do allocate millions of objects in
> bursts and super-linear time complexity for such operations is not
> acceptable.  Thankfully I am around to help my programmers work around
> such issues or else they'd be pushing to switch to Java, Ruby, C#, or
> whatever since Python was inexplicably "too slow" for "real work".  This
> being open source, I'm certainly willing to help in the effort to do so,
> but not if potential solutions will be ruled out as being unnecessary.

I wouldn't rule out solutions as being unnecessary. I might rule out
solutions that negatively impact existing software, for the sake of
improving other existing software.

Unfortunately, the only way to find out whether a solution will be ruled
out is to propose it first. Only then will you see what response you get.

Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
> Well, they could hang themselves or switch to another language (which
> some people might view as equivalent :-)), but perhaps optimistically
> the various propositions that were sketched out in this thread (by Adam
> Olsen and Greg Ewing) could bring an improvement. I don't know how
> realistic they are, perhaps an expert would have an answer.

In general, any solution of the "do GC less often" kind needs to deal
with cases where lots of garbage gets produced in a short amount of time
(e.g. in a tight loop) and which would run out of memory when GC is done
less often.

Given the choice of "run slower" and "run out of memory", Python should
always prefer the former.

One approach could be to measure how successful a GC run was: if GC
finds that more-and-more objects get allocated and very few (or none)
are garbage, it might conclude that this is an allocation spike, and
back off. The tricky question is how to find out that the spike is
over.
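
A rough sketch of what such a back-off policy could look like in terms of
the public gc module; the after_collection() hook and the statistics passed
to it do not exist in CPython and are assumed here purely for illustration:

import gc

BASE_THRESHOLD = 700   # CPython's default generation-0 threshold
backoff = 1

def after_collection(examined, freed):
    # Hypothetical hook invoked after each collection with the number of
    # objects examined and the number actually freed as cyclic garbage.
    global backoff
    if examined and float(freed) / examined < 0.02:
        # Almost nothing was garbage: assume an allocation spike and
        # collect less often.
        backoff = min(backoff * 2, 64)
    else:
        # Collections are paying off again: assume the spike is over.
        backoff = 1
    gc.set_threshold(BASE_THRESHOLD * backoff)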

Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Kevin Jacobs <[EMAIL PROTECTED]>
On Sat, Jun 21, 2008 at 11:20 AM, "Martin v. Löwis" <[EMAIL PROTECTED]>
wrote:

> In general, any solution of the "do GC less often" needs to deal with
> cases where lots of garbage gets produced in a short amount of time
> (e.g. in a tight loop), and which run out of memory when GC is done less
> often.
>


Idea 1: Allow GC to run automatically no more often than once every n CPU
seconds, n being perhaps 5 or 10.
Idea 2: Allow GC to run no more often than once every f(n) CPU seconds, where
n is the time taken by the last GC round.

These limits could be reset or scaled if GC collects more than n% of the
generation 0 objects, or maybe if the number of PyMalloc arenas increases
by a certain amount?
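
A rough sketch of Idea 1 at the application level, using only the existing
gc module; the 5-second value and the maybe_collect() helper are
placeholders, not a proposed API:

import gc
import time

MIN_GC_INTERVAL = 5.0      # "n" CPU seconds from Idea 1 (placeholder value)
_last_gc = time.clock()    # CPU time when we last allowed a collection

def maybe_collect():
    # Run a full collection only if at least MIN_GC_INTERVAL CPU seconds
    # have passed since the previous one; otherwise do nothing.
    global _last_gc
    now = time.clock()
    if now - _last_gc >= MIN_GC_INTERVAL:
        gc.collect()
        _last_gc = now

# Automatic collection stays off; the application calls maybe_collect()
# at convenient points, e.g. once per request or per loop iteration.
gc.disable()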

-Kevin




Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
> Idea 1: Allow GC to run automatically no more often than n CPU seconds,
> n being perhaps 5 or 10.

I think it's very easy to exhaust the memory with such a policy, even
though much memory would still be available. Worse, in a program
producing a lot of garbage, performance will degrade significantly as
Python starts thrashing the swap space.

> Idea 2: Allow GC to run no more often than f(n) CPU seconds, where n is
> the time taken by the last GC round.

How would that take the incremental GC into account? (i.e. what is
"the time taken by the last GC round"?)

Furthermore, the GC run time might well be under the resolution of the
CPU seconds clock.

> These limits could be reset or scaled by the GC collecting more than n%
> of the generation 0 objects or maybe the number of PyMalloc arenas
> increasing by a certain amount?

I don't think any strategies based on timing will be successful.
Instead, one should count and analyze objects (although I'm unsure
how exactly that could work).

Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Aahz
On Sat, Jun 21, 2008, "Martin v. Löwis" wrote:
> 
> In general, any solution of the "do GC less often" needs to deal with
> cases where lots of garbage gets produced in a short amount of time
> (e.g. in a tight loop), and which run out of memory when GC is done less
> often.
> 
> Given the choice of "run slower" and "run out of memory", Python should
> always prefer the former.

I'm not sure I agree with this.  GC IIRC was introduced primarily to
alleviate *long-term* memory starvation.  You are now IMO adding a new
goal for GC that has not been previously articulated.  I believe this
requires consensus rather than a simple declaration of principle.

Guido's opinion if he has one obviously overrules.  ;-)  Guido?
-- 
Aahz ([EMAIL PROTECTED])   <*> http://www.pythoncraft.com/

"as long as we like the same operating system, things are cool." --piranha


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
> I'm not sure I agree with this.  GC IIRC was introduced primarily to
> alleviate *long-term* memory starvation.

I don't think that's historically the case. GC would not need to be
generational if releasing short-lived objects shortly after they become
garbage was irrelevant. Of course, it was always expected that much
memory is released through mere reference counting, and that GC only
kicks in "after some time". However "after some time" was changed from
"after 5000 allocations" to "after 700 allocations" in


r17274 | jhylton | 2000-09-05 17:44:50 +0200 (Tue, 05 Sep 2000) | 2 lines
Changed paths:
   M /python/trunk/Modules/gcmodule.c

compromise value for threshold0: not too high, not too low



Regards,
Martin


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Bill Janssen
> > What follows from that? To me, the natural conclusion is "people who
> > witness performance problems just need to despair, or accept them, as
> > they can't do anything about it", however, I don't think this is the
> > conclusion that you had in mind.
> >
> 
> I can say with complete certainty that of the 20+ programmers I've had
> working for me, many who have used Python for 3+ years, not a single one
> would think to question the garbage collector if they observed the kind of
> quadratic time complexity I've demonstrated.  This is not because they are
> stupid, but because they have only a vague idea that Python even has a
> garbage collector, never mind that it could be behaving badly for such
> innocuous looking code.

Perhaps this is something documentation could help with.  I'm thinking of a
one-page checklist of places to look for performance problems that your
programmers could work through.

Bill


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Terry Reedy



Kevin Jacobs <[EMAIL PROTECTED]> wrote:

> I can say with complete certainty that of the 20+ programmers I've had
> working for me, many who have used Python for 3+ years, not a single one
> would think to question the garbage collector if they observed the kind
> of quadratic time complexity I've demonstrated.  This is not because
> they are stupid, but because they have only a vague idea that Python
> even has a garbage collector, never mind that it could be behaving badly
> for such innocuous looking code.


As I understand it, gc is needed now more than ever because new style 
classes make reference cycles more common.  On the other hand, greatly 
increased RAM size (from some years ago) makes megaobject bursts 
possible.  Such large bursts move the hidden quadratic do-nothing drag 
out of the relatively flat part of the curve (total time just double or 
triple what it should be) to where it can really bite.  Leaving aside 
what you do for your local group, can we better warn Python programmers 
now, for the upcoming 2.5, 2.6, and 3.0 releases?


Paragraph 3 of the Reference Manual chapter on Data Model(3.0 version) says:
"Objects are never explicitly destroyed; however, when they become 
unreachable they may be garbage-collected. An implementation is allowed 
to postpone garbage collection or omit it altogether — it is a matter of 
implementation quality how garbage collection is implemented, as long as 
no objects are collected that are still reachable. (Implementation note: 
the current implementation uses a reference-counting scheme with 
(optional) delayed detection of cyclically linked garbage, which 
collects most objects as soon as they become unreachable, but is not 
guaranteed to collect garbage containing circular references. See the 
documentation of the gc module for information on controlling the 
collection of cyclic garbage.)"

I am not sure what to add here (especially for those who do not read it ;-).

The Library Manual gc section says "Since the collector supplements the 
reference counting already used in Python, you can disable the collector 
if you are sure your program does not create reference cycles."  Perhaps 
it should also say "You should disable the collector when creating 
millions of objects without cycles".
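
For example, the pattern such a note would be pointing at might look like
this (the structure being built is made up, but contains no cycles):

import gc

gc.disable()    # bulk build, no reference cycles created below
try:
    table = [(i, str(i)) for i in xrange(2000000)]
finally:
    gc.enable()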


The installed documentation set (on Windows, at least) includes some 
Python HOWTOs.  If one were added on Space Management (implementations, 
problems, and solutions), would your developers read it?


> Maybe we should consider more carefully before declaring the status quo 
> sufficient.  Average developers do allocate millions of objects in 
> bursts and super-linear time complexity for such operations is not 
> acceptable.  Thankfully I am around to help my programmers work around 
> such issues or else they'd be pushing to switch to Java, Ruby, C#, or 
> whatever since Python was inexplicably "too slow" for "real work".  This 
> being open source, I'm certainly willing to help in the effort to do so, 
> but not if potential solutions will be ruled out as being unnecessary.


To me, 'sufficient' (time-dependent) and 'necessary' are either too 
vague or too strict to bring about what you want -- change.  This is 
the third thread I have read (here + c.l.p) on default-mode gc problems 
(but all in the last couple of years or so).  So, especially with the 
nice table someone posted recently, on time with and without gc, and 
considering that installed RAM continues to grow, I am persuaded that a 
default-behavior improvement that does not negatively impact the vast 
majority would be desirable.


Terry Jan Reedy



Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Antoine Pitrou

On Saturday 21 June 2008 at 17:49 +0200, "Martin v. Löwis" wrote:
> I don't think any strategies based on timing will be successful.
> Instead, one should count and analyze objects (although I'm unsure
> how exactly that could work).

Would it be helpful if the GC was informed of memory growth by the
Python memory allocator (that is, each time it either asks for or gives
back a block of memory to the system allocator)?




Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
Antoine Pitrou wrote:
> On Saturday 21 June 2008 at 17:49 +0200, "Martin v. Löwis" wrote:
>> I don't think any strategies based on timing will be successful.
>> Instead, one should count and analyze objects (although I'm unsure
>> how exactly that could work).
> 
> Would it be helpful if the GC was informed of memory growth by the
> Python memory allocator (that is, each time it either asks or gives back
> a block of memory to the system allocator) ?

I don't see how. The garbage collector is already informed about memory
growth; it learns exactly when a container object is allocated or
deallocated. That the allocator then requests memory from the system
only confirms what the garbage collector already knew: that there are
lots of allocated objects. From that, one could infer that it might
be time to perform garbage collection - or one could infer that all
the objects are really useful, and no garbage can be collected.

Regards,
Martin


[Python-Dev] Proposal: Run GC less often

2008-06-21 Thread Martin v. Löwis
Here is my proposal for making the GC run less often.
The objective is to avoid the quadratic-time behavior
if many objects are created and none of them is garbage.

The youngest generation remains collected after 700
allocations, the middle generation after 10 young collections.
Therefore, full GC remains considered every 7000 allocations.
The two youngest generations don't cause quadratic behavior,
as the number of objects in each generation is bounded.

Currently, full GC is invoked every 10 middle collections.
Under my proposal, 10 middle collections must have passed,
PLUS the number of survivor objects from the middle generation
must exceed 10% of the number of objects in the oldest
generation.

As a consequence, garbage collection becomes less frequent
as the number of objects on the heap grows, resulting in
an amortized O(N) overhead for garbage collection (under
the access pattern stated above).

To determine the number of survivor objects, counts are
taken during the collection. Objects deallocated through
reference counting still remain accounted for as survivors
(likewise for determining the size of the oldest generation).

I made a simulation assuming an initially-empty heap,
and invocations of the collector every M=7000 objects.
The table below is in units of M and shows the number
of objects allocated, and the number of objects inspected
since the start of the simulation, for the old and the
new scheme (the asterisk indicates that a collection
took place; lines with no collection are dropped):

 total old_inspected new_inspected
    10         10*         10*
    20         30*         30*
    30         60*         60*
    40        100*        100*
    50        150*        150*
    60        210*        210*
    70        280*        280*
    80        360*        360*
    90        450*        450*
   100        550*        450
   102        550         552*
   110        660*        552
   115        660         667*
   120        780*        667
   129        780         796*
   130        910*        796
   140       1050*        796
   ...
   940      44650*       7724
   942      44650        8666*
   950      45600*       8666
   960      46560*       8666
   970      47530*       8666
   980      48510*       8666
   990      49500*       8666
  1000      50500*       8666
   ...
  9990    4995000*      95887

As you can see, the old and the new scheme would start
off equally until 100M objects have been allocated. The
table shows how the old scheme grows quadratically, and
the new scheme eventually approaches roughly a factor
of ten between the number of objects and the number of
times that GC considers an object.

Applications with a small number of objects will see no
change in behavior; likewise, applications that create
lots of small cycles will continue to see them collected
in the youngest or middle generations.

Please let me know what you think.

Regards,
Martin

P.S. I don't plan to implement this myself, so if you like
the idea, go ahead.
old_skipped = new_skipped = 0
old_seen = new_seen = 0
survivors = 0  # will always equal new_skipped

print " total old_inspected new_inspected"

# Each loop step stands for M = 7000 allocations; none of the allocated
# objects ever becomes garbage, so the heap size equals "total".
for total in range(1, 10000):
    # Old scheme: a full collection every 10th middle collection.
    if old_skipped < 9:
        old_skipped += 1
        old_tag = ' '
    else:
        # Old Collection
        old_skipped = 0
        old_seen += total
        old_tag = '*'
    # New scheme: additionally require the survivors to exceed 10% of the
    # objects in the oldest generation (approximated here by total).
    if new_skipped < 9 or survivors*10 < total:
        survivors += 1
        new_skipped += 1
        new_tag = ' '
    else:
        # New Collection
        survivors = 0
        new_skipped = 0
        new_seen += total
        new_tag = '*'
    if old_tag == '*' or new_tag == '*':
        print "%5d %10d%s %10d%s" % (total, old_seen, old_tag, new_seen, new_tag)


Re: [Python-Dev] Proposal: Run GC less often

2008-06-21 Thread Brett Cannon
On Sat, Jun 21, 2008 at 1:23 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Here is my proposal for making the GC run less often.
> The objective is to avoid the quadratic-time behavior
> if many objects are created and none of them is garbage.
>
[SNIP]
> Applications with a small number of objects will see no
> change in behavior, also, applications that create
> lots of small cycles will continue to see them collected
> in the youngest or middle generations.
>
> Please let me know what you think.
>

Interesting. Seems reasonable to me if the impact really is as minimal
as you say, Martin.

-Brett


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Stephen J. Turnbull
"Martin v. Löwis" writes:

 > Given the choice of "run slower" and "run out of memory", Python should
 > always prefer the former.
 > 
 > One approach could be to measure how successful a GC run was: if GC
 > finds that more-and-more objects get allocated and very few (or none)
 > are garbage, it might conclude that this is an allocation spike, and
 > back off. The tricky question is how to find out that the spike is
 > over.

XEmacs implements this strategy in a way which is claimed to give
constant amortized time (ie, averaged over memory allocated).  I
forget the exact parameters, but ISTR it's just that the period ("time"
measured by bytes allocated) is proportional to the currently allocated
memory.  Some people claim this is much more comfortable than the
traditional "GC after N bytes are allocated" algorithm but I don't
notice much difference.  I don't know how well this intuition carries
over to noninteractive applications.

In XEmacs experimenting with such strategies is pretty easy, since the
function that determines period is only a few lines long.  I assume
that would be true of Python, too.

However, isn't the real question whether there is memory pressure or
not?  If you've got an unloaded machine with 2GB of memory, even a 1GB
spike might have no observable consequences.  How about a policy of
GC-ing with decreasing period ("time" measured by bytes allocated or
number of allocations) as the fraction of memory used increases,
starting from a pretty large fraction (say 50% by default)?  The total
amount of memory could be a soft limit, defaulting to the amount of
fast memory actually available.

For interactive and maybe some batch applications, it might be
appropriate to generate a runtime warning as memory use approaches some
limits, too.

Nevertheless, I think the real solution has to be for Python
programmers to be aware that there is GC, and that they can tune it.


[Python-Dev] Another Proposal: Run GC less often

2008-06-21 Thread none
Instead of collecting objects after a fixed number of allocations (700),
you could make it dynamic like this:


# initial values
min_allocated_memory = 0
max_allocated_memory = 0
next_gc_run = 1024 * 1024

def manage_memory():
    global min_allocated_memory, max_allocated_memory, next_gc_run

    allocated_memory = get_allocated_memory()
    min_allocated_memory = min(min_allocated_memory, allocated_memory)
    max_allocated_memory = max(max_allocated_memory, allocated_memory)

    if max_allocated_memory - min_allocated_memory > next_gc_run:
        # run the gc
        memory_freed, allocated_memory = run_gc()
        next_gc_run = max(allocated_memory * 1.5 - memory_freed, 1024 * 1024)
        min_allocated_memory = allocated_memory
        max_allocated_memory = allocated_memory


manage_memory() should be called after every allocation and whenever an
object's reference count reaches 0 (i.e. when memory is freed).



Expected behaviours:

=> The fewer objects contain cyclic references, the less often the GC will
run (memory_freed is small)

=> The more objects contain cyclic references, the more often the GC will
run (memory_freed is large)

=> If memory utilization grows fast (burst allocations), GC will run
less often: next_gc_run = allocated_memory * 1.5 - memory_freed


... Of course the constants 1.5 and 1024 * 1024 are only suggestions...

- Ralf


Re: [Python-Dev] Proposal: Run GC less often

2008-06-21 Thread Leif Walsh
If you can get me a version of the interpreter with this change made
(I wouldn't know what to change), I can run a very
allocation/deallocation-heavy application I have lying around, and get
you some good benchmarks.

On Sat, Jun 21, 2008 at 1:23 PM, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Here is my proposal for making the GC run less often.
> The objective is to avoid the quadratic-time behavior
> if many objects are created and none of them is garbage.
>
> The youngest generation remains collected after 700
> allocations, the middle generation after 10 young collections.
> Therefore, full GC remains considered every 7000 allocations.
> The two youngest generations don't cause quadratic behavior,
> as the number of objects in each generation is bounded.
>
> Currently, full GC is invoked every 10 middle collections.
> Under my proposal, 10 middle collections must have passed,
> PLUS the number of survivor objects from the middle generation
> must exceed 10% of the number of objects in the oldest
> generation.
>
> As a consequence, garbage collection becomes less frequent
> as the number of objects on the heap grows, resulting in
> an amortized O(N) overhead for garbage collection (under
> the access pattern stated above).
>
> To determine the number of survivor objects, counts are
> taken during the collection. Objects deallocated through
> reference counting still remain accounted for as survivors
> (likewise for determining the size of the oldest generation).
>
> I made a simulation assuming an initially-empty heap,
> and invocations of the collector every M=7000 objects.
> The table below is in units of M and shows the number
> of objects allocated, and the number of objects inspected
> since the start of the simulation, for the old and the
> new scheme (the asterisk indicates that a collection
> took place; lines with no collection are dropped):
>
>  total old_inspected new_inspected
>     10         10*         10*
>     20         30*         30*
>     30         60*         60*
>     40        100*        100*
>     50        150*        150*
>     60        210*        210*
>     70        280*        280*
>     80        360*        360*
>     90        450*        450*
>    100        550*        450
>    102        550         552*
>    110        660*        552
>    115        660         667*
>    120        780*        667
>    129        780         796*
>    130        910*        796
>    140       1050*        796
>    ...
>    940      44650*       7724
>    942      44650        8666*
>    950      45600*       8666
>    960      46560*       8666
>    970      47530*       8666
>    980      48510*       8666
>    990      49500*       8666
>   1000      50500*       8666
>    ...
>   9990    4995000*      95887
>
> As you can see, the old and the new scheme would start
> of equally until 100M objects have been allocated. The
> table shows how the old scheme grows quadratically, and
> the new scheme eventually approaches roughly a factor
> of ten between the number of objects and the number of
> times that GC considers an object.
>
> Applications with a small number of objects will see no
> change in behavior, also, applications that create
> lots of small cycles will continue to see them collected
> in the youngest or middle generations.
>
> Please let me know what you think.
>
> Regards,
> Martin
>
> P.S. I don't plan to implement this myself, so if you like
> the idea, go ahead.
>



--
Cheers,
Leif


Re: [Python-Dev] [Python-3000] RELEASED Python 2.6b1 and 3.0b1

2008-06-21 Thread Leif Walsh
There is a typo on the download page (http://python.org/download/releases/3.0/):

"""The release plan is to have a series of alpha releases in 2007,
beta releases in 2008, and a final release in September 2008. The
alpha releases are primarily aimed at developers who want a sneak peek
at the new langauge, especially those folks who plan to port their
code to Python 3000. The hope is that by the time of the final
release, many 3rd party packages will already be available in a
3.0-compatible form.
"""

Should be

"""The release plan is to have a series of alpha releases in 2007,
beta releases in 2008, and a final release in September 2008. The
alpha releases are primarily aimed at developers who want a sneak peek
at the new *language*, especially those folks who plan to port their
code to Python 3000. The hope is that by the time of the final
release, many 3rd party packages will already be available in a
3.0-compatible form.
"""

(I really need to add my other e-mail address to these lists...sorry
if my first message got through to one of them and this is a dupe.)

--
Cheers,
Leif


Re: [Python-Dev] [Python-3000] RELEASED Python 2.6b1 and 3.0b1

2008-06-21 Thread Barry Warsaw


On Jun 21, 2008, at 5:45 PM, Leif Walsh wrote:

> There is a typo on the download page (http://python.org/download/releases/3.0/):


Fixed, thanks!
-Barry



Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Martin v. Löwis
> XEmacs implements this strategy in a way which is claimed to give
> constant amortized time (ie, averaged over memory allocated).

See my recent proposal. The old trick is to do reorganizations at
intervals of a fixed fraction of the total size, resulting in an
amortized-constant overhead per increase (assuming each reorganization
takes time linear in the total size).

> However, isn't the real question whether there is memory pressure or
> not?  If you've got an unloaded machine with 2GB of memory, even a 1GB
> spike might have no observable consequences.  How about a policy of
> GC-ing with decreasing period ("time" measured by bytes allocated or
> number of allocations) as the fraction of memory used increases,
> starting from a pretty large fraction (say 50% by default)?

The problem with such an approach is that it is very difficult to
measure. What to do about virtual memory? What to do about other
applications that also consume memory?

On some systems (Windows in particular), the operating system indicates
memory pressure through some IPC mechanism; on such systems, it might
be reasonable to perform garbage collection (only?) when the system asks
for it. However, the system might not ask for GC while the swap space
is still not exhausted, meaning that the deferred GC would take a long
time to complete (having to page in every object).

> Nevertheless, I think the real solution has to be for Python
> programmers to be aware that there is GC, and that they can tune it.

I don't think there is a "real solution". I think programmers should
abstain from complaining if they can do something about the problem
in their own application (unless the complaint is formulated as a
patch) - wait - I think programmers should abstain from complaining,
period.

Regards,
Martin



Re: [Python-Dev] Proposal: Run GC less often

2008-06-21 Thread Greg Ewing

Martin v. Löwis wrote:


> Under my proposal, 10 middle collections must have passed,
> PLUS the number of survivor objects from the middle generation
> must exceed 10% of the number of objects in the oldest
> generation.


What happens if the program enters a phase where it's not
producing any new cyclic garbage, but is breaking references
among the old objects in such a way that cycles of them
are being left behind? Under this rule, the oldest
generation would never be scanned, so those cycles would
never be collected.


> As a consequence, garbage collection becomes less frequent
> as the number of objects on the heap grows


Wouldn't it be simpler just to base the collection frequency
directly on the total number of objects in the heap?
From what another poster said, this seems to be what
emacs does.

--
Greg


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Nick Coghlan

Martin v. Löwis wrote:

> Antoine Pitrou wrote:
>> On Saturday 21 June 2008 at 17:49 +0200, "Martin v. Löwis" wrote:
>>
>>> I don't think any strategies based on timing will be successful.
>>> Instead, one should count and analyze objects (although I'm unsure
>>> how exactly that could work).
>>
>> Would it be helpful if the GC was informed of memory growth by the
>> Python memory allocator (that is, each time it either asks or gives back
>> a block of memory to the system allocator) ?
>
> I don't see how. The garbage collector is already informed about memory
> growth; it learns exactly when a container object is allocated or
> deallocated. That the allocator then requests memory from the system
> only confirms what the garbage collector already knew: that there are
> lots of allocated objects. From that, one could infer that it might
> be time to perform garbage collection - or one could infer that all
> the objects are really useful, and no garbage can be collected.


I was wondering whether it might be useful to detect the end of an 
allocation spike: if PyMalloc incremented an internal counter each time 
it requested more memory from the system, then the GC could check 
whether that number had changed on each collection cycle. If the GC goes 
a certain number of collection cycles without more memory being 
requested from the system, then it can assume the allocation spike is 
over and tighten up the default tuning again.


Such a pymalloc counter would provide a slightly more holistic view of 
overall memory usage changes than the container-focused view of the 
cyclic GC.
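
A sketch of how the check could work; the arena count passed in and the
hook that would call this are invented for illustration and do not exist
in gcmodule.c today:

QUIET_CYCLES = 3           # collections without new arenas => spike over
_last_arena_count = 0
_quiet_cycles = 0

def spike_appears_over(arena_count):
    # Called at each collection with the current number of pymalloc arenas;
    # returns True once no new memory has been requested from the system
    # for QUIET_CYCLES consecutive collections.
    global _last_arena_count, _quiet_cycles
    if arena_count > _last_arena_count:
        _quiet_cycles = 0          # still growing: the spike continues
    else:
        _quiet_cycles += 1
    _last_arena_count = arena_count
    return _quiet_cycles >= QUIET_CYCLES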


Cheers,
Nick.

--
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
http://www.boredomandlaziness.org


Re: [Python-Dev] Another Proposal: Run GC less often

2008-06-21 Thread Tony Nelson
At 11:28 PM +0200 6/21/08, none wrote:
>Instead of collecting objects after a fixed number of allocations (700)
 ...

I've seen this asserted several times in this thread:  that GC is done
every fixed number of allocations.  This is not correct.  GC is done when
the surplus of allocations over deallocations exceeds a threshold.  See
Modules/gcmodule.c and look for ".count++" and ".count--".  In normal
operation, allocations and deallocations stay somewhat balanced, but when
creating a large data structure, it's allocations all the way and GC runs
often.
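
Both the counters and the thresholds they are compared against are visible
from Python (gc.get_count() needs Python 2.5 or later):

import gc

print gc.get_threshold()   # defaults: (700, 10, 10)
print gc.get_count()       # first entry is allocations minus deallocations
                           # since the last generation-0 collection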
-- 

TonyN.


Re: [Python-Dev] C API for gc.enable() and gc.disable()

2008-06-21 Thread Stephen J. Turnbull
"Martin v. Löwis" writes:
 > > XEmacs implements this strategy in a way which is claimed to give
 > > constant amortized time (ie, averaged over memory allocated).
 > 
 > See my recent proposal.

I did, crossed in the mail.  To the extent that I understand both
systems, your proposal looks like an improvement over what we've got.

 > > However, isn't the real question whether there is memory pressure or
 > > not?  If you've got an unloaded machine with 2GB of memory, even a 1GB
 > > spike might have no observable consequences.  How about a policy of
 > > GC-ing with decreasing period ("time" measured by bytes allocated or
 > > number of allocations) as the fraction of memory used increases,
 > > starting from a pretty large fraction (say 50% by default)?
 > 
 > The problem with such an approach is that it is very difficult to
 > measure.

On some systems it should be possible to get information about how
much paging is taking place, which would be an indicator of pressure.

 > What to do about virtual memory?

If you're in virtual memory, you're already in trouble.  Once you're
paging, you need to increase the time between GCs, so the rule would not
be monotonic.


Re: [Python-Dev] Proposal: Run GC less often

2008-06-21 Thread Martin v. Löwis
> What happens if the program enters a phase where it's not
> producing any new cyclic garbage, but is breaking references
> among the old objects in such a way that cycles of them
> are being left behind? Under this rule, the oldest
> generation would never be scanned, so those cycles would
> never be collected.

Correct. However, this is the case already today.

> Wouldn't it be simpler just to base the collection frequency
> directly on the total number of objects in the heap?

Using what precise formula?

Regards,
Martin