Re: [Python-Dev] Writable __doc__

2012-01-19 Thread Victor Stinner
> http://bugs.python.org/issue12773  :)

The bug is marked as closed, but it still exists in Python 3.2, where it
has not been fixed. The fix must be backported.

Victor
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions

2012-01-19 Thread Antoine Pitrou
On Thu, 19 Jan 2012 11:12:06 +1100
Steven D'Aprano  wrote:
> Antoine Pitrou wrote:
> > Le jeudi 19 janvier 2012 à 00:25 +0900, Stephen J. Turnbull a écrit :
> >>  > You claim people won't use stable releases because of not enough
> >>  > alphas?  That sounds completely unrelated.
> >>
> >> Surely testing is related to user perceptions of stability.  More
> >> testing helps reduce bugs in released software, which improves user
> >> perception of stability, encouraging them to use the software in
> >> production.
> > 
> > I have asked a practical question, a theoretical answer isn't exactly
> > what I was waiting for.
> [...]
> > I don't care to convince *you*, since you are not involved in Python
> > development and release management (you haven't ever been a contributor
> > AFAIK). Unless you produce practical arguments, saying "I don't think
> > you can do it" is plain FUD and certainly not worth answering to.
> 
> Pardon me, but people like Stephen Turnbull are *users* of Python, exactly
> the sort of people you DO have to convince that moving to an accelerated
> or more complex release process will result in a better product. The risk
> is that you will lose users, or fragment the user base even more than it
> is now with 2.x vs 3.x.

Well, you might bring some examples here, but I haven't seen any project
lose users *because* they switched to a faster release cycle (*). I
don't understand why this proposal would fragment the user base, either.
We're not proposing to drop compatibility or build Python 4.

((*) Firefox's decrease in popularity seems to be due to Chrome uptake,
and their new release cycle is arguably in response to that)

> Quite frankly, I like the simplicity and speed of the current release cycle. 
> All this talk about separate LTS releases and parallel language releases and 
> library releases makes my head spin.

Well, the PEP discussion might make your head spin, because various
possibilities are explored. Obviously the final solution will have to
be simple enough to be understood by anyone :-)

(do you find Ubuntu's release model, for example, too complicated?)

> I fear the day that people asking 
> questions on the tutor or python-list mailing lists will have to say (e.g.) 
> "I'm using Python 3.4.1 and standard library 1.2.7" in order to specify the 
> version they're using.

Yeah, that's my biggest problem with Nick's proposal. Hopefully we can
avoid parallel version schemes.

> You're hoping that a 
> more rapid release cycle will attract more developers, and there is a chance 
> that you could be right; but a more rapid release cycle WILL increase the 
> total work load. So you're betting that this change will attract enough new 
> developers that the work load per person will decrease even as the total work 
> load increases.

This is not something that we can find out without trying, I think.
As Georg pointed out, the decision is easy to revert or amend if
we find out that the new release cycle is unworkable.

Regards

Antoine.




Re: [Python-Dev] PEP 407 / splitting the stdlib

2012-01-19 Thread Antoine Pitrou
On Thu, 19 Jan 2012 11:03:15 +1000
Nick Coghlan  wrote:
> 
> 1. I believe the PEP currently proposes just taking the "no more than
> 9" limit off the minor version of the language. Feature releases would
> just come out every 6 months, with every 4th release flagged as a
> language release.

With the moratorium suggestion factored in, yes. The PEP emphasizes
support duration rather than the breadth of changes, though. I think
that's a more important piece of information for users.

(you don't care whether or not new language constructs were added, if
you were not planning to use them)

> I don't like this scheme because it tries to use one number (the minor
> version field) to cover two very different concepts (stdlib updates
> and language updates). While technically feasible, this is
> unnecessarily obscure and confusing for end users.

As an end user I wouldn't really care whether a release is "stdlib
changes only" or "language/builtins additions too" (especially in a
language like Python where the boundaries are somewhat blurry). I think
this distinction is useful mainly for experts and therefore not worth
complicating version numbering for.

> 2. Brett's alternative proposal is that we switch to using the major
> version for language releases and the minor version for stdlib
> releases. We would then release 3.3, 3.4, 3.5 and 3.6 at 6 month
> intervals, with 4.0 then being released in August 2014 as a new
> language version.

The main problem I see with this is that Python 3 was a big
disruptive event for the community, and calling a new version "Python
4" may make people anxious at the prospect of compatibility breakage.
Instead of spending some time advertising that "Python 4" is a safe
upgrade, perhaps we could simply call it "Python 3.X+1"?

(and, as you point out, keep "Python X+1" for when we want to change the
language in incompatible ways again)

> So in August this year, we would release 3.3+12.08, followed by
> 3.3+13.02, 3.3+13.08, 3.3+14.02 at 6 month intervals, and then the
> next language release as 3.4+14.08. If someone refers to just Python
> 3.3, then the "at least stdlib 12.08" is implied. If they refer to
> Python stdlib 12.08, 13.02, 13.08 or 14.02, then it is the dependency
> on "Python 3.3" that is implied.

If I were a casual user of a piece of software, I'd really find such a
numbering scheme complicated and intimidating. I don't think most users
want such a level of information.

Regards

Antoine.


Re: [Python-Dev] Writable __doc__

2012-01-19 Thread Antoine Pitrou
On Wed, 18 Jan 2012 20:31:38 -0700
Eric Snow  wrote:
> >
> > Should I create a bug report?
> 
> http://bugs.python.org/issue12773  :)

Well done Eric :)





Re: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions

2012-01-19 Thread Nick Coghlan
On Thu, Jan 19, 2012 at 9:07 PM, Antoine Pitrou  wrote:
>> I fear the day that people asking
>> questions on the tutor or python-list mailing lists will have to say (e.g.)
>> "I'm using Python 3.4.1 and standard library 1.2.7" in order to specify the
>> version they're using.
>
> Yeah, that's my biggest problem with Nick's proposal. Hopefully we can
> avoid parallel version schemes.

They're not really parallel - the stdlib version would fully determine
the language version. I'm only proposing two version numbers because
we're planning to start versioning *two* things (the standard library,
updated every 6 months, and the language spec, updated every 18-24
months).

Since the latter matches what we do now, I'm merely proposing that we
leave its versioning alone, and add a *new* identifier specifically
for the interim stdlib updates.

Thinking about it though, I've realised that the sys.version string
already contains a lot more than just the language version number, so
I think it should just be updated to include the stdlib version
information, and the version_info named tuple could get a new 'stdlib'
field as a string.

That way, sys.version and sys.version_info would still fully define
the Python version, we just wouldn't be mucking with the meaning of
any of the existing fields.

For example, the current:

>>> sys.version
'3.2.2 (default, Sep  5 2011, 21:17:14) \n[GCC 4.6.1]'
>>> sys.version_info
sys.version_info(major=3, minor=2, micro=2, releaselevel='final', serial=0)

might become:

>>> sys.version
'3.3.1 (stdlib 12.08, default, Feb  18 2013, 21:17:14) \n[GCC 4.6.1]'
>>> sys.version_info
sys.version_info(major=3, minor=3, micro=1, releaselevel='final',
serial=0, stdlib='12.08')

for the maintenance release and:

>>> sys.version
'3.3.1 (stdlib 13.02, default, Feb  18 2013, 21:17:14) \n[GCC 4.6.1]'
>>> sys.version_info
sys.version_info(major=3, minor=3, micro=1, releaselevel='final',
serial=0, stdlib='13.02')

for the stdlib-only update.
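A consumer of such a scheme could read the proposed field defensively. To be clear, the 'stdlib' attribute is purely hypothetical at this point, so any sketch has to fall back to the language version on interpreters that lack it:

```python
import sys

# 'stdlib' is the field proposed above; no released interpreter has it,
# so fall back to the language version when the attribute is absent.
stdlib_version = getattr(sys.version_info, "stdlib", None)
if stdlib_version is None:
    stdlib_version = "%d.%d" % (sys.version_info.major, sys.version_info.minor)

print("stdlib version:", stdlib_version)
```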

Explicit-is-better-than-implicit'ly yours,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] PEP 407 / splitting the stdlib

2012-01-19 Thread Nick Coghlan
On Thu, Jan 19, 2012 at 9:17 PM, Antoine Pitrou  wrote:
> If I were a casual user of a piece of software, I'd really find such a
> numbering scheme complicated and intimidating. I don't think most users
> want such a level of information.

I think the ideal numbering scheme from a *new* user point of view is
the one Brett suggested (where major=language update, minor=stdlib
update), but (as has been noted) there are solid historical reasons we
can't use that.

While I still have misgivings, I'm starting to come around to the idea
of just allowing the minor release number to increment faster (Barry's
co-authorship of the PEP, suggesting he doesn't see such a scheme
causing any problems for Ubuntu, is a big factor in that). I'd still like
the core language version to be available programmatically, though,
and I'd like the PEP to consider displaying it as part of sys.version
and using it to allow things like having bytecode compatible versions
share bytecode files in the cache.
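The bytecode-sharing idea might look something like the following sketch: derive the cache tag from the *language* version only, so stdlib-only releases with an identical bytecode format reuse each other's cached files. The function name and tag format here are illustrative, not any actual API:

```python
import sys

def bytecode_cache_tag(language_version=None):
    # Illustrative only: tag bytecode files by language version alone, so
    # stdlib-only releases (same bytecode format) share __pycache__ entries.
    if language_version is None:
        language_version = sys.version_info[:2]
    return "cpython-%d%d" % language_version

print(bytecode_cache_tag((3, 3)))  # -> 'cpython-33' for any 3.3+stdlib release
```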

Cheers,
Nick.


-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] PEP 407 / splitting the stdlib

2012-01-19 Thread Barry Warsaw
On Jan 19, 2012, at 12:17 PM, Antoine Pitrou wrote:

>The main problem I see with this is that Python 3 was a big
>disruptive event for the community, and calling a new version "Python
>4" may make people anxious at the prospect of compatibility breakage.

s/was/is/

The Python 3 transition is ongoing, and Guido himself at the time thought it
would take 5 years.  I think we're making excellent progress, but there are
still occasional battles just to convince upstream third party developers that
supporting Python 3 (let alone *switching* to Python 3) is even worth the
effort.  I think we're soon going to be at a tipping point where not
supporting Python 3 will be the minority position.  Even if a hypothetical
Python 4 were completely backward compatible, I shudder at the PR nightmare
that would entail.

I'm not saying there will never be a time for Python 4, but I sure hope it's
far enough in the future that you youngun's will be telling us about it in the
Tim Peters Home for Python Old Farts, where we'll smile blankly, bore you
again with stories of vinyl records, phones with real buttons, and Python
1.6.1 while you feed us our mush under chronologically arranged pictures of
BDFLs Van Rossum, Peterson, and Van Rossum.

-Barry


Re: [Python-Dev] Writable __doc__

2012-01-19 Thread Benjamin Peterson
2012/1/19 Victor Stinner :
>> http://bugs.python.org/issue12773  :)
>
> The bug is marked as closed, but it still exists in Python 3.2, where it
> has not been fixed. The fix must be backported.

It's not a bug; it's a feature.



-- 
Regards,
Benjamin


Re: [Python-Dev] [Python-checkins] cpython: add str.casefold() (closes #13752)

2012-01-19 Thread Éric Araujo

Thanks for 0b5ce36a7a24 Benjamin.


Re: [Python-Dev] Hashing proposal: change only string-only dicts

2012-01-19 Thread PJ Eby
On Jan 18, 2012 12:55 PM, Martin v. Löwis  wrote:
>
> Am 18.01.2012 17:01, schrieb PJ Eby:
> > On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis"  > > wrote:
> >
> > Am 17.01.2012 22:26, schrieb Antoine Pitrou:
> > > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits
> > > could cache a "hash perturbation" computed from the string and the
> > > random bits:
> > >
> > > - hash() would use ob_shash
> > > - dict_lookup() would use ((ob_shash * 103) ^ (ob_sstate & ~3))
> > >
> > > This way, you cache almost all computations, adding only a computation
> > > and a couple logical ops when looking up a string in a dict.
> >
> > That's a good idea. For Unicode, it might be best to add another slot
> > into the object, even though this increases the object size.
> >
> >
> > Wouldn't that break the ABI in 2.x?
>
> I was thinking about adding the field at the end, so I thought it
> shouldn't. However, if somebody inherits from PyUnicodeObject, it still
> might - so my new proposal is to add the extra hash into the str block,
> either at str[-1], or after the terminating 0. This would cause an
> average increase of four bytes of the storage (0 bytes in 50% of the
> cases, 8 bytes because of padding in the other 50%).
>
> What do you think?

So far it sounds like the very best solution of all, as far as backward
compatibility is concerned.  If the extra bits are only used when two
strings have a matching hash value, the only doctests that could be
affected are ones testing for this issue.  ;-)
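For concreteness, Antoine's formula can be sketched in Python, with a module-level random value standing in for the spare ob_sstate bits. This is a model of the C-level scheme being discussed, not the patch itself:

```python
import random

# Stand-in for the 30 spare bits of ob_sstate, drawn once per process.
RANDOM_BITS = random.getrandbits(32) & ~3  # low 2 bits stay reserved

def cached_hash(s):
    # Models ob_shash: the ordinary, deterministic cached string hash.
    return hash(s)

def dict_lookup_hash(s):
    # Models the perturbed hash a dict lookup would use:
    # ((ob_shash * 103) ^ (ob_sstate & ~3))
    return (cached_hash(s) * 103) ^ RANDOM_BITS

# hash() itself stays unchanged; only dict probing is perturbed,
# and the perturbation is stable within a single process.
assert dict_lookup_hash("spam") == dict_lookup_hash("spam")
```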


Re: [Python-Dev] Writable __doc__

2012-01-19 Thread Ethan Furman

Benjamin Peterson wrote:

> 2012/1/19 Victor Stinner :
>> http://bugs.python.org/issue12773  :)
>>
>> The bug is marked as closed, but it still exists in Python 3.2, where it
>> has not been fixed. The fix must be backported.
>
> It's not a bug; it's a feature.


Where does one draw the line between feature and bug?  As a user I'm 
inclined to classify this as a bug:  __doc__ was writable with old-style 
classes; __doc__ is writable with new-style classes with any metaclass; 
and there exists no good reason (that I'm aware of ;) for __doc__ to not 
be writable.
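For reference, the behaviour under discussion can be shown in a few lines (a sketch: on 3.2 and earlier the bare assignment raises AttributeError, while on versions where issue12773 is fixed it succeeds):

```python
class Example:
    """Original docstring."""

try:
    Example.__doc__ = "Updated docstring."  # AttributeError on Python <= 3.2
except AttributeError:
    print("class __doc__ is read-only on this interpreter")

print(Example.__doc__)
```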


~Ethan~


Re: [Python-Dev] Writable __doc__

2012-01-19 Thread Guido van Rossum
On Thu, Jan 19, 2012 at 8:36 AM, Ethan Furman  wrote:

> Benjamin Peterson wrote:
>
>> 2012/1/19 Victor Stinner :
>>
>>> http://bugs.python.org/issue12773  :)

>>> The bug is marked as closed, but it still exists in Python 3.2, where it
>>> has not been fixed. The fix must be backported.
>>>
>>
>> It's not a bug; it's a feature.
>>
>
> Where does one draw the line between feature and bug?  As a user I'm
> inclined to classify this as a bug:  __doc__ was writable with old-style
> classes; __doc__ is writable with new-style classes with any metaclass; and
> there exists no good reason (that I'm aware of ;) for __doc__ to not be
> writable.


Like it or not, this has worked this way ever since new-style classes were
introduced. That has made it a de-facto feature. We should not encourage
people to write code that works with a certain bugfix release but not with
the previous bugfix release of the same feature release.

Given that we haven't had any complaints about this in nearly a decade, the
backport can't be important. Don't do it.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 407 / splitting the stdlib

2012-01-19 Thread Bill Janssen
Nick Coghlan  wrote:

> On Thu, Jan 19, 2012 at 10:19 AM, Steven D'Aprano  wrote:
> > Brett Cannon wrote:
> > Do we have any evidence of this alleged bitrot? I spend a lot of time on the
> > comp.lang.python newsgroup and I see no evidence that people using Python
> > believe the standard library is rotting from lack of attention.
> 
> IMO, it's a problem mainly with network (especially web) protocols and
> file formats. It can take the stdlib a long time to catch up with
> external developments due to the long release cycle, so people are
> often forced to switch to third party libraries that better track the
> latest versions of relevant standards (de facto or otherwise).

I'm not sure how much of a problem this really is.  I continually build
fairly complicated systems with Python that do a lot of HTTP networking,
for instance.  It's fairly easy to replace use of the standard library
modules with use of Tornado and httplib2, and I wouldn't think of *not*
doing that.  But the standard modules are there, out-of-the-box, for
experimentation and tinkering, and they work in the sense that they pass
their module tests.  Are those standard modules as "Internet-proof" as
some commercially-supported package with an income stream that supports
frequent security updates would be?

Perhaps not.  But maybe that's OK.

Another way of doing this would be to "bless" certain third-party
modules in some fashion short of incorporation, and provide them with
more robust development support, again, "somehow", so that they don't
fall by the wayside when their developers move on to something else,
but are still able to release on an independent schedule.

Bill


Re: [Python-Dev] Hashing proposal: change only string-only dicts

2012-01-19 Thread Gregory P. Smith
On Wed, Jan 18, 2012 at 9:55 AM, "Martin v. Löwis" wrote:

> Am 18.01.2012 17:01, schrieb PJ Eby:
> > On Tue, Jan 17, 2012 at 7:58 PM, "Martin v. Löwis"  > > wrote:
> >
> > Am 17.01.2012 22:26, schrieb Antoine Pitrou:
> > > Only 2 bits are used in ob_sstate, meaning 30 are left. These 30 bits
> > > could cache a "hash perturbation" computed from the string and the
> > > random bits:
> > >
> > > - hash() would use ob_shash
> > > - dict_lookup() would use ((ob_shash * 103) ^ (ob_sstate & ~3))
> > >
> > > This way, you cache almost all computations, adding only a computation
> > > and a couple logical ops when looking up a string in a dict.
> >
> > That's a good idea. For Unicode, it might be best to add another slot
> > into the object, even though this increases the object size.
> >
> > Wouldn't that break the ABI in 2.x?
>
> I was thinking about adding the field at the end, so I thought it
> shouldn't. However, if somebody inherits from PyUnicodeObject, it still
> might - so my new proposal is to add the extra hash into the str block,
> either at str[-1], or after the terminating 0. This would cause an
> average increase of four bytes of the storage (0 bytes in 50% of the
> cases, 8 bytes because of padding in the other 50%).
>
> What do you think?
>

str[-1] is not likely to work if you want to maintain ABI compatibility.
Appending it to the data after the terminating \0 is more likely to be
possible, but even that is questionable if there is any possibility that
existing compiled extension modules have somehow inlined code to do
allocation of the str field (I don't think there are?).

I'd also be concerned about C API code that uses PyUnicode_Resize(). How do
you keep track of whether you have filled in these extra bytes at the end
or not?  Should allocation and resize fill them with a magic value
indicating "not filled in", similar to a tp_hash of -1?

Regardless of all of this, I don't think this fully addresses the overall
issue as strings within other hashable data structures like tuples would
not be treated this way, only strings directly stored in a dict.  Sure you
can continue on and "fix" tuples and such in a similar manner but then what
about user defined classes that implement __hash__ based on the return
value of hash() on some strings they contain?

I don't see anything I'd consider a real complete fix unless we also
backport the randomized hash code so that people who need a guaranteed fix
can enable it and use it.
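A backported randomized hash would amount to mixing a per-process seed into the string hash itself. A toy model of the shape of it (not the actual CPython algorithm):

```python
import os

# Per-process seed, unpredictable to an attacker (toy model of the backport).
SEED = int.from_bytes(os.urandom(8), "little")

def randomized_hash(s, seed=SEED):
    # Multiply/xor loop in the spirit of CPython's string hash, with the
    # seed folded in so collision-heavy inputs can't be precomputed offline.
    h = seed
    for byte in s.encode("utf-8"):
        h = ((h * 1000003) ^ byte) & 0xFFFFFFFFFFFFFFFF
    return h ^ len(s)

assert randomized_hash("spam") == randomized_hash("spam")  # stable in-process
assert randomized_hash("spam", seed=1) != randomized_hash("spam", seed=2)
```

Because the seed participates in every step, tuples and user-defined `__hash__` methods built on `hash()` of strings would pick up the randomization for free, which is the completeness argument above.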

-gps


Re: [Python-Dev] PEP 407 / splitting the stdlib

2012-01-19 Thread Eric Snow
On Jan 19, 2012 9:28 AM, "Bill Janssen"  wrote:
> I'm not sure how much of a problem this really is.  I continually build
> fairly complicated systems with Python that do a lot of HTTP networking,
> for instance.  It's fairly easy to replace use of the standard library
> modules with use of Tornado and httplib2, and I wouldn't think of *not*
> doing that.  But the standard modules are there, out-of-the-box, for
> experimentation and tinkering, and they work in the sense that they pass
> their module tests.  Are those standard modules as "Internet-proof" as
> some commercially-supported package with an income stream that supports
> frequent security updates would be?

This is starting to sound a little like the discussion about the
__preview__ / __experimental__ idea.  If I recall correctly, one of the
points is that for some organizations getting a third-party library
approved for use is not trivial.  In contrast, inclusion in the stdlib is
like a free pass, since the organization can rely on the robustness of the
CPython QA and release processes.

As well, there is at least a small cost with third-party libraries for
those that maintain more rigorous configuration management.  In contrast,
there is basically no extra cost with new/updated stdlib, beyond upgrading
Python.

-eric

>
> Perhaps not.  But maybe that's OK.
>
> Another way of doing this would be to "bless" certain third-party
> modules in some fashion short of incorporation, and provide them with
> more robust development support, again, "somehow", so that they don't
> fall by the wayside when their developers move on to something else,
> but are still able to release on an independent schedule.
>
> Bill


Re: [Python-Dev] Writable __doc__

2012-01-19 Thread Stephen J. Turnbull
Ethan Furman writes:

 > Where does one draw the line between feature and bug?

Bug:  Doesn't work as documented.
Feature:  Works as expected but not documented[1] to do so.
Miracle:  Works as documented.[2]

Unspecified behavior that doesn't work as you expect is the unmarked
case (ie, none of the above).

The Devil's Dictionary defines feature somewhat differently:

Feature: Name for any behavior you don't feel like justifying to a user.

Footnotes: 
[1]  Including cases where the patch contains documentation but hasn't
been committed to trunk yet.

[2]  Python is pretty miraculous, isn't it?





Re: [Python-Dev] Writable __doc__

2012-01-19 Thread Ethan Furman

Guido van Rossum wrote:
> We should not encourage people to write code that works with a certain
> bugfix release but not with the previous bugfix release of the same
> feature release.

Then what's the point of a bug-fix release?  If 3.2.1 had broken 
threading, wouldn't we fix it in 3.2.2 and encourage folks to switch to 
3.2.2?  Or would we scrap 3.2 and move immediately to 3.3?  (Is that 
more or less what happened with 3.0?)



> Like it or not, this has worked this way ever since new-style classes
> were introduced. That has made it a de-facto feature.


But what of the discrepancy between the 'type' metaclass and any other 
Python metaclass?



> Given that we haven't had any complaints about this in nearly a decade,
> the backport can't be important. Don't do it.


Agreed.

~Ethan~


Re: [Python-Dev] Writable __doc__

2012-01-19 Thread Michael Foord

On 19/01/2012 17:46, Ethan Furman wrote:

> Guido van Rossum wrote:
>> We should not encourage people to write code that works with a certain
>> bugfix release but not with the previous bugfix release of the same
>> feature release.
>
> Then what's the point of a bug-fix release?  If 3.2.1 had broken
> threading, wouldn't we fix it in 3.2.2 and encourage folks to switch
> to 3.2.2?  Or would we scrap 3.2 and move immediately to 3.3?  (Is
> that more or less what happened with 3.0?)
>
>> Like it or not, this has worked this way ever since new-style classes
>> were introduced. That has made it a de-facto feature.
>
> But what of the discrepancy between the 'type' metaclass and any other
> Python metaclass?


There are many discrepancies between built-in types and any Python 
class. Writable attributes are (generally) one of them.


Michael









--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html



Re: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions

2012-01-19 Thread Georg Brandl
Am 19.01.2012 01:12, schrieb Steven D'Aprano:

> One on-going complaint is that Python-Dev doesn't have the manpower or time 
> to 
> do everything that needs to be done. Bugs languish for months or years 
> because 
> nobody has the time to look at it. Will going to a more rapid release cycle 
> give people more time, or just increase their workload? You're hoping that a 
> more rapid release cycle will attract more developers, and there is a chance 
> that you could be right; but a more rapid release cycle WILL increase the 
> total work load. So you're betting that this change will attract enough new 
> developers that the work load per person will decrease even as the total work 
> load increases. I don't think that's a safe bet.

I can't help noticing that so far, worries about the workload came mostly from
people who don't actually bear that load (this is no accusation!), while those
that do are the proponents of the PEP...

That is, I don't want to exclude you from the discussion, but on the issue of
workload I would like to encourage more of our (past and present) release
managers and active bug triagers to weigh in.

cheers,
Georg



Re: [Python-Dev] [Python-checkins] cpython (2.7): Issue #13605: add documentation for nargs=argparse.REMAINDER

2012-01-19 Thread Nadeem Vawda
On Thu, Jan 19, 2012 at 11:03 PM, sandro.tosi
 wrote:
> +  are gathered into a lits. This is commonly useful for command line

s/lits/list ?


Re: [Python-Dev] [Python-checkins] cpython (2.7): Issue #13605: add documentation for nargs=argparse.REMAINDER

2012-01-19 Thread Sandro Tosi
On Thu, Jan 19, 2012 at 22:09, Nadeem Vawda  wrote:
> On Thu, Jan 19, 2012 at 11:03 PM, sandro.tosi
>  wrote:
>> +  are gathered into a lits. This is commonly useful for command line
>
> s/lits/list ?

Crap! I committed an older version of the patch... thanks for spotting
it, I'll fix it right away.


-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi


Re: [Python-Dev] Writable __doc__

2012-01-19 Thread Guido van Rossum
On Thu, Jan 19, 2012 at 9:46 AM, Ethan Furman  wrote:

> Guido van Rossum wrote:
> > We should not encourage people to write code that works with a certain
> > bugfix release but not with the previous bugfix release of the same
> > feature release.
>
> Then what's the point of a bug-fix release?  If 3.2.1 had broken
> threading, wouldn't we fix it in 3.2.2 and encourage folks to switch to
> 3.2.2?  Or would we scrap 3.2 and move immediately to 3.3?  (Is that more
> or less what happened with 3.0?)


The bugs fixed in bugfix releases are usually things that generally go
well but don't work under certain circumstances.

But I'd also be happy to just declare that assignable __doc__ is a feature
without explaining why.

> Like it or not, this has worked this way ever since new-style classes were
> introduced. That has made it a de-facto feature.
>

> But what of the discrepancy between the 'type' metaclass and any other
> Python metaclass?


Michael Foord explained that.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] [Python-checkins] cpython (2.7): Issue #13605: add documentation for nargs=argparse.REMAINDER

2012-01-19 Thread Sandro Tosi
On Thu, Jan 19, 2012 at 22:07, Terry Reedy  wrote:
> typo
...
> lits .> list

Yep, I've already fixed it, committing a more useful example too.

-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi


Re: [Python-Dev] Writable __doc__

2012-01-19 Thread Terry Reedy

On 1/19/2012 1:04 PM, Stephen J. Turnbull wrote:

Ethan Furman writes:

  >  Where does one draw the line between feature and bug?

Bug:  Doesn't work as documented.


The basic idea is that the x.y docs define (mostly) the x.y language. 
Patches to the x.y docs fix typos, omissions, ambiguities, and the 
occasional error. The x.y.z cpython releases are increasingly better 
implementations of Python x.y.


--
Terry Jan Reedy



Re: [Python-Dev] Writable __doc__

2012-01-19 Thread Ethan Furman

Stephen J. Turnbull wrote:

Ethan Furman writes:


Where does one draw the line between feature and bug?


Miracle:  Works as documented.[2]


[2]  Python is pretty miraculous, isn't it?


Yes, indeed it is!  :)

~Ethan~


Re: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions

2012-01-19 Thread Martin v. Löwis
> I can't help noticing that so far, worries about the workload came mostly from
> people who don't actually bear that load (this is no accusation!), while those
> that do are the proponents of the PEP...

Ok, so let me add then that I'm worried about the additional work-load.

I'm particularly worried about the coordination of vacation across the
three people that work on a release. It might well not be possible to
make any release for a period of two months, which, in a six-months
release cycle with two alphas and a beta, might mean that we (the
release people) would need to adjust our vacation plans with the release
schedule, or else step down (unless you would release the "normal"
feature releases as source-only releases).

FWIW, it might well be that I can't be available for the 3.3 final
release (I haven't finalized my vacation schedule yet for August).

Regards,
Martin


[Python-Dev] python build failed on mac

2012-01-19 Thread Vijay N. Majagaonkar
Hi all,

I am trying to build Python 3 on a Mac and the build is failing with the
following error; can somebody help me with this?



$ hg clone  http://hg.python.org/cpython

$ ./configure
$ make

gcc   -framework CoreFoundation -o python.exe Modules/python.o
libpython3.3m.a -ldl  -framework CoreFoundation
./python.exe -SE -m sysconfig --generate-posix-vars
Could not find platform dependent libraries 
Consider setting $PYTHONHOME to [:]
python.exe(43296) malloc: *** mmap(size=7310873954244194304) failed (error
code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
make: *** [Lib/_sysconfigdata.py] Segmentation fault: 11
make: *** Deleting file `Lib/_sysconfigdata.py'




;)


[Python-Dev] Counting collisions for the win

2012-01-19 Thread Victor Stinner
Hi,

I've been working on the hash collision issue for 2 or 3 weeks now. I
evaluated all solutions and I think that I now have a good knowledge
of the problem and how it should be solved. The major issue is to have
minor or no impact on applications (don't break backward
compatibility). I see three major solutions:

 - use a randomized hash
 - use two hashes, a randomized hash and the actual hash kept for
backward compatibility
 - count collisions on dictionary lookup

Using a randomized hash does break a lot of tests (e.g. tests relying
on the representation of a dictionary). The patch is huge, too big to
backport directly to stable versions. Using a randomized hash may
also break (indirectly) real applications because the application
output is also somehow "randomized". For example, in the Django test
suite, the HTML output is different at each run. Web browsers may
render the web page differently, or crash, or ... I don't think that
Django would like to sort attributes of each HTML tag, just because we
wanted to fix a vulnerability.
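The run-to-run variation described here can be reproduced directly once hash randomization is available. A small sketch, assuming a CPython new enough to honour the PYTHONHASHSEED environment variable (which did not exist yet at the time of this thread): the same string hashes differently under different seeds, so anything ordered by hash can change from run to run.

```python
import os
import subprocess
import sys

def spam_hash(seed):
    """Run a child interpreter with a given hash seed and report hash('spam')."""
    env = {**os.environ, "PYTHONHASHSEED": seed}
    out = subprocess.run(
        [sys.executable, "-c", "print(hash('spam'))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout)

# Different seeds give different string hashes, hence different
# dict/set iteration orders in hash-ordered containers.
a, b = spam_hash("1"), spam_hash("2")
print(a != b)
```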

A randomized hash also has a major issue: if the attacker is able to
compute the secret, (s)he can easily compute collisions and exploit
the hash collision vulnerability again. I don't know exactly how
complex it is to compute the secret, but our hash function is weak (it
is far from cryptographic; it is really simple to run it
backward). If someone writes a fast function to compute the secret, we
will be back to the same point.

IMO using two hashes has the same disadvantages as the randomized hash
solution, while being more complex to implement.

The last solution is very simple: count collisions and raise an
exception when a limit is hit. The patch is something like 10 lines,
whereas the randomized hash is closer to 500 lines, adds a new
file, changes the Visual Studio project file, etc. At first I thought it
would break more applications than the randomized hash, but I tried it on
Django: the test suite fails with a limit of 20 collisions, but not
with a limit of 50 collisions, whereas the patch uses a limit of 1000
collisions. According to my basic tests, a limit of 35 collisions
requires a dictionary with more than 10,000,000 integer keys to raise
an error. I am not talking about the attack, but about valid data.

More details about my tests on the Django test suite:
http://bugs.python.org/issue13703#msg151620

--

I propose to solve the hash collision vulnerability by counting
collisions, because it fixes the vulnerability with minor or no
impact on applications or backward compatibility. I don't see why we
should use a different fix for Python 3.3. If counting collisions
solves the issue for stable versions, it is also enough for Python
3.3. We now know all the issues of the randomized hash solution, and I
think that there are more drawbacks than advantages. IMO the
randomized hash is overkill for fixing the hash collision issue.

I just have some requests on Marc Andre Lemburg patch:

 - the limit should be configurable: a new function in the sys module
should be enough. It may be private (or replaced by an environment
variable?) in stable versions
 - the set type should also be patched (I didn't check if it is
vulnerable or not using the patch)
 - the patch has no test! (a class with a fixed hash should be enough
to write a test)
 - the limit must be documented somewhere
 - the exception type should be different from KeyError

Victor
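For the archives, the counting idea can be sketched in a few lines of Python (a toy linear-probing table, not MAL's actual patch; the names TooManyCollisions, SIZE and LIMIT are made up here for illustration):

```python
class TooManyCollisions(Exception):
    """Hypothetical exception name; the real patch may choose differently."""

class Collider:
    """All instances hash alike -- a stand-in for attacker-chosen keys."""
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 42
    def __eq__(self, other):
        return isinstance(other, Collider) and self.n == other.n

SIZE, LIMIT = 64, 5    # tiny table and limit, for demonstration only

def insert(table, key, value):
    """Linear-probe insert that counts collisions and gives up at LIMIT."""
    i = hash(key) % SIZE
    collisions = 0
    while table[i] is not None and table[i][0] != key:
        collisions += 1
        if collisions > LIMIT:
            raise TooManyCollisions(
                "%d collisions inserting %r" % (collisions, key))
        i = (i + 1) % SIZE
    table[i] = (key, value)

table = [None] * SIZE
for i in range(LIMIT + 1):              # fill one probe chain up to the limit
    insert(table, Collider(i), i)
try:
    insert(table, Collider(LIMIT + 1), 0)   # one more key trips the counter
    tripped = False
except TooManyCollisions:
    tripped = True
print(tripped)
```

Ordinary keys never approach the limit, so the counter costs almost nothing on the happy path; only a long run of equal-hash keys triggers the exception.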


Re: [Python-Dev] Coroutines and PEP 380

2012-01-19 Thread Greg

Glyph wrote:
[Guido] mentions the point that coroutines that can implicitly switch out from 
under you have the same non-deterministic property as threads: you don't 
know where you're going to need a lock or lock-like construct to update 
any variables, so you need to think about concurrency more deeply than 
if you could explicitly always see a 'yield'.


I'm not convinced that being able to see 'yield's will help
all that much. In any system that makes substantial use of
generator-based coroutines, you're going to see 'yield from's
all over the place, from the lowest to the highest levels.
But that doesn't mean you need a correspondingly large
number of locks. You can't look at a 'yield' and conclude
that you need a lock there or tell what needs to be locked.

There's no substitute for deep thought where any kind of
threading is involved, IMO.

--
Greg
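A minimal sketch of the kind of code under discussion: with PEP 380, every potential switch point is an explicit 'yield' or 'yield from', visible at each level of the call stack (the trampoline below is a made-up, bare-bones driver, not any particular framework's scheduler):

```python
def worker(n):
    total = 0
    for i in range(n):
        total += i
        yield                     # explicit switch point, visible to the reader

    return total

def supervisor(n):
    # PEP 380 delegation: switch points inside worker() surface here too.
    result = yield from worker(n)
    return result

def run(gen):
    """Bare-bones trampoline: drive a coroutine to completion."""
    try:
        while True:
            next(gen)
    except StopIteration as exc:
        return exc.value          # PEP 380 stores the return value here

print(run(supervisor(5)))         # sums 0..4, i.e. prints 10
```

Greg's point stands either way: the yields mark *where* a switch can happen, but not *what* needs locking across them.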


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Guido van Rossum
On Thu, Jan 19, 2012 at 4:48 PM, Victor Stinner <[email protected]> wrote:

> Hi,
>
> I've been working on the hash collision issue for 2 or 3 weeks now. I
> evaluated all solutions and I think that I have now a good knowledge
> of the problem and how it should be solved. The major issue is to have
> a minor or no impact on applications (don't break backward
> compatibility). I saw three major solutions:
>
>  - use a randomized hash
>  - use two hashes, a randomized hash and the actual hash kept for
> backward compatibility
>  - count collisions on dictionary lookup
>
> Using a randomized hash does break a lot of tests (e.g. tests relying
> on the representation of a dictionary). The patch is huge, too big to
> backport it directly on stable versions. Using a randomized hash may
> also break (indirectly) real applications because the application
> output is also somehow "randomized". For example, in the Django test
> suite, the HTML output is different at each run. Web browsers may
> render the web page differently, or crash, or ... I don't think that
> Django would like to sort attributes of each HTML tag, just because we
> wanted to fix a vulnerability.
>
> Randomized hash has also a major issue: if the attacker is able to
> compute the secret, (s)he can easily compute collisions and exploit
> the hash collision vulnerability again. I don't know exactly how
> complex it is to compute the secret, but our hash function is weak (it
> is far from being cryptographic, it is really simple to run it
> backward). If someone writes a fast function to compute the secret, we
> will go back to the same point.
>
> IMO using two hashes has the same disadvantages as the randomized hash
> solution, while being more complex to implement.
>
> The last solution is very simple: count collisions and raise an
> exception when a limit is hit. The patch is something like 10 lines,
> whereas the randomized hash is closer to 500 lines, adds a new
> file, changes the Visual Studio project file, etc. At first I thought it
> would break more applications than the randomized hash, but I tried on
> Django: the test suite fails with a limit of 20 collisions, but not
> with a limit of 50 collisions, whereas the patch uses a limit of 1000
> collisions. According to my basic tests, a limit of 35 collisions
> requires a dictionary with more than 10,000,000 integer keys to raise
> an error. I am not talking about the attack, but valid data.
>
> More details about my tests on the Django test suite:
> http://bugs.python.org/issue13703#msg151620
>
> --
>
> I propose to solve the hash collision vulnerability by counting
> collisions because it does fix the vulnerability with a minor or no
> impact on applications or backward compatibility. I don't see why we
> should use a different fix for Python 3.3. If counting collisions
> solves the issue for stable versions, it is also enough for Python
> 3.3. We now know all issues of the randomized hash solution, and I
> think that there are more drawbacks than advantages. IMO the
> randomized hash is overkill to fix the hash collision issue.
>

+1


> I just have some requests on Marc Andre Lemburg patch:
>
>  - the limit should be configurable: a new function in the sys module
> should be enough. It may be private (or replaced by an environment
> variable?) in stable versions
>  - the set type should also be patched (I didn't check if it is
> vulnerable or not using the patch)
>  - the patch has no test! (a class with a fixed hash should be enough
> to write a test)
>  - the limit must be documented somewhere
>  - the exception type should be different than KeyError
>
> Victor



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions

2012-01-19 Thread Nick Coghlan
On Fri, Jan 20, 2012 at 9:54 AM, "Martin v. Löwis"  wrote:
>> I can't help noticing that so far, worries about the workload came mostly 
>> from
>> people who don't actually bear that load (this is no accusation!), while 
>> those
>> that do are the proponents of the PEP...
>
> Ok, so let me add then that I'm worried about the additional work-load.
>
> I'm particularly worried about the coordination of vacation across the
> three people that work on a release. It might well not be possible to
> make any release for a period of two months, which, in a six-months
> release cycle with two alphas and a beta, might mean that we (the
> release people) would need to adjust our vacation plans with the release
> schedule, or else step down (unless you would release the "normal"
> feature releases as source-only releases).

I must admit that aspect had concerned me as well. Currently we use
the 18-24 month window for releases to slide things around to
accommodate the schedules of the RM, Martin (Windows binaries) and
Ned/Ronald (Mac OS X binaries).

Before we could realistically switch to more frequent releases,
something would need to change on the binary release side.

Regards,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Coroutines and PEP 380

2012-01-19 Thread Matt Joiner
On Fri, Jan 20, 2012 at 8:41 AM, Greg  wrote:
> Glyph wrote:
>>
>> [Guido] mentions the point that coroutines that can implicitly switch out
>> from under you have the same non-deterministic property as threads: you
>> don't know where you're going to need a lock or lock-like construct to
>> update any variables, so you need to think about concurrency more deeply
>> than if you could explicitly always see a 'yield'.
>
>
> I'm not convinced that being able to see 'yield's will help
> all that much. In any system that makes substantial use of
> generator-based coroutines, you're going to see 'yield from's
> all over the place, from the lowest to the highest levels.
> But that doesn't mean you need a correspondingly large
> number of locks. You can't look at a 'yield' and conclude
> that you need a lock there or tell what needs to be locked.
>
> There's no substitute for deep thought where any kind of
> threading is involved, IMO.
>
> --
> Greg
>

I wasn't aware that Guido had brought this up, and I believe what he
says is true. Preemptive coroutines are just a hack around the
GIL that reduces OS overheads. It's the explicit nature of the enhanced
generators that is their greatest value.

FWIW, I wrote a Python 3 compatible equivalent to gevent (also
greenlet based, and also very similar to Brett's et al. coroutine
proposal), which didn't really solve the concurrency problems I hoped
it would. There were no guarantees about whether functions would "switch
out", so all the locking and threading issues simply reemerged. On top of
that, all calls had to be non-blocking, losing compatibility with any
routine that didn't use nonblocking calls and/or expose its
"yield" in the correct way; the only gain was reduced GIL contention.
Overall not worth it.

In short, implicit coroutines are just a GIL work around, that break
compatibility for little gain.

Thanks Glyph for those links.


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Ivan Kozik
On Fri, Jan 20, 2012 at 00:48, Victor Stinner
 wrote:
> I propose to solve the hash collision vulnerability by counting
> collisions because it does fix the vulnerability with a minor or no
> impact on applications or backward compatibility. I don't see why we
> should use a different fix for Python 3.3. If counting collisions
> solves the issue for stable versions, it is also enough for Python
> 3.3. We now know all issues of the randomized hash solution, and I
> think that there are more drawbacks than advantages. IMO the
> randomized hash is overkill to fix the hash collision issue.

I'd like to point out that an attacker is not limited to sending just
one dict full of colliding keys.  Given a 22ms stall for a dict full
of 1000 colliding keys, and 100 such objects inside a parent object
(perhaps JSON), you can stall a server for 2.2+ seconds.  Going with
the raise-at-1000 approach doesn't solve the problem for everyone.

In addition, because the raise-at-N-collisions approach raises an
exception, everyone who wants to handle this error condition properly
has to change their code to catch a previously-unexpected exception.
(I know they're usually still better off with the fix, but why force
many people to change code when you can actually fix the hashing
problem?)

Another issue is that even with a configurable limit, different
modules can't have their own limits.  One module might want a
relatively safe raise-at-100, and another module creating massive
dicts might want raise-at-1000.  How does a developer know whether
they can raise or lower the limit, given that they use a bunch of
different modules?

I actually went with this stop-at-N-collisions approach by patching my
CPython a few years ago, where I limited dictobject's and setobject's
critical `for` loops to 100 iterations (I realize this might handle
fewer than 100 collisions). This worked fine until I tried to compile
PyPy, where the translator blew up due to a massive dict.  This,
combined with the second problem (needing to catch an exception), led
me to abandon this approach and write Securetypes, which has a
securedict that uses SHA-1.  Not that I like this either; I think I'm
happy with the randomize-hash() approach.

Ivan
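For reference, the idea behind such a securedict can be sketched as follows (a made-up minimal version, not the actual Securetypes code: string keys are stored under their SHA-1 digests, so an attacker cannot choose keys that collide in the underlying table):

```python
import hashlib

class SecureDict:
    """Minimal sketch: keys are mapped through SHA-1 before storage,
    so hash-table slots depend on a cryptographic digest rather than
    on the attacker-predictable built-in string hash."""
    def __init__(self):
        self._data = {}

    @staticmethod
    def _digest(key):
        return hashlib.sha1(key.encode("utf-8")).digest()

    def __setitem__(self, key, value):
        # Keep the original key around so iteration/repr stay possible.
        self._data[self._digest(key)] = (key, value)

    def __getitem__(self, key):
        return self._data[self._digest(key)][1]

    def __contains__(self, key):
        return self._digest(key) in self._data

    def __len__(self):
        return len(self._data)

d = SecureDict()
d["user"] = "alice"
print(d["user"], "user" in d, len(d))   # prints: alice True 1
```

The cost, as Ivan notes, is a cryptographic hash per operation, which is why nobody is thrilled with this option either.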


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Guido van Rossum
On Thu, Jan 19, 2012 at 7:32 PM, Ivan Kozik  wrote:

> On Fri, Jan 20, 2012 at 00:48, Victor Stinner
>  wrote:
> > I propose to solve the hash collision vulnerability by counting
> > collisions because it does fix the vulnerability with a minor or no
> > impact on applications or backward compatibility. I don't see why we
> > should use a different fix for Python 3.3. If counting collisions
> > solves the issue for stable versions, it is also enough for Python
> > 3.3. We now know all issues of the randomized hash solution, and I
> > think that there are more drawbacks than advantages. IMO the
> > randomized hash is overkill to fix the hash collision issue.
>
> I'd like to point out that an attacker is not limited to sending just
> one dict full of colliding keys.  Given a 22ms stall for a dict full
> of 1000 colliding keys, and 100 such objects inside a parent object
> (perhaps JSON), you can stall a server for 2.2+ seconds.  Going with
> the raise-at-1000 approach doesn't solve the problem for everyone.
>

It's "just" a DoS attack. Those won't go away. We just need to raise the
effort needed for the attacker. The original attack would cause something
like 5 minutes of CPU usage per request (with a set of colliding keys that
could be computed once and used to attack every Python-run website in the
world). That's at least 2 orders of magnitude worse.

In addition, because the raise-at-N-collisions approach raises an
> exception, everyone who wants to handle this error condition properly
> has to change their code to catch a previously-unexpected exception.
> (I know they're usually still better off with the fix, but why force
> many people to change code when you can actually fix the hashing
> problem?)
>

Why would anybody need to change their code? Every web framework worth its
salt has a top-level error catcher that logs the error, serves a 500
response, and possibly does other things like email the admin.


> Another issue is that even with a configurable limit, different
> modules can't have their own limits.  One module might want a
> relatively safe raise-at-100, and another module creating massive
> dicts might want raise-at-1000.  How does a developer know whether
> they can raise or lower the limit, given that they use a bunch of
> different modules?
>

I don't think it needs to be configurable. There just needs to be a way to
turn it off.


> I actually went with this stop-at-N-collisions approach by patching my
> CPython a few years ago, where I limited dictobject's and setobject's
> critical `for` loops to 100 iterations (I realize this might handle
> fewer than 100 collisions). This worked fine until I tried to compile
> PyPy, where the translator blew up due to a massive dict.


I think that's because your collision-counting algorithm was much more
primitive than MAL's.


> This,
> combined with the second problem (needing to catch an exception), led
> me to abandon this approach and write Securetypes, which has a
> securedict that uses SHA-1.  Not that I like this either; I think I'm
> happy with the randomize-hash() approach.
>

Why did you need to catch the exception? Were you not happy with the
program simply terminating with a traceback when it got attacked?

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 407: New release cycle and introducing long-term support versions

2012-01-19 Thread Brian Curtin
On Thu, Jan 19, 2012 at 17:54, "Martin v. Löwis"  wrote:
> Ok, so let me add then that I'm worried about the additional work-load.
>
> I'm particularly worried about the coordination of vacation across the
> three people that work on a release. It might well not be possible to
> make any release for a period of two months, which, in a six-months
> release cycle with two alphas and a beta, might mean that we (the
> release people) would need to adjust our vacation plans with the release
> schedule, or else step down (unless you would release the "normal"
> feature releases as source-only releases).
>
> FWIW, it might well be that I can't be available for the 3.3 final
> release (I haven't finalized my vacation schedule yet for August).

In the interest of not having Windows releases depend on one person,
and having gone through building the installer myself (which I know is
but one of the duties), I'm available to help should you need it.


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Steven D'Aprano

Victor Stinner wrote:


The last solution is very simple: count collision and raise an
exception if it hits a limit. ...
According to my basic tests, a limit of 35 collisions
requires a dictionary with more than 10,000,000 integer keys to raise
an error. I am not talking about the attack, but valid data.


You might think that 10 million keys is a lot of data, but that's only about 
100 MB worth. I already see hardware vendors advertising computers with 6 GB 
RAM as "entry level", e.g. the HP Pavilion starts with 6GB expandable to 16GB. 
I expect that there are already people using Python who will unpredictably hit 
that limit by accident, and the number will only grow as computers get more 
memory.


With a limit of 35 collisions, it only takes 35 keys to force a dict to 
raise an exception, if you are an attacker able to select colliding keys. 
We're trying to defend against an attacker who is able to force collisions, 
not one who is waiting for accidental collisions. I don't see that causing the 
dict to raise an exception helps matters: it just changes the attack from 
"keep the dict busy indefinitely" to "cause an exception and crash the 
application".


This moves responsibility from dealing with collisions out of the dict to the 
application code. Instead of solving the problem in one place (the built-in 
dict) now every application that uses dicts has to identify which dicts can be 
attacked, and deal with the exception.


That pushes the responsibility for security onto people who are the least 
willing or able to deal with it: the average developer, who neither 
understands nor cares about security, or if they do care, they can't convince 
their manager to care.


I suppose an exception is an improvement over the application hanging 
indefinitely, but I'd hardly call it a fix.


Ruby uses randomized hashes. Are there any other languages with a dict or 
mapping class that raises on too many collisions?



--
Steven


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Ivan Kozik
On Fri, Jan 20, 2012 at 03:48, Guido van Rossum  wrote:
> I think that's because your collision-counting algorithm was much more
> primitive than MAL's.

Conceded.

>> This,
>> combined with the second problem (needing to catch an exception), led
>> me to abandon this approach and write Securetypes, which has a
>> securedict that uses SHA-1.  Not that I like this either; I think I'm
>> happy with the randomize-hash() approach.
>
>
> Why did you need to catch the exception? Were you not happy with the program
> simply terminating with a traceback when it got attacked?

No, I wasn't happy with termination.  I wanted to treat it just like a
JSON decoding error, and send the appropriate response.

I actually forgot to mention the main reason I abandoned the
stop-at-N-collisions approach.  I had a server with a dict that stayed
in memory, across many requests.  It was being populated with
identifiers chosen by clients.  I couldn't have my server stay broken
if this dict filled up with a bunch of colliding keys.  (I don't think
I could have done anything else either, like nuking the dict or evicting
some keys.)

Ivan


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Carl Meyer

Hi Victor,

On 01/19/2012 05:48 PM, Victor Stinner wrote:
[snip]
> Using a randomized hash may
> also break (indirectly) real applications because the application
> output is also somehow "randomized". For example, in the Django test
> suite, the HTML output is different at each run. Web browsers may
> render the web page differently, or crash, or ... I don't think that
> Django would like to sort attributes of each HTML tag, just because we
> wanted to fix a vulnerability.

I'm a Django core developer, and if it is true that our test-suite has a
dictionary-ordering dependency that is expressed via HTML attribute
ordering, I consider that a bug and would like to fix it. I'd be
grateful for, not resentful of, a change in CPython that revealed the
bug and prompted us to fix it. (I presume that it is true, as it sounds
like you experienced it directly; I don't have time to play around at
the moment, but I'm surprised we haven't seen bug reports about it from
users of 64-bit Pythons long ago). I can't speak for the core team, but
I doubt there would be much disagreement on this point: ideally Django
would run equally well on any implementation of Python, and as far as I
know none of the alternative implementations guarantee hash or
dict-ordering compatibility with CPython.

I don't have the expertise to speak otherwise to the alternatives for
fixing the collisions vulnerability, but I don't believe it's accurate
to presume that Django would not want to fix a dict-ordering dependency,
and use that as a justification for one approach over another.

Carl


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Nick Coghlan
On Fri, Jan 20, 2012 at 2:00 PM, Steven D'Aprano  wrote:
> With a limit of 35 collisions, it only takes 35 keys to force a dict to
> raise an exception, if you are an attacker able to select colliding keys.
> We're trying to defend against an attacker who is able to force collisions,
> not one who is waiting for accidental collisions. I don't see that causing
> the dict to raise an exception helps matters: it just changes the attack
> from "keep the dict busy indefinitely" to "cause an exception and crash the
> application".

No, that's fundamentally misunderstanding the nature of the attack.
The reason the hash collision attack is a problem is because it allows
you to DoS a web service in a way that requires minimal client side
resources but can have a massive effect on the server. The attacker is
making a single request that takes the server an inordinately long
time to process, consuming CPU resources all the while, and likely
preventing the handling of any other requests (especially for an
event-based server, since the attack is CPU based, bypassing all use
of asynchronous IO).

With the 1000 collision limit in place, the attacker sends their
massive request, the affected dict quickly hits the limit, throws an
unhandled exception which is then caught by the web framework and
turned into a 500 Error response (or whatever's appropriate for the
protocol being attacked).

If a given web service doesn't *already* have a catch all handler to
keep an unexpected exception from bringing the entire service down,
then DoS attacks like this one are the least of its worries.

As for why other languages haven't gone this way, I have no idea.
There are lots of details relating to a language's hash and hash map
design that will drive how suitable randomisation is as an answer, and
it also depends greatly on how you decide to characterise the threat.

FWIW, Victor's analysis in the opening post of this thread matches the
conclusions I came to a few days ago, although he's been over the
alternatives far more thoroughly than I have.

Regards,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Nick Coghlan
On Fri, Jan 20, 2012 at 2:54 PM, Carl Meyer  wrote:
> I don't have the expertise to speak otherwise to the alternatives for
> fixing the collisions vulnerability, but I don't believe it's accurate
> to presume that Django would not want to fix a dict-ordering dependency,
> and use that as a justification for one approach over another.

It's more a matter of wanting deployment of a security fix to be as
painless as possible - a security fix that system administrators can't
deploy because it breaks critical applications may as well not exist.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Counting collisions for the win

2012-01-19 Thread Glenn Linderman

On 1/19/2012 8:54 PM, Carl Meyer wrote:


Hi Victor,

On 01/19/2012 05:48 PM, Victor Stinner wrote:
[snip]

Using a randomized hash may
also break (indirectly) real applications because the application
output is also somehow "randomized". For example, in the Django test
suite, the HTML output is different at each run. Web browsers may
render the web page differently, or crash, or ... I don't think that
Django would like to sort attributes of each HTML tag, just because we
wanted to fix a vulnerability.

I'm a Django core developer, and if it is true that our test-suite has a
dictionary-ordering dependency that is expressed via HTML attribute
ordering, I consider that a bug and would like to fix it. I'd be
grateful for, not resentful of, a change in CPython that revealed the
bug and prompted us to fix it. (I presume that it is true, as it sounds
like you experienced it directly; I don't have time to play around at
the moment, but I'm surprised we haven't seen bug reports about it from
users of 64-bit Pythons long ago). I can't speak for the core team, but
I doubt there would be much disagreement on this point: ideally Django
would run equally well on any implementation of Python, and as far as I
know none of the alternative implementations guarantee hash or
dict-ordering compatibility with CPython.

I don't have the expertise to speak otherwise to the alternatives for
fixing the collisions vulnerability, but I don't believe it's accurate
to presume that Django would not want to fix a dict-ordering dependency,
and use that as a justification for one approach over another.

Carl
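For what it's worth, the class of fix Carl describes tends to be small.
A hypothetical sketch (an invented helper, not Django's actual template
code) that removes the dict-ordering dependency by emitting attributes
in sorted order:

```python
# Hypothetical fix for a dict-ordering dependency in HTML output:
# emit attributes in sorted order so the rendered tag is identical
# regardless of hash seed or Python implementation.
# (Invented helper -- not Django's actual code.)

def render_tag(name, attrs):
    parts = ['%s="%s"' % (k, v) for k, v in sorted(attrs.items())]
    return "<%s %s>" % (name, " ".join(parts))

# Same output no matter how the dict happens to iterate:
print(render_tag("input", {"type": "text", "name": "q"}))
# prints: <input name="q" type="text">
```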


It might be a good idea to have a way to seed the hash with some value 
to allow testing with different dict orderings. That would let tests 
developed on one Python implementation be immune to the different 
orderings on other implementations. However, randomizing the hash not 
only fails to solve the problem for long-running applications, it also 
causes non-deterministic performance from one run to the next, even 
with the exact same data: a different (random) seed could sporadically 
cause collisions with data that usually gave good performance, leaving 
little explanation for the slowdown and little way to reproduce it in 
order to report or understand it.
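Both halves of that observation -- a controllable seed helps testing,
while a varying seed reshuffles orderings and collision patterns between
runs -- can be seen with a tiny seeded string hash. This is an FNV-1a
variant chosen for brevity, not the hash CPython uses:

```python
# Tiny seeded string hash (FNV-1a style) -- illustration only, not
# CPython's algorithm. The seed decides which keys share a bucket,
# so both iteration order and collision patterns change with it.

def seeded_hash(s, seed):
    h = (0x811C9DC5 ^ seed) & 0xFFFFFFFF
    for byte in s.encode("utf-8"):
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF
    return h

def iteration_order(keys, seed, nbuckets=8):
    """The order the keys would come out of a bucketed table."""
    buckets = [[] for _ in range(nbuckets)]
    for k in keys:
        buckets[seeded_hash(k, seed) % nbuckets].append(k)
    return [k for b in buckets for k in b]

keys = ["alpha", "beta", "gamma", "delta", "epsilon"]
# A fixed seed gives a reproducible order (good for testing); varying
# the seed reshuffles it, which is what per-run randomization does.
baseline = iteration_order(keys, seed=0)
assert any(iteration_order(keys, seed=s) != baseline
           for s in range(1, 100))
```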


Re: [Python-Dev] Coroutines and PEP 380

2012-01-19 Thread Glyph

On Jan 19, 2012, at 4:41 PM, Greg wrote:

> Glyph wrote:
>> [Guido] mentions the point that coroutines that can implicitly switch out 
>> from under you have the same non-deterministic property as threads: you 
>> don't know where you're going to need a lock or lock-like construct to 
>> update any variables, so you need to think about concurrency more deeply 
>> than if you could explicitly always see a 'yield'.
> 
> I'm not convinced that being able to see 'yield's will help
> all that much.

Well, apparently we disagree, and I work on such a system all day, every day 
:-).  It was nice to see that Matt Joiner also agreed for very similar reasons, 
and at least I know I'm not crazy.

> In any system that makes substantial use of
> generator-based coroutines, you're going to see 'yield from's
> all over the place, from the lowest to the highest levels.
> But that doesn't mean you need a correspondingly large
> number of locks. You can't look at a 'yield' and conclude
> that you need a lock there or tell what needs to be locked.

Yes, but you can look at a 'yield' and conclude that you might need a lock, and 
that you have to think about it.
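A toy round-robin scheduler (invented for illustration; nothing to do
with Twisted's actual machinery) shows what "might need a lock" means
here: across an explicit yield, shared state can change underneath you,
so each yield marks exactly where to re-check it:

```python
# Toy generator-based coroutine scheduler -- illustration only.
# Every suspension point is an explicit 'yield', so the places where
# shared state can change underneath a task are visible in the source.

counter = {"n": 0}

def worker():
    stale = counter["n"]      # read shared state...
    yield                     # ...explicit suspension: others run here
    # 'stale' may now be out of date, so re-read before writing;
    # writing 'stale + 1' instead would lose an increment.
    counter["n"] = counter["n"] + 1

def run(tasks):
    tasks = list(tasks)
    while tasks:
        for t in list(tasks):       # round-robin over live tasks
            try:
                next(t)
            except StopIteration:
                tasks.remove(t)

run([worker(), worker()])
print(counter["n"])   # prints 2: the re-read keeps both increments
```

With implicit switching the same hazard exists at every call site, which
is precisely the audit burden the explicit yield removes.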

Further exploration of my own feelings on the subject grew a bit beyond a good 
length for a reply here, so if you're interested in my thoughts you can have a 
look at my blog: 
.

> There's no substitute for deep thought where any kind of threading is 
> involved, IMO.

Sometimes there's no alternative, but wherever I can, I avoid thinking, 
especially hard thinking.  This maxim has served me very well throughout my 
programming career ;-).

-glyph
