Re: [Python-Dev] Licensing

2010-07-06 Thread Nir Aides
I take "...running off with the good stuff and selling it for profit" to
mean "creating derivative work and commercializing it as proprietary code"
which you can not do with GPL licensed code. Also, while the GPL does not
prevent selling copies for profit it does not make it very practical either.


On Tue, Jul 6, 2010 at 9:44 AM, Ben Finney wrote:

> Guido van Rossum  writes:
>
> > A secondary reasoning for some open source licenses might be to
> > prevent others from running off with the good stuff and selling it for
> > profit. The GPL is big on that […]
>
> Really, it's not. Please stop spreading this canard.
>
> The GPL explicitly and deliberately grants the freedom to sell the work
> for profit. Every copyright holder who grants license under the terms of
> the GPL is explicitly saying “you can sell this software for any price
> you like” <http://www.gnu.org/philosophy/selling.html>.
>
> Whatever other complaints people may have against the GPL, it's simply
> *false* to claim what Guido did above. Please stop it.
>
> --
>  \“We cannot solve our problems with the same thinking we used |
>  `\   when we created them.” —Albert Einstein |
> _o__)  |
> Ben Finney
>
> ___
> Python-Dev mailing list
> [email protected]
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/nir%40winpdb.org
>
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Licensing

2010-07-06 Thread Ben Finney
Nir Aides  writes:

> I take "...running off with the good stuff and selling it for profit" to
> mean "creating derivative work and commercializing it as proprietary code"
> which you can not do with GPL licensed code.

It's the “proprietary” which is the distinguishing criterion there. The
“selling” and “commercial” are totally orthogonal to that.

That's the point: selling, and commercial activity in general, is
explicitly encouraged and permission granted by the GPL. Too many people
speak as though it were otherwise. To those who do: Please stop.

-- 
 \   “Following fashion and the status quo is easy. Thinking about |
  `\your users' lives and creating something practical is much |
_o__)harder.” —Ryan Singer, 2008-07-09 |
Ben Finney



Re: [Python-Dev] Licensing

2010-07-06 Thread Toshio Kuratomi
On Tue, Jul 06, 2010 at 10:10:09AM +0300, Nir Aides wrote:
> I take "...running off with the good stuff and selling it for profit" to mean
> "creating derivative work and commercializing it as proprietary code" which
> you can not do with GPL licensed code. Also, while the GPL does not prevent
> selling copies for profit it does not make it very practical either.
>
Uhmmm http://finance.yahoo.com/q/is?s=RHT&annual

It is very possible to make money with the GPL.  The GPL does, as you say,
prevent you from creating derivative works that are proprietary code.  It
does *not* prevent you from creating derivative works and commercializing
them.

-Toshio




Re: [Python-Dev] Licensing

2010-07-06 Thread Virgil Dupras
On Tue, Jul 6, 2010 at 9:22 AM, Ben Finney  wrote:

> That's the point: selling, and commercial activity in general, is
> explicitly encouraged and permission granted by the GPL. Too many people
> speak as though it were otherwise. To those who do: Please stop.
>

Please, GPL advocates also spread their own type of FUD, claiming
"free as in speech ain't the same thing as free as in beer, people!".
While true, the bottom line is that Python being BSD-type enables me
to make money with it that I wouldn't make if Python was GPL-type.
Moreover, I don't think that GPL license allows money-making that BSD
type wouldn't allow. Hence the common point of view saying "BSD-type
is more commercial-friendly than GPL".

I wrote an article last year that, while it doesn't address this
issue specifically, touches on it.

http://www.hardcoded.net/articles/going_open_source.htm

Virgil Dupras


Re: [Python-Dev] Licensing

2010-07-06 Thread Jesse Noller
On Tue, Jul 6, 2010 at 6:01 AM, Virgil Dupras  wrote:
> On Tue, Jul 6, 2010 at 9:22 AM, Ben Finney  wrote:
>
>> That's the point: selling, and commercial activity in general, is
>> explicitly encouraged and permission granted by the GPL. Too many people
>> speak as though it were otherwise. To those who do: Please stop.
>>
>
> Please, GPL advocates also spread their own type of FUD, claiming
> "free as in speech ain't the same thing as free as in beer, people!".
> While true, the bottom line is that Python being BSD-type enables me
> to make money with it that I wouldn't make if Python was GPL-type.
> Moreover, I don't think that GPL license allows money-making that BSD
> type wouldn't allow. Hence the common point of view saying "BSD-type
> is more commercial-friendly than GPL".
>
> I've written an article last year that, while it doesn't address this
> issue specifically, it touches it.
>
> http://www.hardcoded.net/articles/going_open_source.htm
>

Can we please drop the GPL slap fighting? It's completely worthless
here. Take it to reddit or someplace else. The Python / PSF license
won't be changing anytime soon. Ben could just as easily have
responded to Guido in private if he felt that strongly.

jesse


Re: [Python-Dev] Licensing // PSF // Motion of non-confidence

2010-07-06 Thread Steven D'Aprano
On Tue, 6 Jul 2010 01:58:26 pm Stephen J. Turnbull wrote:
> Antoine Pitrou writes:
>  > Which is the very wrong thing to do, though. License text should
>  > be understandable by non-lawyer people;
>
> This is a common mistake, at least with respect to common-law
> systems. Licenses are written in a formal language intended to have
> precise semantics, especially in the event of a dispute going to
> court.  What you wrote is precisely analogous to "a computer program
> should be understandable to non-programmer people".

You've never used Apple's much-missed Hypertalk, have you? :)

Given that Python has often been described as executable pseudo-code, I 
think it is ironic that you're implying that comprehensibility of 
language is a bad thing! Python is no less precise in its semantics 
than (say) APL.

There are movements to discourage unreadable legalese in favour of 
simpler language that is more readable while still being precise. For 
example, the Canadian Bar Association supports the Plain English 
Movement:

http://en.wikipedia.org/wiki/Plain_Language_Movement

and of course excessive formality and legalese is often criticised even 
by lawyers for *harming* precision. (When even the judge can't work out 
what you mean, that's a problem.)

None of this is to imply that the Python licence is guilty of such 
excessive legalese. But I think that, to the extent that other 
priorities and legal obligations permit it, we should always be open 
to the idea of improving the readability and comprehensibility 
of "legal source code".


> The fact is, in the U.S. if an ordinary person thinks they understand
> a license, then it's probably quite unpredictable what a court will
> say about attempts to enforce it.

I'm not sure that this is a fact or just an opinion, but *my* opinion is 
that this is a safe bet. Most people in the industry consider that it's 
generally unpredictable what a court will say about licences in general 
(particularly the shrink-wrap variety).

It's certainly true that the general public generally has no clue about 
licences, contracts, or legal agreements in general, but then 
agreements written by lawyers aren't always much better. I've been 
asked to sign agreements that are nonsensical, e.g. circular 
definitions where Clause N says to refer to Clause X, and Clause X says 
to refer to Clause N, or NDAs that prohibited me from doing *anything* 
with the "confidential information" the other party gave me, including 
the work they wanted me to do. Or blatantly illegal, e.g. non-compete 
clauses that don't have a hope in hell of surviving a legal challenge, 
including one that would have meant that I was agreeing to never work 
for any person or company in Australia who ever had a telephone.



-- 
Steven D'Aprano


Re: [Python-Dev] blocking 2.7

2010-07-06 Thread Walter Dörwald
On 05.07.10 16:19, Nick Coghlan wrote:
> On Mon, Jul 5, 2010 at 5:20 AM, Terry Reedy  wrote:
>> On 7/4/2010 2:31 AM, Éric Araujo wrote:

>>>> But Python tests lack coverage stats, so it is hard to say anything.
>>>
>>> FYI: http://coverage.livinglogic.de/
>>
>> Turns out the audioop is one of the best covered modules, at 98%
> 
> Alas, those are only the stats for the audioop test suite. audioop
> itself is written in C, so the automatic coverage stats generated by
> livinglogic don't provide any details.

http://coverage.livinglogic.de/ *does* include coverage info for stuff
written in C, see for example:

   http://coverage.livinglogic.de/Objects/unicodeobject.c.html

However it *is* strange that test_audioop.py gets executed, but
audioop.c doesn't seem to be.

Servus,
   Walter


Re: [Python-Dev] Licensing

2010-07-06 Thread Ben Finney
Jesse Noller  writes:

> The Python / PSF license won't be changing anytime soon.

The existing license for Python suits me fine.

> Ben could just as easily have responded to Guido in private if he
> felt that strongly.

No. I responded in the same forum where the falsehood was put forth, to
correct that falsehood. That's done now; thanks for your attention, all.

-- 
 \   “Timid men prefer the calm of despotism to the boisterous sea |
  `\of liberty.” —Thomas Jefferson |
_o__)  |
Ben Finney



Re: [Python-Dev] blocking 2.7

2010-07-06 Thread Mark Dickinson
On Tue, Jul 6, 2010 at 1:10 PM, Walter Dörwald  wrote:
> http://coverage.livinglogic.de/ *does* include coverage info for stuff
> written in C, see for example:
>
>   http://coverage.livinglogic.de/Objects/unicodeobject.c.html
>
> However it *is* strange that test_audioop.py gets executed, but
> audioop.c doesn't seem to be.

It looks as though none of the extension modules (besides those that
are compiled statically into the interpreter) are reporting coverage.
I wonder whether the correct flags are being passed to the module
build stage?  Incidentally, there doesn't seem to be any of the usual
'make' output I'd associate with the module-building stage in the
build log at:

http://coverage.livinglogic.de/buildlog.txt

For example, I'd expect to see the string 'mathmodule' somewhere in that output.

Mark


Re: [Python-Dev] Licensing // PSF // Motion of non-confidence

2010-07-06 Thread Stephen J. Turnbull
Steven D'Aprano writes:
 > On Tue, 6 Jul 2010 01:58:26 pm Stephen J. Turnbull wrote:

 > > Licenses are written in a formal language intended to have
 > > precise semantics, especially in the event of a dispute going to
 > > court.  What you wrote is precisely analogous to "a computer program
 > > should be understandable to non-programmer people".
 > 
 > You've never used Apple's much-missed Hypertalk, have you? :)

No.  I was solving quadratic programs back then, and FORTRAN was much
better for that.  But I think it's more relevant that my mother tried
writing HyperCard stacks, and gave up.  On the rare occasions she
wanted her computer to do something she couldn't do with MacPaint or
MacWrite, she called me.  She never complained about me writing
programs in BASIC, even though they were totally incomprehensible to
her.  And mentioning the "Python as executable pseudo-code" thing,
I think you're way overestimating what average non-programmer people
can cope with.  (I'd be pleased to be proved wrong, especially by the
undergrads I teach!!!)

As for missing it, why would I when I've got Python?


Re: [Python-Dev] Signs of neglect?

2010-07-06 Thread Barry Warsaw
On Jul 04, 2010, at 06:58 PM, Éric Araujo wrote:

>I’d like to volunteer to maintain a tool but I’m not sure where I can
>help. I’m already proposing changes to Brett for
>Tools/scripts/patchcheck.py, and I have commented on Tools/i18n bugs,
>but these ones are already maintained by their authors (e.g. Barry is
>assigned pygettext bugs) and I’m by no means a gettext expert.

It's been a while since I did much pygettext stuff.  I think Martin's
basically taken it over in recent years.

-Barry




Re: [Python-Dev] [RELEASE] Python 2.7 released

2010-07-06 Thread Barry Warsaw
On Jul 04, 2010, at 11:03 AM, Benjamin Peterson wrote:

>2010/7/4 Benjamin Peterson :
>> On behalf of the Python development team, I'm jocund to announce the
>> second release candidate of Python 2.7.
>
>Arg!!! This should, of course, be "final release".

Congratulations Benjamin!
-Barry




[Python-Dev] Coverage, was: Re: blocking 2.7

2010-07-06 Thread Walter Dörwald
On 06.07.10 15:07, Mark Dickinson wrote:

> On Tue, Jul 6, 2010 at 1:10 PM, Walter Dörwald  wrote:
>> http://coverage.livinglogic.de/ *does* include coverage info for stuff
>> written in C, see for example:
>>
>>   http://coverage.livinglogic.de/Objects/unicodeobject.c.html
>>
>> However it *is* strange that test_audioop.py gets executed, but
>> audioop.c doesn't seem to be.
> 
> It looks as though none of the extension modules (besides those that
> are compiled statically into the interpreter) are reporting coverage.
> I wonder whether the correct flags are being passed to the module
> build stage?  Incidentally, there doesn't seem to be any of the usual
> 'make' output I'd associate with the module-building stage in the
> build log at:
> 
> http://coverage.livinglogic.de/buildlog.txt
> 
> For example, I'd expect to see the string 'mathmodule' somewhere in that 
> output.

True, there seems to be a problem. I'm running

   ./configure --enable-unicode=ucs4 --with-pydebug

and then

   make coverage

This doesn't seem to build extension modules. However as far as I
understand the Makefile, "make coverage" should build extension modules:

# Default target
all:		build_all
build_all:	$(BUILDPYTHON) oldsharedmods sharedmods gdbhooks

coverage:
	@echo "Building with support for coverage checking:"
	$(MAKE) clean
	$(MAKE) all CFLAGS="$(CFLAGS) -O0 -pg -fprofile-arcs -ftest-coverage" LIBS="$(LIBS) -lgcov"

# Build the shared modules
sharedmods: $(BUILDPYTHON)
	@case $$MAKEFLAGS in \
	*s*) $(RUNSHARED) CC='$(CC)' LDSHARED='$(BLDSHARED)' LDFLAGS='$(LDFLAGS)' OPT='$(OPT)' ./$(BUILDPYTHON) -E $(srcdir)/setup.py -q build;; \
	*) $(RUNSHARED) CC='$(CC)' LDSHARED='$(BLDSHARED)' LDFLAGS='$(LDFLAGS)' OPT='$(OPT)' ./$(BUILDPYTHON) -E $(srcdir)/setup.py build;; \
	esac

I'm rerunning now with "make && make coverage" to see if this fixes
anything.

Servus,
   Walter


Re: [Python-Dev] thoughts on the bytes/string discussion

2010-07-06 Thread Ronald Oussoren

On 27 Jun, 2010, at 11:48, Greg Ewing wrote:

> Stefan Behnel wrote:
>> Greg Ewing, 26.06.2010 09:58:
>>> Would there be any sanity in having an option to compile
>>> Python with UTF-8 as the internal string representation?
>> It would break Py_UNICODE, because the internal size of a unicode character 
>> would no longer be fixed.
> 
> It's not fixed anyway with the 2-char build -- some
> characters are represented using a pair of surrogates.

It is for practical purposes not even fixed in 4-char builds. In 4-char builds 
every Unicode code point corresponds to one item in a python unicode string, 
but a base character with combining characters is still a sequence of 
characters and should IMHO almost always be treated as a single object. As an 
example, given s="be\N{COMBINING DIAERESIS}" s[:2] or s[2:] is almost certainly 
semantically invalid.
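
To make the slicing point concrete, here is a minimal sketch in Python 3 
syntax (an assumption; on the 2.x builds discussed in this thread the literal 
would need a u prefix). The helper name clusters is hypothetical, and 
grouping on unicodedata.combining() is only a rough approximation of proper 
grapheme cluster segmentation, not a complete implementation of it.

```python
import unicodedata

s = "be\N{COMBINING DIAERESIS}"

# Three code points, although a reader sees two characters.
assert len(s) == 3

# Slicing by code point separates the base "e" from its diaeresis:
head, tail = s[:2], s[2:]
# `tail` is now a lone combining mark with nothing to attach to.
assert unicodedata.combining(tail) != 0


def clusters(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper: it only looks at the canonical combining class,
    so it is a simplification of real grapheme cluster rules.
    """
    out = []
    for ch in text:
        if out and unicodedata.combining(ch):
            out[-1] += ch      # attach the mark to the previous base character
        else:
            out.append(ch)     # start a new cluster
    return out


print(clusters(s))
```

Slicing the list returned by clusters(s) instead of s itself keeps the "e" 
and its diaeresis together.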

Ronald





Re: [Python-Dev] Licensing

2010-07-06 Thread LD 'Gus' Landis
Yes. The BSD license on FreeBSD has allowed Apple to
make MacOS X a completely proprietary product.  The BSD
license allows you to take and never release your mods.  It
has very little to do with money, IMO.

On Tue, Jul 6, 2010 at 1:22 AM, Ben Finney  wrote:
> Nir Aides  writes:
>
>> I take "...running off with the good stuff and selling it for profit" to
>> mean "creating derivative work and commercializing it as proprietary code"
>> which you can not do with GPL licensed code.
>
> It's the “proprietary” which is the distinguishing criterion there. The
> “selling” and “commercial” are totally orthogonal to that.
>
> That's the point: selling, and commercial activity in general, is
> explicitly encouraged and permission granted by the GPL. Too many people
> speak as though it were otherwise. To those who do: Please stop.
>
> --
>  \       “Following fashion and the status quo is easy. Thinking about |
>  `\        your users' lives and creating something practical is much |
> _o__)                                harder.” —Ryan Singer, 2008-07-09 |
> Ben Finney
>



-- 
---
NOTE: If it is important CALL ME - I may miss email,
which I do NOT normally check on weekends nor on
a regular basis during any other day.
---
LD Landis - N0YRQ - de la tierra del encanto
3960 Schooner Loop, Las Cruces, NM 88012
575-448-1763  N32 21'48.28" W106 46'5.80"


Re: [Python-Dev] Licensing // PSF // Motion of non-confidence

2010-07-06 Thread VanL

On 7/5/2010 11:47 PM, Antoine Pitrou wrote:
> The point of free software licenses, though (as opposed to proprietary
> licenses), is not mainly to go to court (to “protect IP”, as the PSF
> says - quite naively in my opinion); it is to enable trust among people.


Yes, that is true. Open source licenses are social documents as much as 
they are legal documents. However, they need to be legally enforceable 
so as to have their intended social effect.




Re: [Python-Dev] Licensing // PSF // Motion of non-confidence

2010-07-06 Thread VanL

On 7/5/2010 8:03 PM, Steve Holden wrote:
> Neil Hodgson wrote:
>> There have been moves in the past to simplify the license of Python
>> but this would require agreement from the current rights owners
>> including CWI and CNRI. IIRC not all of the rights owners are willing
>> to agree to a change.
>
> That is the current position.


This is a pet project of mine, but it needs round tuits that are 
currently in short supply.




Re: [Python-Dev] Licensing

2010-07-06 Thread Stephen J. Turnbull
LD 'Gus' Landis writes:
 > Yes. The BSD license on FreeBSD has allowed Apple to
 > make MacOS X a completely proprietary product.

That's simply not true.
http://www.opensource.apple.com/release/mac-os-x-1064/.


Re: [Python-Dev] Licensing

2010-07-06 Thread LD 'Gus' Landis
I stand corrected.  Thanks for the pointer Stephen!

On Tue, Jul 6, 2010 at 10:36 AM, Stephen J. Turnbull  wrote:
> LD 'Gus' Landis writes:
>  > Yes. The BSD license on FreeBSD has allowed Apple to
>  > make MacOS X a completely proprietary product.
>
> That's simply not true.
> http://www.opensource.apple.com/release/mac-os-x-1064/.
>



-- 
---
NOTE: If it is important CALL ME - I may miss email,
which I do NOT normally check on weekends nor on
a regular basis during any other day.
---
LD Landis - N0YRQ - de la tierra del encanto
3960 Schooner Loop, Las Cruces, NM 88012
575-448-1763  N32 21'48.28" W106 46'5.80"


Re: [Python-Dev] Licensing // PSF // Motion of non-confidence

2010-07-06 Thread Glyph Lefkowitz

On Jul 6, 2010, at 8:09 AM, Steven D'Aprano wrote:

> You've never used Apple's much-missed Hypertalk, have you? :)

on mailingListMessage
  get the message
  put it into aMessage
  if the thread of aMessage contains license wankery
    put aMessage into the trash
  end if
end mailingListMessage



[Python-Dev] Include datetime.py in stdlib or not?

2010-07-06 Thread Alexander Belopolsky
This idea has been discussed extensively in this and other forums and
I believe it is time to make a decision.

The proposal is to add a pure Python implementation of the datetime
module to the stdlib.  The current C implementation will transparently
override the pure Python definitions in CPython.  Other Python
implementations will have the option of supplying their own fast
implementation.  This approach has already been adopted by several
modules, including pickle, heapq and warnings.  It has even been
suggested [1] that this is the direction in which the majority of
CPython extension modules should be heading.
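
The override pattern described above can be sketched roughly as follows.
This is an illustration only, not the proposed datetime.py: the names
pure_total_seconds and pick_implementation and the module name
"_no_such_accelerator" are hypothetical.  heapq.py itself uses the
simpler idiom shown in the first comment.

```python
import importlib

# heapq.py ends with this idiom, which lets the C module replace the
# pure-Python definitions when it is available:
#
#     try:
#         from _heapq import *
#     except ImportError:
#         pass


def pure_total_seconds(days, seconds):
    """Pure-Python fallback (hypothetical helper, not the real datetime API)."""
    return days * 86400 + seconds


def pick_implementation(accelerator_module, name, fallback):
    """Return accelerator_module.name if it can be imported, else the fallback."""
    try:
        module = importlib.import_module(accelerator_module)
    except ImportError:
        return fallback
    return getattr(module, name, fallback)


# "_no_such_accelerator" is a made-up module name, so the fallback is used.
total_seconds = pick_implementation(
    "_no_such_accelerator", "total_seconds", pure_total_seconds)
```

Callers only ever use total_seconds, so an implementation that ships an
accelerator module and one that does not expose the same name.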

This proposal has brought mostly positive feedback on the tracker [2]
with only a few objections being raised.

1. Since this does not bring any new functionality and datetime module
is not expected to evolve, there is no need for pure python version.
2. There are other areas of stdlib that can benefit more from pure
python equivalents.
3. Reference implementations should be written by a senior CPython
developer and not scraped from external projects like PyPy.

Let me briefly address these objections:

1. Availability of pure python equivalents of standard library modules
is very helpful for debugging python applications. This is
particularly true when the stdlib module is designed to be extendable
by and calls into user-supplied code.  This is true in the case of
datetime module which relies on 3rd-party or user-supplied code for
any timezone support.

The datetime module indeed saw very little development in the last 6
years.   However this lack of development may itself be the result of
pure python version not being available.  For example, the idea to
supply a concrete tzinfo object representing UTC has been brought up
back in 2002. [3]  An RFE [4] was created in the tracker in January,
2009 and took more than 1.5 years to implement.  If you look at the
history of issue5094, you will see that development slowed down
considerably when C coding started.  Note that for this particular
feature, there was probably no need to have it implemented in C to
begin with.  (Most common operations involve datetime objects in the
same timezone and those don't need to call timezone methods.)

2. Unlike other areas of stdlib, datetime module was originally
prototyped in python and it turns out that it hardly changed between
python 2.3 and 2.6 with a couple of features added in 2.7.  A port to
3.x was uneventful as well.

3. The version of datetime.py [5] that I propose for inclusion is
substantially the pure python prototype written by Tim Peters and
others back in 2003.  The PyPy changes are very few [6].

I believe the code is substantially ready for inclusion.  There are a
few items that need to be fixed related to how floating point
arguments to timedelta are handled, as well as some clean-up of
docstrings and error messages (both C and python implementations can
see some improvement in this area).  The biggest item in terms of
development effort would be to refactor test_datetime to test both
implementations.  A simple solution [7] of importing test_datetime
twice with and without _datetime will probably not be accepted because
it is not compatible with alternative unittest runners.
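
For readers unfamiliar with the "import twice" idea, the sketch below
shows one way to get a fresh copy of a module with its C accelerator
blocked.  It is not the patch in [7]; the helper name import_without is
hypothetical, and heapq/_heapq stand in for datetime/_datetime because
heapq already ships with a pure Python fallback.  It relies on the fact
that a None entry in sys.modules makes importing that name raise
ImportError.

```python
import importlib
import sys


def import_without(name, blocked):
    """Import a fresh copy of `name` with `blocked` made unimportable.

    Hypothetical helper for illustration; sys.modules is restored afterwards.
    """
    saved = {m: sys.modules.pop(m, None) for m in (name, blocked)}
    sys.modules[blocked] = None   # a None entry makes the import raise ImportError
    try:
        return importlib.import_module(name)
    finally:
        # Put back whatever was there before, or remove our temporary entries.
        for m, mod in saved.items():
            if mod is None:
                sys.modules.pop(m, None)
            else:
                sys.modules[m] = mod


# A copy of heapq that was forced onto its pure-Python code path.
py_heapq = import_without("heapq", "_heapq")

heap = []
py_heapq.heappush(heap, 3)
py_heapq.heappush(heap, 1)
```

A test module could then run the same test cases once against py_heapq
and once against the normally imported module.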

What do you think?  Please reply here or add a comment at
http://bugs.python.org/issue7989.

[1] http://bugs.python.org/issue5094#msg106498
[2] http://bugs.python.org/issue7989
[3] http://www.zope.org/Members/fdrake/DateTimeWiki/SuggestedRequirements
[4] http://bugs.python.org/issue5094
[5] 
http://svn.python.org/view/*checkout*/sandbox/branches/py3k-datetime/datetime.py
[6] http://bugs.python.org/file17701/datetime-sandbox-pypy.diff
[7] http://bugs.python.org/file17848/issue7989.diff


Re: [Python-Dev] Include datetime.py in stdlib or not?

2010-07-06 Thread Brett Cannon
On Tue, Jul 6, 2010 at 12:59, Alexander Belopolsky
 wrote:
> This idea has been discussed extensively in this and other forums and
> I believe it is time to make a decision.
>
> The proposal is to add pure python implementation of datetime module
> to stdlib.   The current C implementation will transparently override
> pure python definitions in CPython.  Other python implementations will
> have an option of supplying their own fast implementation.  This
> approach has already been adopted by several modules including pickle,
> heapq and warnings.   It has even been suggested [1] that this is the
> direction in which the majority of CPython extension modules should be
> heading.
>
>  This proposal has brought mostly positive feedback on the tracker [2]
> with only a few objections being raised.
>
> 1. Since this does not bring any new functionality and datetime module
> is not expected to evolve, there is no need for pure python version.
> 2. There are other areas of stdlib that can benefit more from pure
> python equivalents.
> 3. Reference implementations should be written by a senior CPython
> developer and not scraped from external projects like PyPy.

I should mention that PyPy has said they are quite happy to donate
their datetime implementation which is what Alexander (I believe) has
been working off of.

Also, adding a pure Python version alleviates the need of the other
VMs from having to maintain the same module separately. Making the
stdlib shareable (and thus eventually breaking it out from CPython)
was discussed at the language summit at PyCon 2010 and generally
agreed upon, and this is a step towards making that happen.

-Brett


>
> Let me briefly address these objections:
>
> 1. Availability of pure python equivalents of standard library modules
> is very helpful for debugging python applications. This is
> particularly true when the stdlib module is designed to be extendable
> by and calls into user-supplied code.  This is true in the case of
> datetime module which relies on 3rd-party or user-supplied code for
> any timezone support.
>
> The datetime module indeed saw very little development in the last 6
> years.   However this lack of development may itself be the result of
> pure python version not being available.  For example, the idea to
> supply a concrete tzinfo object representing UTC has been brought up
> back in 2002. [3]  An RFE [4] was created in the tracker in January,
> 2009 and took more than 1.5 years to implement.  If you look at the
> history of issue5094, you will see that development slowed down
> considerably when C coding started.  Note that for this particular
> feature, there was probably no need to have it implemented in C to
> begin with.  (Most common operations involve datetime objects in the
> same timezone and those don't need to call timezone methods.)
>
> 2. Unlike other areas of stdlib, datetime module was originally
> prototyped in python and it turns out that it hardly changed between
> python 2.3 and 2.6 with a couple of features added in 2.7.  A port to
> 3.x was uneventful as well.
>
> 3. The version of datetime.py [5] that I propose for inclusion is
> substantially the pure python prototype written by Tim Peters and
> others back in 2003.  The PyPy changes are very few [6].
>
> I believe the code is substantially ready for inclusion.  There are a
> few items that need to be fixed related to how floating point
> arguments to timedelta are handled, as well as some clean-up of
> docstrings and error messages (both C and python implementations can
> see some improvement in this area).  The biggest item in terms of
> development effort would be to refactor  test_datetime to test both
> implementations.  A simple solution [7] of importing test_datetime
> twice with and without _datetime will probably not be accepted because
> it is not compatible with alternative unittest runners.
>
> What do you think?  Please reply here or add a comment at
> http://bugs.python.org/issue7989.
>
> [1] http://bugs.python.org/issue5094#msg106498
> [2] http://bugs.python.org/issue7989
> [3] http://www.zope.org/Members/fdrake/DateTimeWiki/SuggestedRequirements
> [4] http://bugs.python.org/issue5094
> [5] 
> http://svn.python.org/view/*checkout*/sandbox/branches/py3k-datetime/datetime.py
> [6] http://bugs.python.org/file17701/datetime-sandbox-pypy.diff
> [7] http://bugs.python.org/file17848/issue7989.diff


Re: [Python-Dev] blocking 2.7

2010-07-06 Thread Nick Coghlan
On Tue, Jul 6, 2010 at 10:10 PM, Walter Dörwald  wrote:
> On 05.07.10 16:19, Nick Coghlan wrote:
> http://coverage.livinglogic.de/ *does* include coverage info for stuff
> written in C, see for example:
>
>   http://coverage.livinglogic.de/Objects/unicodeobject.c.html

Ah, I missed that. Cool.

> However it *is* strange that test_audioop.py gets executed, but
> audioop.c doesn't seem to be.

There do seem to be a *lot* of N/A's against the C code (that's why I
thought the C code wasn't included in the stats collection in the
first place).

Regards,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Licensing

2010-07-06 Thread Terry Reedy
I think there are a couple of potential action items that have come out 
of the discussion.


1. Python License

If there is not already, could there be an explanatory note, something 
like (worded to be 'neutral'):


"The Python License is complicated because Python has been developed at 
various times under the auspices of four different organizations. Each 
retains ownership of the code developed or contributed during its tenure 
and continues to license its portion of the code under its own  Python 
license."


Perhaps add: "The PSF cannot unilaterally change this."

It would be nice if a layperson summary could be added:

"Overall, the Python License is similar to the MIT license."

and even "Basically, you can do what you want as long as you do it at 
your own risk and do not claim ownership of either the code or the name 
Python."


Such paraphrases have been posted on Python-list, though without legal 
standing. But I would understand if our lawyer objected that for the 
PSF, rather than individuals, to say the same would somehow give the 
paraphrase a legal standing it should not have.


2. Contributor License

I signed this some time ago, but wondered a bit about the discrepancy 
between this and the distribution license. I appreciate that Anatoly's 
question about the same has elicited an explanation that I can 
understand: The PSF requests that we give the PSF a clear, 
understandable license that allows the PSF both to distribute our 
contributions *and* to re-license it under the complicated license that 
it is forced to use for distribution. To put it another way: the 
contributor agreement is simple so contributors do not have to bother 
(as contributors) with the complications of the distribution license.


Perhaps this could be clearer on the contributor license page.

PS to Anatoly: I hope your questions, at least on the contributor 
agreement, are sufficiently well answered that you will sign it, send it 
in, and continue contributing. I say this as someone who did read and 
think about it and decide there was nothing to worry about because I 
would keep ownership of my words, trusted that they would appear in at 
least one more Python version, and otherwise did not excessively care 
what PSF did with them. I also say this as someone who currently would 
not upload a package of mine to the PyPI repository because for that I 
*would* care.


---
Comment on trust. Trust works both ways. So does distrust.

Asking contributors to give written licenses in addition to the license 
implicit in the act of contribution is an act of distrust. It says 
something like "We worry that you might change your mind and sue, and a 
court might not immediately toss the suit." So it should not surprise if 
the occasional person reacts with overt hurt and distrust.



--
Terry Jan Reedy



Re: [Python-Dev] Include datetime.py in stdlib or not?

2010-07-06 Thread Nick Coghlan
On Wed, Jul 7, 2010 at 5:59 AM, Alexander Belopolsky
 wrote:
> What do you think?  Please reply here or add a comment at
> http://bugs.python.org/issue7989.

(For those that haven't read the tracker discussion, it's long, but
worth skimming to get a better idea of the various points of view).

+1 on the general idea, but I haven't looked at the patches in order
to be able to comment on the specifics (except that following any of
the test_warnings, test_heapq, test_pickle, test_io, etc. styles of
testing parallel implementations should be fine).
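
The style used by those test modules can be sketched roughly as follows: the shared tests live in a mixin, and one concrete TestCase per implementation binds the module under test. The two `SimpleNamespace` objects and the `add_days` function are hypothetical stand-ins for the Python and C versions of a module, not real datetime code.

```python
import unittest
from types import SimpleNamespace

# Hypothetical stand-ins for a pure-Python module and its C accelerator.
py_impl = SimpleNamespace(add_days=lambda day, n: day + n)
c_impl = SimpleNamespace(add_days=lambda day, n: day + n)


class AddDaysTests:
    """Shared tests; 'module' is supplied by each concrete subclass."""
    module = None

    def test_add_days(self):
        self.assertEqual(self.module.add_days(10, 5), 15)


class PyAddDaysTests(AddDaysTests, unittest.TestCase):
    module = py_impl


class CAddDaysTests(AddDaysTests, unittest.TestCase):
    module = c_impl


def run_all():
    """Run both concrete test classes and return the TestResult."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(cls)
        for cls in (PyAddDaysTests, CAddDaysTests)
    )
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the mixin is not itself a TestCase, a runner that discovers tests by class collects each test exactly once per implementation.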

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Licensing

2010-07-06 Thread Nick Coghlan
On Wed, Jul 7, 2010 at 7:05 AM, Terry Reedy  wrote:
> Asking contributors to give written licenses in addition to the license
> implicit in the act of contribution is an act of distrust. It says something
> like "We worry that you might change your mind and sue, and a court might not
> immediately toss the suit." So it should not surprise if the occasional
> person reacts with overt hurt and distrust.

The other (IMO, more important) element to it is that it acts as an
assertion that the developer actually *has* the rights to contribute
the code they're contributing. So, rather than being worried about
someone changing their mind about their contributions (although that's
admittedly part of it), we're more concerned that contributors
actually think about who owns the copyright on the code they're
offering and make sure the appropriate permissions are in place.

For example, if you look at some of the code that even Guido has
submitted (e.g. pgen2), that's actually come in under Google's
contributor agreement, rather than Guido's personal one. Presumably
that was work he did on company time, so the copyright actually rests
with Google rather than Guido.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Licensing

2010-07-06 Thread Simon Cross
On Tue, Jul 6, 2010 at 11:05 PM, Terry Reedy  wrote:
> 1. Python License
>
> If there is not already, could there be an explanatory note, something like
> (worded to be 'neutral'):

As a sub-point, I'd like to see something short explaining how the
different licenses in the LICENSE file are meant to be combined. At the
moment the terms and conditions section just lists them without
explanation.

Schiavo
Simon


Re: [Python-Dev] Licensing

2010-07-06 Thread M.-A. Lemburg
Terry Reedy wrote:
> Comment on trust. Trust works both ways. So does distrust.
> 
> Asking contributors to give written licenses in addition to the license
> implicit in the act of contribution is an act of distrust. It says
> something like "We worry that you might change your mind and sue, and a
> court might not immediately toss the suit." So it should not surprise if
> the occasional person reacts with overt hurt and distrust.

The written contributor agreements are needed to enable the PSF
to defend the IP in the Python software. They are just a legal tool,
nothing more.

Note that the PSF doesn't relicense the contributed code under
the whole license stack. The contributed code is (currently) being
relicensed under the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2
(the top part of the stack), which is a very straightforward
BSD-style license.

The other licenses in the stack only apply to the code owned
by the resp. parties CWI, CNRI, BeOpen and the cast of thousands
(which fortunately didn't get to send in their lawyers and still
had a very good time).

Apart from that, the Python distribution also comes with 3rd
party code under various other BSD-style licenses.

-- 
Marc-Andre Lemburg
eGenix.com



[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

2010-07-06 Thread Terry Reedy

[Also posted to http://bugs.python.org/issue2986
Developed with input from Eli Bendersky, who will write patchfile(s) for 
whichever change option is chosen.]


Summary: difflib.SequenceMatcher was developed, documented, and 
originally operated as "a flexible class for comparing pairs of 
sequences of any [hashable] type". An "experimental" heuristic was added 
in 2.3a1 to speed up its application to sequences of code lines, which 
are selected from an unbounded set of possibilities. As explained below, 
this heuristic partly to completely disables SequenceMatcher for 
realistic-length sequences from a small finite alphabet. The regression 
is easy to fix. The docs were never changed to reflect the effect of the 
heuristic, but should be, with whatever additional change is made.


In the commit message for revision 26661, which added the heuristic, Tim 
Peters wrote "While I like what I've seen of the effects so far, I still 
consider this experimental.  Please give it a try!" Several people who 
have tried it discovered the problem with small alphabets and posted to 
the tracker. Issues #1528074, #1678339, #1678345, and #4622 are 
now-closed duplicates of #2986. The heuristic needs revision.


Open questions (discussed after the examples): what exactly to do, which 
versions to do it to, and who will do it.


---
Some minimal difference examples:

from difflib import SequenceMatcher as SM

# base example
print(SM(None, 'x' + 'y'*199, 'y'*199).ratio())
# should be and is 0.9975 (rounded)

# make 'y' junk
print(SM(lambda c:c=='y', 'x' + 'y'*199, 'y'*199).ratio())
# should be and is 0.0

# Increment b by 1 char
print(SM(None, 'x' + 'y'*199, 'y'*200).ratio())
# should be .995, but now is 0.0 because y is treated as junk

# Reverse a and b, which increments b
print(SM(None, 'y'*199, 'x' + 'y'*199).ratio())
# should be .9975, as before, but now is 0.0 because y is junked

The reason for the bug is the heuristic: if the second sequence is at 
least 200 items long then any item occurring more than one percent of 
the time in the second sequence is treated as junk. This was aimed at 
recurring code lines like 'else:' and 'return', but can be fatal for 
small alphabets where common items are necessary content.
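
The rule as described above can be sketched in a few lines. This is a restatement of the prose, not a copy of the difflib source, so the exact threshold arithmetic in difflib may differ slightly; `popular_items`, `min_len`, and `percent` are names made up for this illustration.

```python
from collections import Counter


def popular_items(b, min_len=200, percent=1):
    """Return the items of b that the described rule would treat as junk."""
    n = len(b)
    if n < min_len:
        # Short second sequences are never pruned.
        return set()
    counts = Counter(b)
    # "more than one percent of the time": count / n > percent / 100
    return {item for item, count in counts.items()
            if count * 100 > n * percent}


print(popular_items('y' * 199))        # too short: nothing is junked
print(popular_items('x' + 'y' * 199))  # 200 items: 'y' is junked
print(popular_items('ACGT' * 50))      # every base is junked
```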


A more realistic example than the above is comparing DNA gene sequences. 
Without the heuristic SequenceMatcher.get_opcodes() reports an 
appropriate sequence of matches and edits and .ratio works as documented 
and expected.  For 1000/2000/6000 bases, the times on an old Athlon 2800 
machine are <1/2/12 seconds. Since 6000 is longer than most genes, this 
is a realistic and practical use.


With the heuristic, everything is junk and there is only one match, 
''=='' augmented by the initial prefix of matching bases. This is 
followed by one edit: replace the rest of the first sequence with the 
rest of the second sequence. A much faster way to find the first 
mismatch would be

   i = 0
   n = min(len(first), len(second))  # stop at the end of the shorter one
   while i < n and first[i] == second[i]:
       i += 1

The match ratio, based on the initial matching prefix only, is 
spuriously low.


---
Questions:

1. What change should be made?

Proposed fix: Disentangle the heuristic from the calculation of the 
internal b2j dict that maps items to indexes in the second sequence b. 
Only apply the heuristic (or not) afterward.


Version A: Modify the heuristic to only eliminate common items when 
there are more than, say, 100 items (when len(b2j) > 100 where b2j is 
first calculated without popularity deletions).


This would leave DNA, protein, and printable ascii+[\n\r\t] sequences 
alone. On the other hand, realistic sequences of more than 200 code 
lines should have at least 100 different lines, and so the heuristic 
should continue to be applied when it (mostly?) 'should' be. This change 
leaves the API unchanged and does not require a user decision.
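
A rough sketch of Version A, assuming the same one-percent rule as above: b2j is built in full first, and popular items are dropped only when there are enough distinct items. The function name `build_b2j` and the parameters `min_distinct` and `min_len` are hypothetical; this is not the patch itself.

```python
def build_b2j(b, min_distinct=100, min_len=200):
    """Map each item of b to its indexes, pruning popular items only
    when b has more than min_distinct different items."""
    b2j = {}
    for i, item in enumerate(b):
        b2j.setdefault(item, []).append(i)
    n = len(b)
    if n >= min_len and len(b2j) > min_distinct:
        # Collect the keys first so the dict is not changed while iterating.
        for item in [k for k, idxs in b2j.items() if len(idxs) * 100 > n]:
            del b2j[item]
    return b2j


dna = 'ACGT' * 100
code = ['else:'] * 50 + ['line %d' % i for i in range(250)]
print(sorted(build_b2j(dna)))        # all four bases are kept
print('else:' in build_b2j(code))    # the repeated line is pruned
```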


Version B: add a parameter to .__init__ to make the heuristic optional. 
If the default were True ('use it'), then the code would run the same as 
now (even when bad). With the heuristic turned off, users would be able 
to get the .ratio they may expect and need. On the other hand, users 
would have to understand the heuristic to know when and when not to use it.
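
For illustration, Version B corresponds to the `autojunk` keyword that difflib.SequenceMatcher accepts in Python 3.2 and later. A minimal sketch of the difference it makes on the third example above, assuming such an interpreter:

```python
from difflib import SequenceMatcher

a = 'x' + 'y' * 199
b = 'y' * 200

# Default behaviour: the popularity heuristic is active.
with_heuristic = SequenceMatcher(None, a, b).ratio()

# Heuristic switched off through the keyword argument.
without_heuristic = SequenceMatcher(None, a, b, autojunk=False).ratio()

print(with_heuristic, without_heuristic)
```

With the heuristic off, the 199 shared 'y' items are matched, giving 2*199/400 = 0.995.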


Version C: A more radical alternative would be to make one or more of 
the tuning parameters user settable, with one setting turning it off.


2. What type of issue is this, and which versions get changed?

I see the proposal as partial reversion of a change that sometimes 
causes a regression, in order to fix the regression. Such would usually 
be called a bugfix. Other tracker reviewers claim this issue is a 
feature request, not a bugfix. Either way, 3.2 gets the fix. The 
practical issue is whether at least 2.7(.1) should get the fix, or 
whether the bug should forever continue in 2.x.


3. Who will make the change?

Eli will write a patch and I will check it. However, Georg Brandl 
assigned the issue to Tim Peters, with a request for comment, but Tim 
never responded. Is there 

Re: [Python-Dev] Mercurial migration readiness

2010-07-06 Thread anatoly techtonik
On Fri, Jul 2, 2010 at 3:34 PM, Antoine Pitrou  wrote:
>>
>> > After the switch, hg.python.org/cpython will be the official repo, and
>> > code.python.org/hg will probably be closed.
>>
>> Why this transition is not described in PEP?
>
> Because it's not a transition. It's a mirror. It was put in place
> before the hg migration plan was accepted, IIRC.

Where is this migration plan then if it is not in PEP?

>> How code.python.org/hg is synchronized with Subversion?
>
> What does your question mean exactly? It's a mirror (well, a set of
> mirrors) and is synchronized roughly every 5 minutes.

Method. Software used, which parameters are set for it, how to repeat
the process?

>> Why it is not possible to leave code.python.org/hg as is in slave mode
>> and then realtime replication is ready just switch master/slave over?
>
> The two sets of repositories use different conversion tools and rules.
> They have nothing in common (different changeset IDs, different
> metadata, different branch/clone layout).

That would be nice to hear about in more detail. As I understand there
is no place where it is described. I already see +1 from Fred Drake
and another +1 from Steve Holden down the thread.

However, Antoine Pitrou, Dirkjan Ochtman and Jesse Noller object. They
are afraid that contributors won't survive low-level details about
Mercurial migration. I'd say there are plenty of ways to isolate them and
at the same time satisfy "Mercurial aficionados" either on the same
page or in different places.

On Fri, Jul 2, 2010 at 4:06 PM, Stephen J. Turnbull  wrote:
>
> There is no reason at this point to suppose the transition can't be
> complete by the end of summer.  However, as always, the devil is in
> the details, and one of them may be a showstopper.  We'll just have to
> see about that.

The transition can be complete in a few minutes. The question is how
good it will be. As there is no plan, no roadmap, no status - it is
hard to judge if it is feasible at all.


Ok. Given that nobody is able/willing to say anything more - I've
gathered all your feedback concerning current status of Mercurial
migration on this Wave -
https://wave.google.com/wave/waveref/googlewave.com/w+4_fnAVHwA  I
hope you will find the time to enhance it with more info so not
contributors proficient with Mercurial could help to speed up the
transition.

-- 
anatoly t.


Re: [Python-Dev] Mercurial migration readiness

2010-07-06 Thread Jesse Noller
On Tue, Jul 6, 2010 at 7:47 PM, anatoly techtonik  wrote:
> On Fri, Jul 2, 2010 at 3:34 PM, Antoine Pitrou  wrote:
>>>
>>> > After the switch, hg.python.org/cpython will be the official repo, and
>>> > code.python.org/hg will probably be closed.
>>>
>>> Why this transition is not described in PEP?
>>
>> Because it's not a transition. It's a mirror. It was put in place
>> before the hg migration plan was accepted, IIRC.
>
> Where is this migration plan then if it is not in PEP?
>
>>> How code.python.org/hg is synchronized with Subversion?
>>
>> What does your question mean exactly? It's a mirror (well, a set of
>> mirrors) and is synchronized roughly every 5 minutes.
>
> Method. Software used, which parameters are set for it, how to repeat
> the process?
>
>>> Why it is not possible to leave code.python.org/hg as is in slave mode
>>> and then realtime replication is ready just switch master/slave over?
>>
>> The two sets of repositories use different conversion tools and rules.
>> They have nothing in common (different changeset IDs, different
>> metadata, different branch/clone layout).
>
> That would be nice to hear about in more detail. As I understand there
> is no place where it is described. I already see +1 from Fred Drake
> and another +1 from Steve Holden down the thread.
>
> However, Antoine Pitrou, Dirkjan Ochtman and Jesse Noller object. They
> are afraid that contributors won't survive low-level details about
> Mercurial migration. I'd say there are plenty of ways to isolate them and
> at the same time satisfy "Mercurial aficionados" either on the same
> page or in different places.

No, I don't need you misrepresenting anything I've said Anatoly - I
said there's no need to maintain SVN alongside mercurial after we
convert, and doing so is silly. I maintain that once we convert, we
very happily stay converted, and drop official "other" mirrors unless
other volunteers step up to maintain them.

I have no problem with additional documentation should people wish to
volunteer to write it.

We do not work for you Anatoly.

> On Fri, Jul 2, 2010 at 4:06 PM, Stephen J. Turnbull  
> wrote:
>>
>> There is no reason at this point to suppose the transition can't be
>> complete by the end of summer.  However, as always, the devil is in
>> the details, and one of them may be a showstopper.  We'll just have to
>> see about that.
>
> The transition can be complete in a few minutes. The question is how
> good it will be. As there is no plan, no roadmap, no status - it is
> hard to judge if it is feasible at all.

No. There is no question except in your mind. We all have a rough idea
of the status, modulo the PEPs being updated. It is also perfectly
feasible. I would love it, and offer you a Christmas card if you could
drop the hyperbole and misrepresentation.

>
> Ok. Given that nobody is able/willing to say anything more - I've
> gathered all your feedback concerning current status of Mercurial
> migration on this Wave -
> https://wave.google.com/wave/waveref/googlewave.com/w+4_fnAVHwA  I
> hope you will find the time to enhance it with more info so not
> contributors proficient with Mercurial could help to speed up the
> transition.

While the summary is nice, your wave entry has nothing to do with the
mercurial transition. If you want to help, please ask someone to take
on an open task, or volunteer to write/accentuate the PEPs, or help
with documentation for post-migration workflow. Your contributions can
be effective and useful, rather than noisemaking and abrasive.

The mercurial transition will occur, barring someone directly involved
finding show-stopping reasons otherwise, with or without you. The
decision was made some time ago, and despite your recent noisemaking,
will continue on.

jesse


Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

2010-07-06 Thread Kevin Jacobs
On Tue, Jul 6, 2010 at 7:18 PM, Terry Reedy  wrote:

> [Also posted to http://bugs.python.org/issue2986
> A much faster way to find the first mismatch would be
>   i = 0
>   n = min(len(first), len(second))  # stop at the end of the shorter one
>   while i < n and first[i] == second[i]:
>       i += 1
> The match ratio, based on the initial matching prefix only, is spuriously
> low.
>
>
I don't have much experience with the Python sequence matcher, but many
classical edit distance and alignment algorithms benefit from stripping any
common prefix and suffix before engaging in heavy-lifting.  This is
trivially optimal for Hamming-like distances and easily shown to be for
Levenshtein and Damerau type distances.
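
A minimal sketch of that pre-processing step, with names (`strip_common_affixes`, `a_core`, `b_core`) chosen only for this illustration: the shared prefix and suffix are measured first, and only the differing middle parts are handed on to the expensive comparison.

```python
def strip_common_affixes(a, b):
    """Return (prefix_len, suffix_len, a_core, b_core)."""
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    # Do not let the suffix overlap the prefix already consumed.
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return (prefix, suffix,
            a[prefix:len(a) - suffix], b[prefix:len(b) - suffix])


print(strip_common_affixes('abcXdef', 'abcYYdef'))
```

Whether this helps SequenceMatcher itself is a separate question, since its matching is not a plain edit-distance computation.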

-Kevin


Re: [Python-Dev] Include datetime.py in stdlib or not?

2010-07-06 Thread Terry Reedy

On 7/6/2010 3:59 PM, Alexander Belopolsky wrote:

I am more interested in Brett's overall vision than this particular 
module. I understand that to be one of a stdlib that is separate from 
CPython and is indeed the standard Python library.


Questions:

1. Would the other distributions use a standard stdlib rather than 
current individual versions? If so, and if at least one used the Python 
version of each module, this would alleviate the concern that non-use == 
non-testing. (Test improvement would also help this.)


2. Would the other distributions pool their currently separate stdlib 
efforts to help maintain one standard stdlib? If so, this would 
alleviate the concern about the extra effort to maintain both a C and 
Python version. (Test improvement would also help this.)


3. What version of Python would be allowed for use in the stdlib? I 
would like the stdlib for 3.x to be able to use 3.x code. This would be 
only a minor concern for CPython as long as 2.7 is maintained, but a 
major concern for the other implementations currently 'stuck' in 2.x 
only. A good 3to2 would be needed.


I generally favor having Python versions of modules available. My 
current post on difflib.SequenceMatcher is based on experiments with an 
altered version. I copied difflib.py to my test directory, renamed it 
diff2lib.py, so I could import both versions, found and edited the 
appropriate method, and off I went. If difflib were in C, my post would 
have been based on speculation about how a fixed version would operate, 
rather than on data.


4. Does not ctypes make it possible to replace a method of a 
Python-coded class with a faster C version, with something like

  try:
      connect to methods.dll
      check that function xyz exists
      replace Someclass.xyz with ctypes wrapper
  except: pass

For instance, the SequenceMatcher heuristic was added to speed up the 
matching process that I believe is encapsulated in one O(n**2) or so 
bottleneck method. I believe most everything else is O(n) bookkeeping.
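
A runnable sketch of the same idea, using an ordinary import fallback instead of ctypes so that it needs no real DLL. The module name `_hypothetical_accel` and the method `find_longest` are invented for the example; when the import fails, the pure-Python method simply stays in place.

```python
class SomeClass:
    def find_longest(self, items):
        """Pure-Python reference implementation."""
        return max(items, key=len)


try:
    # '_hypothetical_accel' is a made-up name for an optional C extension.
    from _hypothetical_accel import find_longest as _fast_find_longest
except ImportError:
    _fast_find_longest = None

if _fast_find_longest is not None:
    # Swap in the accelerated version; callers see the same method name.
    # How the replacement receives 'self' depends on how it is written.
    SomeClass.find_longest = _fast_find_longest

print(SomeClass().find_longest(['a', 'abc', 'ab']))
```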




> This proposal has brought mostly positive feedback on the tracker [2]
> with only a few objections being raised.
>
> 1. Since this does not bring any new functionality and datetime module
> is not expected to evolve, there is no need for pure python version.


see above


> 2. There are other areas of stdlib that can benefit more from pure
> python equivalents.


Possibly true, but developers do what they do, and this seems mostly done.


> 3. Reference implementations should be written by a senior CPython
> developer and not scraped from external projects like PyPy.


I did not see that in my reading of the thread. In any case, what 
matters is quality, not authorship.


> What do you think?  Please reply here or add a comment at
> http://bugs.python.org/issue7989.

From scanning that and the posts here, it seems like a pep or other doc 
on dual version modules would be a good idea. It should at least 
document how to code the switch from python version to the x coded 
version and how to test both, as discussed.


--
Terry Jan Reedy



Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

2010-07-06 Thread Tim Peters
[Terry Reedy]
> [Also posted to http://bugs.python.org/issue2986
> Developed with input from Eli Bendersky, who will write patchfile(s) for
> whichever change option is chosen.]

Thanks for paying attention to this, Terry (and Ed)!  I somehow
managed to miss the whole discussion over the intervening years :-(

> Summary: difflib.SequenceMatcher was developed, documented, and originally
> operated as "a flexible class for comparing pairs of sequences of any
> [hashable] type".

Although it can be used for that, its true intent was to produce
intuitive diffs for revisions of text files (including code files)
edited by humans.  Where "intuitive" means approximately "less jarring
than the often-odd diffs produced by algorithms working on some rigid
mathematical notion of 'minimal edit distance'".

Whether it's useful for more than that I can't say, because that's all
I ever developed (or used) the algorithm for.

> An "experimental" heuristic was added in 2.3a1

Big bad on me for that!  At the time I fully intended to document
that, and at least make it tunable, but life intervened and it got
dropped on the floor.

> to speed up its application to sequences of code lines,

Yes, that was the intent.  I was corresponding with a user at the time
who had odd notions (well, by my standards) of how to format C code,
which left him with many hundreds of lines containing only an open
brace, or a close brace, or just a semicolon (etc).  difflib spun its
wheels frantically trying to sort this out, and the heuristic in
question cut processing time from hours (in the worst cases) to
seconds.

Since that (text file comparison) was always the primary case for this
class, it was worth doing something about.  But it should not have
gone in the way it did (undocumented & unfinished, as you correctly
note).

> which are selected from an
> unbounded set of possibilities. As explained below, this heuristic partly to
> completely disables SequenceMatcher for realistic-length sequences from a
> small finite alphabet.

Which wasn't an anticipated use case, so should not be favored.
Slowing down difflib for what it was intended for is not a good idea -
practicality beats purity.

Ya, ya, I understand someone playing around with DNA sequences might
find difflib tempting at first, but fix this and they're still going
to be unhappy.  There are much better (faster, "more standard")
algorithms for comparing sequences drawn from tiny alphabets, and
especially so for DNA matching.

> The regression is easy to fix. The docs were never
> changed to reflect the effect of the heuristic, but should be, with whatever
> additional change is made.

True - and always was.

> In the commit message for revision 26661, which added the heuristic, Tim
> Peters wrote "While I like what I've seen of the effects so far, I still
> consider this experimental.  Please give it a try!" Several people who have
> tried it discovered the problem with small alphabets and posted to the
> tracker. Issues #1528074, #1678339, #1678345, and #4622 are now-closed
> duplicates of #2986. The heuristic needs revision.
>
> Open questions (discussed after the examples): what exactly to do, which
> versions to do it to, and who will do it.
>
> ---
> Some minimal difference examples:
>
> from difflib import SequenceMatcher as SM
>
> # base example
> print(SM(None, 'x' + 'y'*199, 'y'*199).ratio())
> # should be and is 0.9975 (rounded)
>
> # make 'y' junk
> print(SM(lambda c:c=='y', 'x' + 'y'*199, 'y'*199).ratio())
> # should be and is 0.0
>
> # Increment b by 1 char
> print(SM(None, 'x' + 'y'*199, 'y'*200).ratio())
> # should be .995, but now is 0.0 because y is treated as junk
>
> # Reverse a and b, which increments b
> print(SM(None, 'y'*199, 'x' + 'y'*199).ratio())
> # should be .9975, as before, but now is 0.0 because y is junked
>
> The reason for the bug is the heuristic: if the second sequence is at least
> 200 items long then any item occurring more than one percent of the time in
> the second sequence is treated as junk. This was aimed at recurring code
> lines like 'else:' and 'return', but can be fatal for small alphabets where
> common items are necessary content.

Indeed, it makes no sense at all for tiny alphabets.  OTOH, as above,
it gave factor-of-thousands speedups for intended use cases, and
that's more important to me.  There should certainly be a way to turn
off the "auto junk" heuristic, and to tune it, but - sorry for being
pragmatic ;-) - it was a valuable speed improvement for what I expect
still remain difflib's overwhelmingly most common use cases.

> A more realistic example than the above is comparing DNA gene sequences.

Comparing DNA sequences is realistic, but using SequenceMatcher to do
so is unrealistic except for a beginner just playing with the ideas.
There should be a way to disable the heuristic so the beginner can
have their fun, but any serious work in this area will need to use
different algorithms.

> Without the heuristic SequenceMatcher.

Re: [Python-Dev] Licensing

2010-07-06 Thread Guido van Rossum
On Tue, Jul 6, 2010 at 11:27 PM, Nick Coghlan  wrote:
> For example, if you look at some of the code that even Guido has
> submitted (e.g. pgen2), that's actually come in under Google's
> contributor agreement, rather than Guido's personal one. Presumably
> that was work he did on company time, so the copyright actually rests
> with Google rather than Guido.

I hope you are misremembering some details. I did that work while at
Elemental Security (i.e. before I joined Google). It should have
Elemental Security's contributor agreement. I developed that code
initially for inclusion in Elemental's product line (as part of a
parser for a domain-specific language named "Fuel" which did not get
open-sourced -- probably for the better).

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] thoughts on the bytes/string discussion

2010-07-06 Thread Stefan Behnel

Ronald Oussoren, 06.07.2010 16:51:
> On 27 Jun, 2010, at 11:48, Greg Ewing wrote:
>> Stefan Behnel wrote:
>>> Greg Ewing, 26.06.2010 09:58:
>>>> Would there be any sanity in having an option to compile Python
>>>> with UTF-8 as the internal string representation?
>>>
>>> It would break Py_UNICODE, because the internal size of a unicode
>>> character would no longer be fixed.
>>
>> It's not fixed anyway with the 2-char build -- some characters are
>> represented using a pair of surrogates.
>
> It is for practical purposes not even fixed in 4-char builds. In 4-char
> builds every Unicode code point corresponds to one item in a Python
> unicode string, but a base character with combining characters is still
> a sequence of characters and should IMHO almost always be treated as a
> single object. As an example, given s="be\N{COMBINING DIAERESIS}", s[:2]
> or s[2:] is almost certainly semantically invalid.
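
The slicing example above can be tried directly. This is a minimal sketch assuming a Python 3 interpreter; it uses only the standard `unicodedata` module, and the variable names are illustrative.

```python
import unicodedata

s = "be\N{COMBINING DIAERESIS}"  # 'b', 'e', U+0308

# Three code points, although the string displays as two characters.
print(len(s))

# Slicing between the base letter and its combining mark separates them:
# head is plain "be", tail is a lone combining diaeresis.
head, tail = s[:2], s[2:]
print(repr(head), repr(tail), unicodedata.combining(tail))

# NFC normalization composes e + U+0308 into the single code point U+00EB.
# That helps for this particular pair only; many base/mark combinations
# have no precomposed form.
composed = unicodedata.normalize("NFC", s)
print(repr(composed), len(composed))
```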


Sure. However, this is not a problem for the purpose of the C-API, 
especially for Cython (which is the angle from which I brought this up). 
All Cython cares about is that it mimics CPython's semantics exactly when 
transforming code, and a CPython runtime will ignore surrogate pairs and 
combining characters during iteration and indexing, and when determining 
the string length. So a single character unicode string can currently be 
safely aliased by Py_UNICODE with correct Python semantics. That would no 
longer be the case if the internal representation switched to UTF-8 and/or 
if CPython started to take surrogates and combining characters into account 
when considering the string length.
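
To make the fixed-width point concrete, here is a rough sketch that counts how many code units a single code point needs in each encoding. It says nothing about CPython's actual internal layout; the helper `code_units` and the sample characters are assumptions chosen for illustration.

```python
def code_units(text, encoding, unit_size):
    """Number of fixed-size units that `text` occupies in `encoding`."""
    return len(text.encode(encoding)) // unit_size


# Illustrative sample characters, one code point each.
samples = {
    "ASCII letter": "a",
    "e with diaeresis": "\u00eb",
    "euro sign": "\u20ac",
    "non-BMP character": "\U0001F600",
}

for label, ch in samples.items():
    print(
        label,
        "utf-8:", code_units(ch, "utf-8", 1),
        "utf-16:", code_units(ch, "utf-16-le", 2),
        "utf-32:", code_units(ch, "utf-32-le", 4),
    )
```

Only the 4-byte encoding uses exactly one unit for every sample. UTF-16 needs two units outside the BMP, and UTF-8 needs anywhere from one to four bytes, which is why a single fixed-size C value could no longer alias a one-character string under a UTF-8 representation.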


Note that it's impossible to determine if a unicode string contains 
surrogate pairs because it's running on a narrow unicode build or because 
the user entered them into the string. But the user would likely expect the 
second case to treat them as separate code points, whereas the first is an 
implementation detail that should normally be invisible. Combining 
characters are a lot clearer here, as they can only be entered by users, so 
keeping them separate as provided is IMHO the expected behaviour.
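
The ambiguity can be simulated on a current interpreter. This sketch assumes Python 3.4 or later, where there are no narrow builds and the `surrogatepass` error handler works with the UTF-16 codecs, so the "narrow build" view is imitated by encoding to UTF-16.

```python
astral = "\U0001F600"      # one code point outside the BMP
explicit = "\ud83d\ude00"  # two surrogate code points entered by hand

# On this interpreter the two strings are different objects of
# different lengths.
print(len(astral), len(explicit), astral == explicit)

# As UTF-16 code units, which is what a narrow build stored, they are
# indistinguishable.
units_astral = astral.encode("utf-16-le")
units_explicit = explicit.encode("utf-16-le", "surrogatepass")
print(units_astral == units_explicit)

# Decoding the hand-entered pair therefore yields the single code point;
# the information about how it was entered is gone.
roundtrip = units_explicit.decode("utf-16-le")
print(roundtrip == astral)
```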


I think the main theme here is that the interpretation of code points and 
their transformation for user interfaces and backends is left to the user 
code. Py_UNICODE represents a code point in the current system, including 
surrogate pair 'escapes'. And that would change if the underlying encoding 
switched to something other than UTF-16/UCS-4.


Stefan
