Re: [Python-Dev] Licensing
I take "...running off with the good stuff and selling it for profit" to mean "creating derivative work and commercializing it as proprietary code", which you can not do with GPL licensed code. Also, while the GPL does not prevent selling copies for profit, it does not make it very practical either.

On Tue, Jul 6, 2010 at 9:44 AM, Ben Finney wrote:
> Guido van Rossum writes:
>
> > A secondary reasoning for some open source licenses might be to
> > prevent others from running off with the good stuff and selling it for
> > profit. The GPL is big on that […]
>
> Really, it's not. Please stop spreading this canard.
>
> The GPL explicitly and deliberately grants the freedom to sell the work
> for profit. Every copyright holder who grants license under the terms of
> the GPL is explicitly saying “you can sell this software for any price
> you like” <http://www.gnu.org/philosophy/selling.html>.
>
> Whatever other complaints people may have against the GPL, it's simply
> *false* to claim what Guido did above. Please stop it.
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Licensing
Nir Aides writes:

> I take "...running off with the good stuff and selling it for profit" to
> mean "creating derivative work and commercializing it as proprietary code"
> which you can not do with GPL licensed code.

It's the “proprietary” which is the distinguishing criterion there. The “selling” and “commercial” are totally orthogonal to that.

That's the point: selling, and commercial activity in general, is explicitly encouraged and permission granted by the GPL. Too many people speak as though it were otherwise. To those who do: Please stop.

-- 
“Following fashion and the status quo is easy. Thinking about your
users' lives and creating something practical is much harder.”
—Ryan Singer, 2008-07-09
Ben Finney
Re: [Python-Dev] Licensing
On Tue, Jul 06, 2010 at 10:10:09AM +0300, Nir Aides wrote:
> I take "...running off with the good stuff and selling it for profit" to mean
> "creating derivative work and commercializing it as proprietary code" which you
> can not do with GPL licensed code. Also, while the GPL does not prevent selling
> copies for profit it does not make it very practical either.
>
Uhmmm http://finance.yahoo.com/q/is?s=RHT&annual

It is very possible to make money with the GPL. The GPL does, as you say, prevent you from creating derivative works that are proprietary code. It does *not* prevent you from creating derivative works and commercializing them.

-Toshio
Re: [Python-Dev] Licensing
On Tue, Jul 6, 2010 at 9:22 AM, Ben Finney wrote:
> That's the point: selling, and commercial activity in general, is
> explicitly encouraged and permission granted by the GPL. Too many people
> speak as though it were otherwise. To those who do: Please stop.
>

Please, GPL advocates also spread their own type of FUD, claiming "free as in speech ain't the same thing as free as in beer, people!". While true, the bottom line is that Python being BSD-type enables me to make money with it that I wouldn't make if Python were GPL-type. Moreover, I don't think that the GPL allows any money-making that a BSD-type license wouldn't allow. Hence the common point of view saying "BSD-type is more commercial-friendly than GPL".

I wrote an article last year that, while it doesn't address this issue specifically, touches on it.

http://www.hardcoded.net/articles/going_open_source.htm

Virgil Dupras
Re: [Python-Dev] Licensing
On Tue, Jul 6, 2010 at 6:01 AM, Virgil Dupras wrote:
> On Tue, Jul 6, 2010 at 9:22 AM, Ben Finney wrote:
>
>> That's the point: selling, and commercial activity in general, is
>> explicitly encouraged and permission granted by the GPL. Too many people
>> speak as though it were otherwise. To those who do: Please stop.
>>
>
> Please, GPL advocates also spread their own type of FUD, claiming
> "free as in speech ain't the same thing as free as in beer, people!".
> While true, the bottom line is that Python being BSD-type enables me
> to make money with it that I wouldn't make if Python were GPL-type.
> Moreover, I don't think that the GPL allows any money-making that a
> BSD-type license wouldn't allow. Hence the common point of view saying
> "BSD-type is more commercial-friendly than GPL".
>
> I wrote an article last year that, while it doesn't address this
> issue specifically, touches on it.
>
> http://www.hardcoded.net/articles/going_open_source.htm
>

Can we please drop the GPL slap fighting? It's completely worthless here. Take it to reddit or someplace else. The Python / PSF license won't be changing anytime soon.

Ben could have just as easily responded to Guido in private if he felt that strongly.

jesse
Re: [Python-Dev] Licensing // PSF // Motion of non-confidence
On Tue, 6 Jul 2010 01:58:26 pm Stephen J. Turnbull wrote:
> Antoine Pitrou writes:
> > Which is the very wrong thing to do, though. License text should
> > be understandable by non-lawyer people;
>
> This is a common mistake, at least with respect to common-law
> systems. Licenses are written in a formal language intended to have
> precise semantics, especially in the event of a dispute going to
> court. What you wrote is precisely analogous to "a computer program
> should be understandable to non-programmer people".

You've never used Apple's much-missed Hypertalk, have you? :)

Given that Python has often been described as executable pseudo-code, I think it is ironic that you're implying that comprehensibility of language is a bad thing! Python is no less precise in its semantics than (say) APL.

There are movements to discourage unreadable legalese in favour of simpler language that is more readable while still being precise. For example, the Canadian Bar Association supports the Plain English Movement:

http://en.wikipedia.org/wiki/Plain_Language_Movement

and of course excessive formality and legalese is often criticised even by lawyers for *harming* precision. (When even the judge can't work out what you mean, that's a problem.)

None of this is to imply that the Python licence is guilty of such excessive legalese. But I think that, to the extent that other priorities and legal obligations permit it, we should always be open to the idea of improving the readability and comprehensibility of "legal source code".

> The fact is, in the U.S. if an ordinary person thinks they understand
> a license, then it's probably quite unpredictable what a court will
> say about attempts to enforce it.

I'm not sure that this is a fact or just an opinion, but *my* opinion is that this is a safe bet. Most people in the industry consider that it's generally unpredictable what a court will say about licences in general (particularly the shrink-wrap variety).

It's certainly true that the general public generally has no clue about licences, contracts, or legal agreements in general, but then agreements written by lawyers aren't always much better. I've been asked to sign agreements that are nonsensical, e.g. circular definitions where Clause N says to refer to Clause X, and Clause X says to refer to Clause N, or NDAs that prohibited me from doing *anything* with the "confidential information" the other party gave me, including the work they wanted me to do. Or blatantly illegal, e.g. non-compete clauses that don't have a hope in hell of surviving a legal challenge, including one that would have meant that I was agreeing to never work for any person or company in Australia who ever had with a telephone.

-- 
Steven D'Aprano
Re: [Python-Dev] blocking 2.7
On 05.07.10 16:19, Nick Coghlan wrote:
> On Mon, Jul 5, 2010 at 5:20 AM, Terry Reedy wrote:
>> On 7/4/2010 2:31 AM, Éric Araujo wrote:
>>>> But Python tests lack coverage stats, so it is hard to say anything.
>>>
>>> FYI: http://coverage.livinglogic.de/
>>
>> Turns out the audioop is one of the best covered modules, at 98%
>
> Alas, those are only the stats for the audioop test suite. audioop
> itself is written in C, so the automatic coverage stats generated by
> livinglogic don't provide any details.

http://coverage.livinglogic.de/ *does* include coverage info for stuff written in C, see for example:

http://coverage.livinglogic.de/Objects/unicodeobject.c.html

However, it *is* strange that test_audioop.py gets executed, but audioop.c doesn't seem to be.

Servus,
Walter
Re: [Python-Dev] Licensing
Jesse Noller writes:

> The Python / PSF license won't be changing anytime soon.

The existing license for Python suits me fine.

> Ben could have just as easily responded to Guido in private if he
> felt that strongly.

No. I responded in the same forum where the falsehood was put forth, to correct that falsehood.

That's done now; thanks for your attention, all.

-- 
“Timid men prefer the calm of despotism to the boisterous sea of
liberty.” —Thomas Jefferson
Ben Finney
Re: [Python-Dev] blocking 2.7
On Tue, Jul 6, 2010 at 1:10 PM, Walter Dörwald wrote:
> http://coverage.livinglogic.de/ *does* include coverage info for stuff
> written in C, see for example:
>
> http://coverage.livinglogic.de/Objects/unicodeobject.c.html
>
> However it *is* strange that test_audioop.py gets executed, but
> audioop.c doesn't seem to be.

It looks as though none of the extension modules (besides those that are compiled statically into the interpreter) are reporting coverage. I wonder whether the correct flags are being passed to the module build stage?

Incidentally, there doesn't seem to be any of the usual 'make' output I'd associate with the module-building stage in the build log at:

http://coverage.livinglogic.de/buildlog.txt

For example, I'd expect to see the string 'mathmodule' somewhere in that output.

Mark
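Mark's distinction between modules compiled statically into the interpreter and modules built separately can be checked from inside a running interpreter. The helper below is a hypothetical sketch (the name `module_kind` is ours, not a CPython API); it relies only on `sys.builtin_module_names`, `importlib.import_module` and `importlib.machinery.EXTENSION_SUFFIXES`.

```python
import importlib
import importlib.machinery
import sys


def module_kind(name):
    """Report how the running interpreter provides module `name` (hypothetical helper).

    Returns "builtin" for modules compiled statically into the interpreter,
    "shared" for separately built extension modules (.so/.pyd files), and
    "python" for everything else (pure-Python source, or frozen modules).
    """
    if name in sys.builtin_module_names:
        # Statically linked into the interpreter binary itself.
        return "builtin"
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None) or ""
    if path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        # Built in a separate step (setup.py in 2010-era CPython builds).
        return "shared"
    return "python"


if __name__ == "__main__":
    for mod in ("sys", "math", "json"):
        print(mod, module_kind(mod))
```

On a build where the coverage flags reached only the core interpreter, the modules reported as "shared" would be the ones expected to be missing from the coverage report; which modules fall into each group depends on how a given Python was configured.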
Re: [Python-Dev] Licensing // PSF // Motion of non-confidence
Steven D'Aprano writes:
> On Tue, 6 Jul 2010 01:58:26 pm Stephen J. Turnbull wrote:

> > Licenses are written in a formal language intended to have
> > precise semantics, especially in the event of a dispute going to
> > court. What you wrote is precisely analogous to "a computer program
> > should be understandable to non-programmer people".
>
> You've never used Apple's much-missed Hypertalk, have you? :)

No. I was solving quadratic programs back then, and FORTRAN was much better for that. But I think it's more relevant that my mother tried writing HyperCard stacks, and gave up. On the rare occasions she wanted her computer to do something she couldn't do with MacPaint or MacWrite, she called me. She never complained about me writing programs in BASIC, even though they were totally incomprehensible to her.

And mentioning the "Python as executable pseudo-code" thing, I think you're way overestimating what average non-programmer people can cope with. (I'd be pleased to be proved wrong, especially by the undergrads I teach!!!)

As for missing it, why would I when I've got Python?
Re: [Python-Dev] Signs of neglect?
On Jul 04, 2010, at 06:58 PM, Éric Araujo wrote:

>I’d like to volunteer to maintain a tool but I’m not sure where I can
>help. I’m already proposing changes to Brett for
>Tools/scripts/patchcheck.py, and I have commented on Tools/i18n bugs,
>but these ones are already maintained by their authors (e.g. Barry is
>assigned pygettext bugs) and I’m by no means a gettext expert.

It's been a while since I did much pygettext stuff. I think Martin's basically taken it over in recent years.

-Barry
Re: [Python-Dev] [RELEASE] Python 2.7 released
On Jul 04, 2010, at 11:03 AM, Benjamin Peterson wrote:

>2010/7/4 Benjamin Peterson:
>> On behalf of the Python development team, I'm jocund to announce the
>> second release candidate of Python 2.7.
>
>Arg!!! This should, of course, be "final release".

Congratulations Benjamin!

-Barry
[Python-Dev] Coverage, was: Re: blocking 2.7
On 06.07.10 15:07, Mark Dickinson wrote:
> On Tue, Jul 6, 2010 at 1:10 PM, Walter Dörwald wrote:
>> http://coverage.livinglogic.de/ *does* include coverage info for stuff
>> written in C, see for example:
>>
>> http://coverage.livinglogic.de/Objects/unicodeobject.c.html
>>
>> However it *is* strange that test_audioop.py gets executed, but
>> audioop.c doesn't seem to be.
>
> It looks as though none of the extension modules (besides those that
> are compiled statically into the interpreter) are reporting coverage.
> I wonder whether the correct flags are being passed to the module
> build stage? Incidentally, there doesn't seem to be any of the usual
> 'make' output I'd associate with the module-building stage in the
> build log at:
>
> http://coverage.livinglogic.de/buildlog.txt
>
> For example, I'd expect to see the string 'mathmodule' somewhere in that
> output.

True, there seems to be a problem. I'm running

    ./configure --enable-unicode=ucs4 --with-pydebug

and then

    make coverage

This doesn't seem to build extension modules. However, as far as I understand the Makefile, "make coverage" should build extension modules:

    # Default target
    all:            build_all
    build_all:      $(BUILDPYTHON) oldsharedmods sharedmods gdbhooks

    coverage:
            @echo "Building with support for coverage checking:"
            $(MAKE) clean
            $(MAKE) all CFLAGS="$(CFLAGS) -O0 -pg -fprofile-arcs -ftest-coverage" LIBS="$(LIBS) -lgcov"

    # Build the shared modules
    sharedmods: $(BUILDPYTHON)
            @case $$MAKEFLAGS in \
            *s*) $(RUNSHARED) CC='$(CC)' LDSHARED='$(BLDSHARED)' LDFLAGS='$(LDFLAGS)' OPT='$(OPT)' ./$(BUILDPYTHON) -E $(srcdir)/setup.py -q build;; \
            *) $(RUNSHARED) CC='$(CC)' LDSHARED='$(BLDSHARED)' LDFLAGS='$(LDFLAGS)' OPT='$(OPT)' ./$(BUILDPYTHON) -E $(srcdir)/setup.py build;; \
            esac

I'm rerunning now with "make && make coverage" to see if this fixes anything.

Servus,
Walter
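One way to test Mark's hypothesis after a `make coverage` run is to look for gcov notes files under the build tree. This sketch assumes gcc's usual behaviour of writing a `.gcno` notes file named after each object it compiles with `-ftest-coverage` (for example `audioop.o` gives `audioop.gcno`); the function name and the match-by-base-name approach are our own assumptions, not part of CPython's build tooling.

```python
from pathlib import Path


def modules_without_gcno(build_dir, object_names):
    """Return the object base names that have no gcov notes file under build_dir.

    Assumes each notes file is named after its object file, e.g. compiling
    Modules/audioop.c to audioop.o with -ftest-coverage yields audioop.gcno.
    A missing file suggests that object was built without the coverage
    flags, or was not built at all.
    """
    found = {path.stem for path in Path(build_dir).rglob("*.gcno")}
    return [name for name in object_names if name not in found]
```

Called as, say, `modules_without_gcno("build", ["audioop", "mathmodule"])` from the top of a source tree, a non-empty result would point at the shared-module build step rather than at the tests themselves.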
Re: [Python-Dev] thoughts on the bytes/string discussion
On 27 Jun, 2010, at 11:48, Greg Ewing wrote:
> Stefan Behnel wrote:
>> Greg Ewing, 26.06.2010 09:58:
>>> Would there be any sanity in having an option to compile
>>> Python with UTF-8 as the internal string representation?
>> It would break Py_UNICODE, because the internal size of a unicode character
>> would no longer be fixed.
>
> It's not fixed anyway with the 2-char build -- some
> characters are represented using a pair of surrogates.
It is for practical purposes not even fixed in 4-char builds. In 4-char builds
every Unicode code point corresponds to one item in a python unicode string,
but a base character with combining characters is still a sequence of
characters and should IMHO almost always be treated as a single object. As an
example, given s="be\N{COMBINING DIAERESIS}" s[:2] or s[2:] is almost certainly
semantically invalid.
Ronald
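Both points in this exchange can be reproduced in a current Python 3 interpreter, where every string is indexed by code point: a character outside the Basic Multilingual Plane still needs two UTF-16 code units (the surrogate pair a 2-char build stored), and Ronald's example slices a base character away from its combining mark. The `clusters` helper is a simplified, hypothetical illustration that only attaches marks reported by `unicodedata.combining` to the preceding character; it does not implement the full grapheme-cluster rules of Unicode UAX #29.

```python
import unicodedata


def utf16_units(text):
    """Count UTF-16 code units: the storage unit of a 2-char (narrow) build."""
    return len(text.encode("utf-16-le")) // 2


def clusters(text):
    """Group each character with the combining marks that follow it.

    Simplified sketch: it only consults unicodedata.combining(), so it is
    not a full implementation of the UAX #29 grapheme-cluster rules.
    """
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch  # attach the mark to the preceding character
        else:
            groups.append(ch)
    return groups


emoji = "\U0001F600"             # a code point outside the Basic Multilingual Plane
s = "be\N{COMBINING DIAERESIS}"  # Ronald's example: "b", "e", U+0308

print(len(emoji), utf16_units(emoji))      # 1 2
print(len(s), ascii(s[:2]), ascii(s[2:]))  # slicing strands the diaeresis
print(ascii(clusters(s)))                  # two user-perceived characters
print(unicodedata.normalize("NFC", s) == "b\u00eb")  # NFC composes e + U+0308
```

Normalizing to NFC happens to help here because a precomposed "ë" exists, but many base-plus-mark sequences have no precomposed form, which is why grouping by cluster is the more general approach.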
Re: [Python-Dev] Licensing
Yes. The BSD license on FreeBSD has allowed Apple to make MacOS X a completely proprietary product. The BSD license allows you to take and never release your mods. It has very little to do with money, IMO.

On Tue, Jul 6, 2010 at 1:22 AM, Ben Finney wrote:
> Nir Aides writes:
>
>> I take "...running off with the good stuff and selling it for profit" to
>> mean "creating derivative work and commercializing it as proprietary code"
>> which you can not do with GPL licensed code.
>
> It's the “proprietary” which is the distinguishing criterion there. The
> “selling” and “commercial” are totally orthogonal to that.
>
> That's the point: selling, and commercial activity in general, is
> explicitly encouraged and permission granted by the GPL. Too many people
> speak as though it were otherwise. To those who do: Please stop.

-- 
---
NOTE: If it is important CALL ME - I may miss email,
which I do NOT normally check on weekends nor on a
regular basis during any other day.
---
LD Landis - N0YRQ - de la tierra del encanto
3960 Schooner Loop, Las Cruces, NM 88012
575-448-1763 N32 21'48.28" W106 46'5.80"
Re: [Python-Dev] Licensing // PSF // Motion of non-confidence
On 7/5/2010 11:47 PM, Antoine Pitrou wrote:
> The point of free software licenses, though (as opposed to proprietary
> licenses), is not mainly to go to court (to “protect IP”, as the PSF
> says - quite naively in my opinion); it is to enable trust among people.

Yes, that is true. Open source licenses are social documents as much as they are legal documents. However, they need to be legally enforceable so as to have their intended social effect.
Re: [Python-Dev] Licensing // PSF // Motion of non-confidence
On 7/5/2010 8:03 PM, Steve Holden wrote:
> Neil Hodgson wrote:
>> There have been moves in the past to simplify the license of Python
>> but this would require agreement from the current rights owners
>> including CWI and CNRI. IIRC not all of the rights owners are willing
>> to agree to a change.
>
> That is the current position.

This is a pet project of mine, but it needs round tuits that are currently in short supply.
Re: [Python-Dev] Licensing
LD 'Gus' Landis writes:

> Yes. The BSD license on FreeBSD has allowed Apple to
> make MacOS X a completely proprietary product.

That's simply not true.

http://www.opensource.apple.com/release/mac-os-x-1064/
Re: [Python-Dev] Licensing
I stand corrected. Thanks for the pointer, Stephen!

On Tue, Jul 6, 2010 at 10:36 AM, Stephen J. Turnbull wrote:
> LD 'Gus' Landis writes:
> > Yes. The BSD license on FreeBSD has allowed Apple to
> > make MacOS X a completely proprietary product.
>
> That's simply not true.
> http://www.opensource.apple.com/release/mac-os-x-1064/
Re: [Python-Dev] Licensing // PSF // Motion of non-confidence
On Jul 6, 2010, at 8:09 AM, Steven D'Aprano wrote:

> You've never used Apple's much-missed Hypertalk, have you? :)

on mailingListMessage
  get the message
  put it into aMessage
  if the thread of aMessage contains license wankery
    put aMessage into the trash
  end if
end mailingListMessage
[Python-Dev] Include datetime.py in stdlib or not?
This idea has been discussed extensively in this and other forums and I believe it is time to make a decision.

The proposal is to add a pure python implementation of the datetime module to stdlib. The current C implementation will transparently override pure python definitions in CPython. Other python implementations will have an option of supplying their own fast implementation. This approach has already been adopted by several modules including pickle, heapq and warnings. It has even been suggested [1] that this is the direction in which the majority of CPython extension modules should be heading.

This proposal has brought mostly positive feedback on the tracker [2] with only a few objections being raised.

1. Since this does not bring any new functionality and the datetime module is not expected to evolve, there is no need for a pure python version.
2. There are other areas of stdlib that can benefit more from pure python equivalents.
3. Reference implementations should be written by a senior CPython developer and not scraped from external projects like PyPy.

Let me briefly address these objections:

1. Availability of pure python equivalents of standard library modules is very helpful for debugging python applications. This is particularly true when the stdlib module is designed to be extended by, and to call into, user-supplied code. This is true in the case of the datetime module, which relies on 3rd-party or user-supplied code for any timezone support.

The datetime module indeed saw very little development in the last 6 years. However, this lack of development may itself be the result of a pure python version not being available. For example, the idea to supply a concrete tzinfo object representing UTC was brought up back in 2002 [3]. An RFE [4] was created in the tracker in January 2009 and took more than 1.5 years to implement. If you look at the history of issue5094, you will see that development slowed down considerably when C coding started. Note that for this particular feature, there was probably no need to have it implemented in C to begin with. (Most common operations involve datetime objects in the same timezone and those don't need to call timezone methods.)

2. Unlike other areas of stdlib, the datetime module was originally prototyped in python and it turns out that it hardly changed between python 2.3 and 2.6, with a couple of features added in 2.7. A port to 3.x was uneventful as well.

3. The version of datetime.py [5] that I propose for inclusion is substantially the pure python prototype written by Tim Peters and others back in 2003. The PyPy changes are very few [6].

I believe the code is substantially ready for inclusion. There are a few items that need to be fixed related to how floating point arguments to timedelta are handled, as well as some clean-up of docstrings and error messages (both C and python implementations can see some improvement in this area). The biggest item in terms of development effort would be to refactor test_datetime to test both implementations. A simple solution [7] of importing test_datetime twice, with and without _datetime, will probably not be accepted because it is not compatible with alternative unittest runners.

What do you think? Please reply here or add a comment at http://bugs.python.org/issue7989.

[1] http://bugs.python.org/issue5094#msg106498
[2] http://bugs.python.org/issue7989
[3] http://www.zope.org/Members/fdrake/DateTimeWiki/SuggestedRequirements
[4] http://bugs.python.org/issue5094
[5] http://svn.python.org/view/*checkout*/sandbox/branches/py3k-datetime/datetime.py
[6] http://bugs.python.org/file17701/datetime-sandbox-pypy.diff
[7] http://bugs.python.org/file17848/issue7989.diff
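The remark that operations on datetimes sharing one timezone need not call timezone methods can be checked directly. The sketch below defines a hypothetical `CountingUTC` tzinfo (our own name) that counts `utcoffset()` calls. Per the datetime documentation, when two aware datetimes share the same tzinfo object, subtraction ignores the tzinfo, so no call is made; an explicit `utcoffset()` query does consult it. As of Python 3.2, the standard library also provides a concrete UTC object, `datetime.timezone.utc`.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class CountingUTC(tzinfo):
    """Hypothetical fixed-offset UTC tzinfo that counts utcoffset() calls."""

    def __init__(self):
        self.calls = 0

    def utcoffset(self, dt):
        self.calls += 1
        return timedelta(0)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "UTC"


tz = CountingUTC()
start = datetime(2010, 7, 6, 12, 0, tzinfo=tz)
end = datetime(2010, 7, 6, 13, 30, tzinfo=tz)

elapsed = end - start       # same tzinfo object on both sides: offsets are skipped
print(elapsed, tz.calls)    # 1:30:00 0

offset = start.utcoffset()  # an explicit query does consult the tzinfo
print(offset, tz.calls)     # 0:00:00 1

# Comparing across different tzinfo objects goes through the UTC offsets,
# so this finds the same instant and prints True.
print(start == datetime(2010, 7, 6, 12, 0, tzinfo=timezone.utc))
```

This is why a slow, pure-Python tzinfo costs little in the common same-timezone case: the arithmetic path never calls into it.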
Re: [Python-Dev] Include datetime.py in stdlib or not?
On Tue, Jul 6, 2010 at 12:59, Alexander Belopolsky wrote:
> This idea has been discussed extensively in this and other forums and
> I believe it is time to make a decision.
>
> The proposal is to add a pure python implementation of the datetime
> module to stdlib. The current C implementation will transparently
> override pure python definitions in CPython. Other python
> implementations will have an option of supplying their own fast
> implementation. This approach has already been adopted by several
> modules including pickle, heapq and warnings. It has even been
> suggested [1] that this is the direction in which the majority of
> CPython extension modules should be heading.
>
> This proposal has brought mostly positive feedback on the tracker [2]
> with only a few objections being raised.
>
> 1. Since this does not bring any new functionality and the datetime
> module is not expected to evolve, there is no need for a pure python
> version.
> 2. There are other areas of stdlib that can benefit more from pure
> python equivalents.
> 3. Reference implementations should be written by a senior CPython
> developer and not scraped from external projects like PyPy.

I should mention that PyPy has said they are quite happy to donate their datetime implementation, which is what Alexander (I believe) has been working off of. Also, adding a pure Python version relieves the other VMs of having to maintain the same module separately. Making the stdlib shareable (and thus eventually breaking it out from CPython) was discussed at the language summit at PyCon 2010 and generally agreed upon, and this is a step towards making that happen.

-Brett
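The arrangement described in this thread, a pure-Python module that a C accelerator transparently overrides when one is available, is the one `heapq` already uses: heapq.py defines its functions in Python and then runs `from _heapq import *` inside a `try`/`except ImportError`. The hypothetical helper `import_pure` below (our name, not a stdlib API) sketches how a test suite could exercise both implementations without importing a test module twice. It imports a fresh copy of a module while `sys.modules` maps the accelerator to `None`, which makes importing the accelerator raise ImportError. CPython's test suite contains a comparable helper, `import_fresh_module`; treat this code as an illustration, not as that helper.

```python
import importlib
import sys

_MISSING = object()


def import_pure(name, accelerator):
    """Import a fresh copy of module `name` with its C accelerator blocked.

    Hypothetical helper: a None entry in sys.modules makes any import of
    `accelerator` raise ImportError, so a module written in the
    "pure Python first, then try: from _accel import *" style keeps its
    Python definitions. sys.modules is restored afterwards, so other code
    keeps seeing the normal (possibly accelerated) module.
    """
    saved_module = sys.modules.pop(name, _MISSING)
    saved_accel = sys.modules.get(accelerator, _MISSING)
    sys.modules[accelerator] = None
    try:
        return importlib.import_module(name)
    finally:
        sys.modules.pop(name, None)
        if saved_module is not _MISSING:
            sys.modules[name] = saved_module
        if saved_accel is _MISSING:
            del sys.modules[accelerator]
        else:
            sys.modules[accelerator] = saved_accel


if __name__ == "__main__":
    import heapq

    py_heapq = import_pure("heapq", "_heapq")
    print(heapq.heappush.__module__)     # usually "_heapq" (the C version) on CPython
    print(py_heapq.heappush.__module__)  # "heapq": the pure-Python definition
```

A test class could then be run once against the normal module and once against the object returned by `import_pure`, which stays compatible with alternative unittest runners because each test module is imported only once.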
Re: [Python-Dev] blocking 2.7
On Tue, Jul 6, 2010 at 10:10 PM, Walter Dörwald wrote:
> On 05.07.10 16:19, Nick Coghlan wrote:
> http://coverage.livinglogic.de/ *does* include coverage info for stuff
> written in C, see for example:
>
> http://coverage.livinglogic.de/Objects/unicodeobject.c.html

Ah, I missed that. Cool.

> However it *is* strange that test_audioop.py gets executed, but
> audioop.c doesn't seem to be.

There do seem to be a *lot* of N/A's against the C code (that's why I thought the C code wasn't included in the stats collection in the first place).

Regards,
Nick.

-- 
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] Licensing
I think there are a couple of potential action items that have come out of the discussion. 1. Python License If there is not already, could there be an explanatory note, something like (worded to be 'neutral'): "The Python License is complicated because Python has been developed at various times under the auspices of four different organizations. Each retains ownership of the code developed or contributed during its tenure and continues to license its portion of the code under its own Python license." Perhaps add: "The PSF cannot unilaterally change this." It would be nice if a layperson summary could be added: "Overall, the Python License is similar to the MIT license." and even "Basically, you can do what you want as long as you do it at your own risk and do not claim ownership of either the code or the name Python." Such paraphrases have been posted on Python-list, though without legal standing. But I would understand if our lawyer objected that for the PSF, rather than individuals, to say the same would somehow give the paraphrase a legal standing it should not have. 2. Contributor License I signed this some time ago, but wondered a bit about the discrepancy between this and the distribution license. I appreciate that Anatoly's question about the same has elicited an explanation that I can understand: The PSF requests that we give the PSF a clear, understandable license that allows the PSF both to distribute our contributions *and* to re-license them under the complicated license that it is forced to use for distribution. To put it another way: the contributor agreement is simple so contributors do not have to bother (as contributors) with the complications of the distribution license. Perhaps this could be clearer on the contributor license page. PS to Anatoly: I hope your questions, at least on the contributor agreement, are sufficiently well answered that you will sign it, send it in, and continue contributing.
I say this as someone who did read and think about it and decide there was nothing to worry about because I would keep ownership of my words, trusted that they would appear in at least one more Python version, and otherwise did not excessively care what PSF did with them. I also say this as someone who currently would not upload a package of mine to the PyPI repository because for that I *would* care. --- Comment on trust. Trust works both ways. So does distrust. Asking contributors to give written licenses in addition to the license implicit in the act of contribution is an act of distrust. It says something like "We worry that you might change your mind and sue, and a court might not immediately toss the suit." So it should not surprise if the occasional person reacts with overt hurt and distrust. -- Terry Jan Reedy
Re: [Python-Dev] Include datetime.py in stdlib or not?
On Wed, Jul 7, 2010 at 5:59 AM, Alexander Belopolsky wrote: > What do you think? Please reply here or add a comment at > http://bugs.python.org/issue7989. (For those that haven't read the tracker discussion, it's long, but worth skimming to get a better idea of the various points of view.) +1 on the general idea, but I haven't looked at the patches in order to be able to comment on the specifics (except that following any of the test_warnings, test_heapq, test_pickle, test_io, etc. styles of testing parallel implementations should be fine). Cheers, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia
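Testing both implementations in the style Nick mentions comes down to importing the module once with its C accelerator and once without it. CPython's own test suite has a helper for this (`import_fresh_module`); the self-contained sketch below instead relies on the documented rule that a `None` entry in `sys.modules` makes the corresponding import fail, so `heapq` falls back to its pure-Python functions when `_heapq` is hidden. The helper `import_without` is hypothetical.

```python
import importlib
import sys
import types

def import_without(name, blocked):
    """Import a fresh copy of module `name` while module `blocked` is hidden.

    Hypothetical helper: a None entry in sys.modules makes `import blocked`
    raise ImportError, so `name` falls back to its pure-Python code.
    """
    saved = {key: sys.modules.pop(key) for key in (name, blocked) if key in sys.modules}
    sys.modules[blocked] = None
    try:
        return importlib.import_module(name)
    finally:
        # Restore sys.modules so later imports see the normal modules again.
        sys.modules.pop(name, None)
        sys.modules.pop(blocked, None)
        sys.modules.update(saved)

py_heapq = import_without("heapq", "_heapq")  # pure-Python functions only
import heapq as c_heapq                       # normally uses the _heapq accelerator

# The same check runs against both implementations.
for impl in (py_heapq, c_heapq):
    heap = []
    for value in [5, 1, 4, 2, 3]:
        impl.heappush(heap, value)
    assert [impl.heappop(heap) for _ in range(5)] == [1, 2, 3, 4, 5]

print(isinstance(py_heapq.heappush, types.FunctionType))  # True
```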
Re: [Python-Dev] Licensing
On Wed, Jul 7, 2010 at 7:05 AM, Terry Reedy wrote: > Asking contributors to give written licenses in addition to the license > implicit in the act of contribution is an act of distrust. It says something > like "We worry that you might change your mind and sue, and a court might not > immediately toss the suit." So it should not surprise if the occasional > person reacts with overt hurt and distrust. The other (IMO, more important) element to it is that it acts as an assertion that the developer actually *has* the rights to contribute the code they're contributing. So, rather than being worried about someone changing their mind about their contributions (although that's admittedly part of it), we're more concerned that contributors actually think about who owns the copyright on the code they're offering and make sure the appropriate permissions are in place. For example, if you look at some of the code that even Guido has submitted (e.g. pgen2), that's actually come in under Google's contributor agreement, rather than Guido's personal one. Presumably that was work he did on company time, so the copyright actually rests with Google rather than Guido. Cheers, Nick. -- Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] Licensing
On Tue, Jul 6, 2010 at 11:05 PM, Terry Reedy wrote: > 1. Python License > > If there is not already, could there be an explanatory note, something like > (worded to be 'neutral'): As a sub-point, I'd like to see something short explaining how the different licenses in the LICENSE file are meant to be combined. At the moment the terms and conditions section just lists them without explanation. Schiavo Simon
Re: [Python-Dev] Licensing
Terry Reedy wrote: > Comment on trust. Trust works both ways. So does distrust. > > Asking contributors to give written licenses in addition to the license > implicit in the act of contribution is an act of distrust. It says > something like "We worry that you might change your mind and sue, and a > court might not immediately toss the suit." So it should not surprise if > the occasional person reacts with overt hurt and distrust. The written contributor agreements are needed to enable the PSF to defend the IP in the Python software. They are just a legal tool, nothing more. Note that the PSF doesn't relicense the contributed code under the whole license stack. The contributed code is (currently) being relicensed under the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2 (the top part of the stack), which is a very straightforward BSD-style license. The other licenses in the stack only apply to the code owned by the respective parties CWI, CNRI, BeOpen and the cast of thousands (which fortunately didn't get to send in their lawyers and still had a very good time). Apart from that, the Python distribution also comes with 3rd party code under various other BSD-style licenses. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jul 06 2010) >>> Python/Zope Consulting and Support ...http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ 2010-07-19: EuroPython 2010, Birmingham, UK 12 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken
[Also posted to http://bugs.python.org/issue2986
Developed with input from Eli Bendersky, who will write patchfile(s) for
whichever change option is chosen.]
Summary: difflib.SequenceMatcher was developed, documented, and
originally operated as "a flexible class for comparing pairs of
sequences of any [hashable] type". An "experimental" heuristic was added
in 2.3a1 to speed up its application to sequences of code lines, which
are selected from an unbounded set of possibilities. As explained below,
this heuristic partly to completely disables SequenceMatcher for
realistic-length sequences from a small finite alphabet. The regression
is easy to fix. The docs were never changed to reflect the effect of the
heuristic, but should be, with whatever additional change is made.
In the commit message for revision 26661, which added the heuristic, Tim
Peters wrote "While I like what I've seen of the effects so far, I still
consider this experimental. Please give it a try!" Several people who
have tried it discovered the problem with small alphabets and posted to
the tracker. Issues #1528074, #1678339, #1678345, and #4622 are
now-closed duplicates of #2986. The heuristic needs revision.
Open questions (discussed after the examples): what exactly to do, which
versions to do it to, and who will do it.
---
Some minimal difference examples:
from difflib import SequenceMatcher as SM
# base example
print(SM(None, 'x' + 'y'*199, 'y'*199).ratio())
# should be and is 0.9975 (rounded)
# make 'y' junk
print(SM(lambda c:c=='y', 'x' + 'y'*199, 'y'*199).ratio())
# should be and is 0.0
# Increment b by 1 char
print(SM(None, 'x' + 'y'*199, 'y'*200).ratio())
# should be .995, but now is 0.0 because y is treated as junk
# Reverse a and b, which increments b
print(SM(None, 'y'*199, 'x' + 'y'*199).ratio())
# should be .9975, as before, but now is 0.0 because y is junked
The reason for the bug is the heuristic: if the second sequence is at
least 200 items long then any item occurring more than one percent of
the time in the second sequence is treated as junk. This was aimed at
recurring code lines like 'else:' and 'return', but can be fatal for
small alphabets where common items are necessary content.
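The popularity rule just described can be sketched as a small standalone function. This is a simplified sketch that assumes difflib's threshold is `len(b) // 100 + 1` occurrences (the "one percent" above, rounded up); `popular_items` is a hypothetical helper, not part of difflib's API.

```python
from collections import Counter

def popular_items(b, min_len=200):
    """Return the items the auto-junk heuristic would discard from b.

    Only sequences with at least `min_len` items are affected, and an item
    counts as "popular" when it occurs more than len(b) // 100 + 1 times.
    """
    n = len(b)
    if n < min_len:
        return set()
    threshold = n // 100 + 1
    return {item for item, count in Counter(b).items() if count > threshold}

print(popular_items('y' * 199))     # set(): too short to trigger the rule
print(popular_items('y' * 200))     # {'y'}: 200 occurrences > 3
print(popular_items('ACGT' * 500))  # all four bases are treated as junk
```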
A more realistic example than the above is comparing DNA gene sequences.
Without the heuristic SequenceMatcher.get_opcodes() reports an
appropriate sequence of matches and edits and .ratio works as documented
and expected. For 1000/2000/6000 bases, the times on an old Athlon 2800
machine are <1/2/12 seconds. Since 6000 is longer than most genes, this
is a realistic and practical use.
With the heuristic, everything is junk and there is only one match,
''=='' augmented by the initial prefix of matching bases. This is
followed by one edit: replace the rest of the first sequence with the
rest of the second sequence. A much faster way to find the first
mismatch would be
i, n = 0, min(len(first), len(second))
while i < n and first[i] == second[i]:
    i += 1
The match ratio, based on the initial matching prefix only, is
spuriously low.
---
Questions:
1. What change should be made?
Proposed fix: Disentangle the heuristic from the calculation of the
internal b2j dict that maps items to indexes in the second sequence b.
Only apply the heuristic (or not) afterward.
Version A: Modify the heuristic to only eliminate common items when
there are more than, say, 100 distinct items (when len(b2j) > 100, where b2j is
first calculated without popularity deletions).
This would leave DNA, protein, and printable ascii+[\n\r\t] sequences
alone. On the other hand, realistic sequences of more than 200 code
lines should have at least 100 different lines, and so the heuristic
should continue to be applied when it (mostly?) 'should' be. This change
leaves the API unchanged and does not require a user decision.
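Version A could look roughly like the following: build the full item-to-indices map first, then prune popular items only when the second sequence is both long and drawn from more than a given number of distinct items. The function name `chain_b` and the `min_distinct=100` default (taken from the "say, 100" above) are illustrative assumptions, not difflib code.

```python
from collections import defaultdict

def chain_b(b, min_len=200, min_distinct=100):
    """Version A sketch: index b, then prune popular items only when b is
    long AND drawn from a large alphabet. Names and defaults are illustrative."""
    b2j = defaultdict(list)          # item -> list of indices in b
    for j, item in enumerate(b):
        b2j[item].append(j)
    popular = set()
    n = len(b)
    # The extra len(b2j) test leaves small alphabets (DNA, protein, ASCII) alone.
    if n >= min_len and len(b2j) > min_distinct:
        threshold = n // 100 + 1
        popular = {item for item, idxs in b2j.items() if len(idxs) > threshold}
        for item in popular:
            del b2j[item]
    return dict(b2j), popular

# Four distinct items: nothing is pruned, however long the sequence is.
dna_map, dna_popular = chain_b('ACGT' * 500)
print(sorted(dna_map), dna_popular)   # ['A', 'C', 'G', 'T'] set()

# 151 distinct "code lines", one of them very common: it is still pruned.
code_lines = ['stmt %d' % i for i in range(150)] + ['}'] * 50
code_map, code_popular = chain_b(code_lines)
print(code_popular)                   # {'}'}
```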
Version B: add a parameter to .__init__ to make the heuristic optional.
If the default were True ('use it'), then the code would run the same as
now (even when bad). With the heuristic turned off, users would be able
to get the .ratio they may expect and need. On the other hand, users
would have to understand the heuristic to know when and when not to use it.
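Version B is essentially what later shipped: since Python 2.7.1 and 3.2, `SequenceMatcher` accepts an `autojunk` keyword argument, defaulting to `True`, that disables the popularity heuristic when set to `False`. Assuming one of those versions, the "increment b" example from above behaves as follows:

```python
from difflib import SequenceMatcher as SM

a = 'x' + 'y' * 199
b = 'y' * 200

# Default: 'y' is "popular" in b (len(b) >= 200), so it is treated as junk.
print(SM(None, a, b).ratio())                  # 0.0
# Heuristic disabled: the 199-item run of 'y' matches again.
print(SM(None, a, b, autojunk=False).ratio())  # 0.995
```

The default keeps the old behaviour, so existing users of difflib for text diffs see no change.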
Version C: A more radical alternative would be to make one or more of
the tuning parameters user settable, with one setting turning it off.
2. What type of issue is this, and which versions get changed?
I see the proposal as partial reversion of a change that sometimes
causes a regression, in order to fix the regression. Such would usually
be called a bugfix. Other tracker reviewers claim this issue is a
feature request, not a bugfix. Either way, 3.2 gets the fix. The
practical issue is whether at least 2.7(.1) should get the fix, or
whether the bug should forever continue in 2.x.
3. Who will make the change.
Eli will write a patch and I will check it. However, Georg Brandl
assigned the issue to Tim Peters, with a request for comment, but Tim
never responded. Is there
Re: [Python-Dev] Mercurial migration readiness
On Fri, Jul 2, 2010 at 3:34 PM, Antoine Pitrou wrote: >> >> > After the switch, hg.python.org/cpython will be the official repo, and >> > code.python.org/hg will probably be closed. >> >> Why is this transition not described in the PEP? > > Because it's not a transition. It's a mirror. It was put in place > before the hg migration plan was accepted, IIRC. Where is this migration plan, then, if it is not in the PEP? >> How is code.python.org/hg synchronized with Subversion? > > What does your question mean exactly? It's a mirror (well, a set of > mirrors) and is synchronized roughly every 5 minutes. Method. Software used, which parameters are set for it, how to repeat the process? >> Why is it not possible to leave code.python.org/hg as is in slave mode >> and then, when realtime replication is ready, just switch master/slave over? > > The two sets of repositories use different conversion tools and rules. > They have nothing in common (different changeset IDs, different > metadata, different branch/clone layout). That would be nice to hear about in more detail. As I understand it, there is no place where it is described. I already see +1 from Fred Drake and another +1 from Steve Holden down the thread. However, Antoine Pitrou, Dirkjan Ochtman and Jesse Noller object. They are afraid that contributors won't survive low-level details about Mercurial migration. I'd say there are plenty of ways to isolate them and at the same time satisfy "Mercurial aficionados", either on the same page or in different places. On Fri, Jul 2, 2010 at 4:06 PM, Stephen J. Turnbull wrote: > > There is no reason at this point to suppose the transition can't be > complete by the end of summer. However, as always, the devil is in > the details, and one of them may be a showstopper. We'll just have to > see about that. The transition can be complete in a few minutes. The question is how good it will be. As there is no plan, no roadmap, no status, it is hard to judge if it is feasible at all. Ok.
Given that nobody is able/willing to say anything more, I've gathered all your feedback concerning the current status of the Mercurial migration on this Wave: https://wave.google.com/wave/waveref/googlewave.com/w+4_fnAVHwA I hope you will find the time to enhance it with more info so that contributors not proficient with Mercurial could help to speed up the transition. -- anatoly t.
Re: [Python-Dev] Mercurial migration readiness
On Tue, Jul 6, 2010 at 7:47 PM, anatoly techtonik wrote: > On Fri, Jul 2, 2010 at 3:34 PM, Antoine Pitrou wrote: >>> >>> > After the switch, hg.python.org/cpython will be the official repo, and >>> > code.python.org/hg will probably be closed. >>> >>> Why is this transition not described in the PEP? >> >> Because it's not a transition. It's a mirror. It was put in place >> before the hg migration plan was accepted, IIRC. > > Where is this migration plan, then, if it is not in the PEP? > >>> How is code.python.org/hg synchronized with Subversion? >> >> What does your question mean exactly? It's a mirror (well, a set of >> mirrors) and is synchronized roughly every 5 minutes. > > Method. Software used, which parameters are set for it, how to repeat > the process? > >>> Why is it not possible to leave code.python.org/hg as is in slave mode >>> and then, when realtime replication is ready, just switch master/slave over? >> >> The two sets of repositories use different conversion tools and rules. >> They have nothing in common (different changeset IDs, different >> metadata, different branch/clone layout). > > That would be nice to hear about in more detail. As I understand it, there > is no place where it is described. I already see +1 from Fred Drake > and another +1 from Steve Holden down the thread. > > However, Antoine Pitrou, Dirkjan Ochtman and Jesse Noller object. They > are afraid that contributors won't survive low-level details about > Mercurial migration. I'd say there are plenty of ways to isolate them and > at the same time satisfy "Mercurial aficionados", either on the same > page or in different places. No, I don't need you misrepresenting anything I've said, Anatoly - I said there's no need to maintain SVN alongside Mercurial after we convert, and doing so is silly. I maintain that once we convert, we very happily stay converted, and drop official "other" mirrors unless other volunteers step up to maintain them.
I have no problem with additional documentation should people wish to volunteer to write it. We do not work for you, Anatoly. > On Fri, Jul 2, 2010 at 4:06 PM, Stephen J. Turnbull > wrote: >> >> There is no reason at this point to suppose the transition can't be >> complete by the end of summer. However, as always, the devil is in >> the details, and one of them may be a showstopper. We'll just have to >> see about that. > > The transition can be complete in a few minutes. The question is how > good it will be. As there is no plan, no roadmap, no status, it is > hard to judge if it is feasible at all. No. There is no question except in your mind. We all have a rough idea of the status, modulo the PEPs being updated. It is also perfectly feasible. I would love it, and offer you a Christmas card, if you could drop the hyperbole and misrepresentation. > > Ok. Given that nobody is able/willing to say anything more, I've > gathered all your feedback concerning the current status of the Mercurial > migration on this Wave: > https://wave.google.com/wave/waveref/googlewave.com/w+4_fnAVHwA I > hope you will find the time to enhance it with more info so that > contributors not proficient with Mercurial could help to speed up the > transition. While the summary is nice, your wave entry has nothing to do with the Mercurial transition. If you want to help, please ask someone to take on an open task, or volunteer to write/accentuate the PEPs, or help with documentation for post-migration workflow. Your contributions can be effective and useful, rather than noisemaking and abrasive. The Mercurial transition will occur, barring someone directly involved finding show-stopping reasons otherwise, with or without you. The decision was made some time ago, and despite your recent noisemaking, will continue on.
jesse
Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken
On Tue, Jul 6, 2010 at 7:18 PM, Terry Reedy wrote: > [Also posted to http://bugs.python.org/issue2986 > A much faster way to find the first mismatch would be > i = 0 > while first[i] == second[i]: > i+=1 > The match ratio, based on the initial matching prefix only, is spuriously > low. > > I don't have much experience with the Python sequence matcher, but many classical edit distance and alignment algorithms benefit from stripping any common prefix and suffix before engaging in heavy lifting. This is trivially optimal for Hamming-like distances and easily shown to be for Levenshtein and Damerau type distances. -Kevin
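The prefix/suffix stripping Kevin describes can be sketched as below. `common_affixes` is a hypothetical helper; it only measures the shared ends, leaving the (shorter) middle parts for whatever matching algorithm runs next, and it stops the suffix scan early so the prefix and suffix never overlap.

```python
def common_affixes(a, b):
    """Return (prefix_len, suffix_len) shared by sequences a and b.

    The suffix is measured only over what remains after the prefix,
    so the two never overlap.
    """
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and a[len(a) - 1 - suffix] == b[len(b) - 1 - suffix]):
        suffix += 1
    return prefix, suffix

a, b = "GATTACAGATTACA", "GATTCCAGATTACA"
p, s = common_affixes(a, b)
# Only the middle parts still need the expensive comparison.
print(p, s, a[p:len(a) - s], b[p:len(b) - s])  # 4 9 A C
```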
Re: [Python-Dev] Include datetime.py in stdlib or not?
On 7/6/2010 3:59 PM, Alexander Belopolsky wrote: I am more interested in Brett's overall vision than this particular module. I understand that to be one of a stdlib that is separate from CPython and is indeed the standard Python library. Questions: 1. Would the other distributions use a standard stdlib rather than current individual versions? If so, and if at least one used the Python version of each module, this would alleviate the concern that non-use == non-testing. (Test improvement would also help this.) 2. Would the other distributions pool their currently separate stdlib efforts to help maintain one standard stdlib? If so, this would alleviate the concern about the extra effort to maintain both a C and Python version. (Test improvement would help with this too.) 3. What version of Python would be allowed for use in the stdlib? I would like the stdlib for 3.x to be able to use 3.x code. This would be only a minor concern for CPython as long as 2.7 is maintained, but a major concern for the other implementations currently 'stuck' in 2.x only. A good 3to2 would be needed. I generally favor having Python versions of modules available. My current post on difflib.SequenceMatcher is based on experiments with an altered version. I copied difflib.py to my test directory, renamed it diff2lib.py, so I could import both versions, found and edited the appropriate method, and off I went. If difflib were in C, my post would have been based on speculation about how a fixed version would operate, rather than on data. 4. Does ctypes not make it possible to replace a method of a Python-coded class with a faster C version, with something like 'try: connect to methods.dll; check that function xyz exists; replace Someclass.xyz with a ctypes wrapper; except: pass'? For instance, the SequenceMatcher heuristic was added to speed up the matching process, which I believe is encapsulated in one O(n**2) or so bottleneck method. I believe almost everything else is O(n) bookkeeping.
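Question 4's idea of swapping in a faster implementation when one is available is sketched below with an ordinary import instead of ctypes. That is a substitution: it uses the `try`/`except ImportError` pattern that stdlib modules such as `heapq` use for their `_heapq` accelerator. The compiled module `_fastmatch` and its `longest_match` function are hypothetical, so in practice the pure-Python method is what runs.

```python
class Matcher:
    """Pure-Python class whose hot method may be replaced by an accelerator."""

    def longest_match(self, a, b):
        # Brute-force search for the longest common substring of a and b.
        best = (0, 0, 0)  # (start_in_a, start_in_b, size)
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                    k += 1
                if k > best[2]:
                    best = (i, j, k)
        return best

try:
    # Hypothetical compiled module exposing a function with the same signature.
    from _fastmatch import longest_match as _fast_longest_match
except ImportError:
    pass  # keep the pure-Python method, which is always available and testable
else:
    Matcher.longest_match = lambda self, a, b: _fast_longest_match(a, b)

print(Matcher().longest_match("xabcy", "zabcw"))  # (1, 1, 3)
```

Keeping the Python method as the default means the fallback is exercised whenever the accelerator is missing, which addresses the "non-use == non-testing" concern above.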
This proposal has brought mostly positive feedback on the tracker [2] with only a few objections being raised. 1. Since this does not bring any new functionality and datetime module is not expected to evolve, there is no need for pure python version. See above. 2. There are other areas of stdlib that can benefit more from pure python equivalents. Possibly true, but developers do what they do, and this seems mostly done. 3. Reference implementations should be written by a senior CPython developer and not scraped from external projects like PyPy. I did not see that in my reading of the thread. In any case, what matters is quality, not authorship. > What do you think? Please reply here or add a comment at > http://bugs.python.org/issue7989. From scanning that and the posts here, it seems like a PEP or other doc on dual-version modules would be a good idea. It should at least document how to code the switch from the Python version to the C-coded version and how to test both, as discussed. -- Terry Jan Reedy
Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken
[Terry Reedy] > [Also posted to http://bugs.python.org/issue2986 > Developed with input from Eli Bendersky, who will write patchfile(s) for > whichever change option is chosen.] Thanks for paying attention to this, Terry (and Ed)! I somehow managed to miss the whole discussion over the intervening years :-( > Summary: difflib.SequenceMatcher was developed, documented, and originally > operated as "a flexible class for comparing pairs of sequences of any > [hashable] type". Although it can be used for that, its true intent was to produce intuitive diffs for revisions of text files (including code files) edited by humans. Where "intuitive" means approximately "less jarring than the often-odd diffs produced by algorithms working on some rigid mathematical notion of 'minimal edit distance'". Whether it's useful for more than that I can't say, because that's all I ever developed (or used) the algorithm for. > An "experimental" heuristic was added in 2.3a1 Big bad on me for that! At the time I fully intended to document that, and at least make it tunable, but life intervened and it got dropped on the floor. > to speed up its application to sequences of code lines, Yes, that was the intent. I was corresponding with a user at the time who had odd notions (well, by my standards) of how to format C code, which left him with many hundreds of lines containing only an open brace, or a close brace, or just a semicolon (etc). difflib spun its wheels frantically trying to sort this out, and the heuristic in question cut processing time from hours (in the worst cases) to seconds. Since that (text file comparison) was always the primary case for this class, it was worth doing something about. But it should not have gone in the way it did (undocumented & unfinished, as you correctly note). > which are selected from an > unbounded set of possibilities.
As explained below, this heuristic partly to > completely disables SequenceMatcher for realistic-length sequences from a > small finite alphabet. Which wasn't an anticipated use case, so should not be favored. Slowing down difflib for what it was intended for is not a good idea - practicality beats purity. Ya, ya, I understand someone playing around with DNA sequences might find difflib tempting at first, but fix this and they're still going to be unhappy. There are much better (faster, "more standard") algorithms for comparing sequences drawn from tiny alphabets, and especially so for DNA matching. > The regression is easy to fix. The docs were never > changed to reflect the effect of the heuristic, but should be, with whatever > additional change is made. True - and always was. > In the commit message for revision 26661, which added the heuristic, Tim > Peters wrote "While I like what I've seen of the effects so far, I still > consider this experimental. Please give it a try!" Several people who have > tried it discovered the problem with small alphabets and posted to the > tracker. Issues #1528074, #1678339. #1678345, and #4622 are now-closed > duplicates of #2986. The heuristic needs revision. > > Open questions (discussed after the examples): what exactly to do, which > versions to do it too, and who will do it. 
> > --- > Some minimal difference examples: > > from difflib import SequenceMatcher as SM > > # base example > print(SM(None, 'x' + 'y'*199, 'y'*199).ratio()) > # should be and is 0.9975 (rounded) > > # make 'y' junk > print(SM(lambda c:c=='y', 'x' + 'y'*199, 'y'*199).ratio()) > # should be and is 0.0 > > # Increment b by 1 char > print(SM(None, 'x' + 'y'*199, 'y'*200).ratio()) > # should be .995, but now is 0.0 because y is treated as junk > > # Reverse a and b, which increments b > print(SM(None, 'y'*199, 'x' + 'y'*199).ratio()) > # should be .9975, as before, but now is 0.0 because y is junked > > The reason for the bug is the heuristic: if the second sequence is at least > 200 items long then any item occurring more than one percent of the time in > the second sequence is treated as junk. This was aimed at recurring code > lines like 'else:' and 'return', but can be fatal for small alphabets where > common items are necessary content. Indeed, it makes no sense at all for tiny alphabets. OTOH, as above, it gave factor-of-thousands speedups for intended use cases, and that's more important to me. There should certainly be a way to turn off the "auto junk" heuristic, and to tune it, but - sorry for being pragmatic ;-) - it was a valuable speed improvement for what I expect still remain difflib's overwhelmingly most common use cases. > A more realistic example than the above is comparing DNA gene sequences. Comparing DNA sequences is realistic, but using SequenceMatcher to do so is unrealistic except for a beginner just playing with the ideas. There should be a way to disable the heuristic so the beginner can have their fun, but any serious work in this area will need to use different algorithms. > Without the heuristic SequenceMatcher.
Re: [Python-Dev] Licensing
On Tue, Jul 6, 2010 at 11:27 PM, Nick Coghlan wrote: > For example, if you look at some of the code that even Guido has > submitted (e.g. pgen2), that's actually come in under Google's > contributor agreement, rather than Guido's personal one. Presumably > that was work he did on company time, so the copyright actually rests > with Google rather than Guido. I hope you are misremembering some details. I did that work while at Elemental Security (i.e. before I joined Google). It should have Elemental Security's contributor agreement. I developed that code initially for inclusion in Elemental's product line (as part of a parser for a domain-specific language named "Fuel" which did not get open-sourced -- probably for the better). -- --Guido van Rossum (python.org/~guido)
Re: [Python-Dev] thoughts on the bytes/string discussion
Ronald Oussoren, 06.07.2010 16:51:
> On 27 Jun, 2010, at 11:48, Greg Ewing wrote:
>> Stefan Behnel wrote:
>>> Greg Ewing, 26.06.2010 09:58:
>>>> Would there be any sanity in having an option to compile Python
>>>> with UTF-8 as the internal string representation?
>>>
>>> It would break Py_UNICODE, because the internal size of a unicode
>>> character would no longer be fixed.
>>
>> It's not fixed anyway with the 2-char build -- some characters are
>> represented using a pair of surrogates.
>
> It is for practical purposes not even fixed in 4-char builds. In 4-char
> builds every Unicode code point corresponds to one item in a Python
> unicode string, but a base character with combining characters is still
> a sequence of characters and should IMHO almost always be treated as a
> single object. As an example, given s="be\N{COMBINING DIAERESIS}", s[:2]
> or s[2:] is almost certainly semantically invalid.

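Ronald's example is easy to reproduce: `len()` and slicing count code points, so a base letter and its combining mark can be separated. The grouping function is a deliberately simplified sketch that attaches any character with a nonzero `unicodedata.combining()` class to the preceding character; real user-perceived characters follow the grapheme cluster rules of Unicode Standard Annex #29, which this does not implement. `rough_clusters` is a hypothetical name.

```python
import unicodedata

s = "be\N{COMBINING DIAERESIS}"
print(len(s))       # 3 code points, although a reader sees two letters
print(repr(s[:2]))  # 'be': the diaeresis has been cut off

def rough_clusters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

print(len(rough_clusters(s)))  # 2
```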
Sure. However, this is not a problem for the purpose of the C-API,
especially for Cython (which is the angle from which I brought this up).
All Cython cares about is that it mimics CPython's semantics exactly when
transforming code, and a CPython runtime will ignore surrogate pairs and
combining characters during iteration and indexing, and when determining
the string length. So a single character unicode string can currently be
safely aliased by Py_UNICODE with correct Python semantics. That would no
longer be the case if the internal representation switched to UTF-8 and/or
if CPython started to take surrogates and combining characters into account
when considering the string length.
Note that it's impossible to determine if a unicode string contains
surrogate pairs because it's running on a narrow unicode build or because
the user entered them into the string. But the user would likely expect the
second case to treat them as separate code points, whereas the first is an
implementation detail that should normally be invisible. Combining
characters are a lot clearer here, as they can only be entered by users, so
keeping them separate as provided is IMHO the expected behaviour.
I think the main theme here is that the interpretation of code points and
their transformation for user interfaces and backends is left to the user
code. Py_UNICODE represents a code point in the current system, including
surrogate pair 'escapes'. And that would change if the underlying encoding
switched to something other than UTF-16/UCS-4.
Stefan
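To make the representation question concrete, the sketch below counts a string three ways: code points (what `len()` reports on a UCS-4 build, and on every CPython since 3.3, where PEP 393 removed the narrow/wide build split), UTF-16 code units (what a narrow build stores, using a surrogate pair for characters outside the Basic Multilingual Plane), and UTF-8 bytes (the variable-width representation asked about at the start of the thread). `storage_sizes` is an illustrative helper name.

```python
def storage_sizes(text):
    """Return (code points, UTF-16 code units, UTF-8 bytes) for text."""
    code_points = len(text)
    utf16_units = len(text.encode("utf-16-le")) // 2  # 2 bytes per unit, no BOM
    utf8_bytes = len(text.encode("utf-8"))
    return code_points, utf16_units, utf8_bytes

print(storage_sizes("be\N{COMBINING DIAERESIS}"))  # (3, 3, 4)
print(storage_sizes("\U0001D11E"))                 # (1, 2, 4): a surrogate pair in UTF-16
```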
