date:20100707

Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

2010-07-07 Thread Eli Bendersky

[snip]
> Yes, that was the intent.  I was corresponding with a user at the time
> who had odd notions (well, by my standards) of how to format C code,
> which left him with many hundreds of lines containing only an open
> brace, or a close brace, or just a semicolon (etc).  difflib spun its
> wheels frantically trying to sort this out, and the heuristic in
> question cut processing time from hours (in the worst cases) to
> seconds.
>
> Since that (text file comparison) was always the primary case for this
> class, it was worth doing something about.  But it should not have
> gone in the way it did (undocumented & unfinished, as you correctly
> note).
>
>> which are selected from an
>> unbounded set of possibilities. As explained below, this heuristic partly to
>> completely disables SequenceMatcher for realistic-length sequences from a
>> small finite alphabet.
>
> Which wasn't an anticipated use case, so should not be favored.
> Slowing down difflib for what it was intended for is not a good idea -
> practicality beats purity.
>
> Ya, ya, I understand someone playing around with DNA sequences might
> find difflib tempting at first, but fix this and they're still going
> to be unhappy.  There are much better (faster, "more standard")
> algorithms for comparing sequences drawn from tiny alphabets, and
> especially so for DNA matching.
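For reference, the heuristic Tim describes can be sketched as follows. This mirrors the popularity rule in difflib's source, assuming no `isjunk` function. Once `b` has at least 200 items, any element occurring more than `len(b) // 100 + 1` times is left out of the matcher's index. `popular_elements` is a hypothetical helper written for illustration, not a difflib API.

```python
import difflib
from collections import Counter

def popular_elements(b):
    """Elements of b that the popularity heuristic would leave out of its index.

    Hypothetical helper mirroring difflib's rule (with isjunk=None): it only
    applies to sequences of 200+ items, and drops any element that occurs
    more than len(b) // 100 + 1 times.
    """
    n = len(b)
    if n < 200:
        return set()  # short sequences are never pruned
    ntest = n // 100 + 1
    return {elt for elt, count in Counter(b).items() if count > ntest}

# C source with many unique lines plus repeated brace-only lines.
c_lines = [f"x{i} = {i};" for i in range(300)] + ["{", "}"] * 10
# A four-letter alphabet: every symbol is "popular", so nothing gets indexed.
dna = "ACGT" * 60

# Cross-check against difflib itself (the bpopular attribute exists in 3.2+).
assert popular_elements(dna) == difflib.SequenceMatcher(None, "", dna).bpopular
assert popular_elements(c_lines) == difflib.SequenceMatcher(None, [], c_lines).bpopular
```

For the brace-heavy C file only the braces are pruned, which is the speedup Tim wanted. For the DNA-like string the whole alphabet is pruned.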

Tim, thanks for your insights. In response to the description above,
however, I would like to explain my use case, which originally got me
interested in this issue with SequenceMatcher.

I was not comparing DNAs, but using SequenceMatcher in my automatic
testbench checker that verified the output of some logic design. I
didn't want exact comparisons, so I was very happy to see
difflib.SequenceMatcher in the stdlib, with its useful ratio/quick_ratio
methods. I was comparing the output sequence to an expected sequence
with a 0.995 ratio threshold and was very happy. Until my sequence got
longer than 200 elements...
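The symptom described here can be reproduced with a sketch like the one below. It assumes Python 3.2 or later (or 2.7.1+), where an `autojunk` argument that switches the heuristic off was added after this discussion. The data is made up for illustration: once the sequence passes 200 elements, every symbol of a small alphabet counts as popular, and the ratio drops sharply even though only one sample differs.

```python
import difflib

def similarity(expected, actual, autojunk=True):
    """Ratio of matching elements, as used by a tolerance-based output checker."""
    matcher = difflib.SequenceMatcher(None, expected, actual, autojunk=autojunk)
    return matcher.ratio()

# Hypothetical testbench output: 300 samples from a two-symbol alphabet,
# with a single corrupted sample in the middle.
expected = [0, 1] * 150
actual = list(expected)
actual[150] = 2

with_heuristic = similarity(expected, actual)                    # heuristic on
without_heuristic = similarity(expected, actual, autojunk=False)  # heuristic off

print(f"autojunk=True:  {with_heuristic:.4f}")
print(f"autojunk=False: {without_heuristic:.4f}")
```

With the heuristic disabled, the ratio clears a 0.995 threshold as expected. With it enabled, the same one-sample difference fails the check.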

So this isn't DNA, and the alphabet wasn't too tiny, but on the other
hand there was nothing in the module to suggest that it should only be
used for comparing lines in files. On the contrary, its
general-sounding name, SequenceMatcher, lulled me into the (false?)
belief that I could just use it for my sequence comparison without
worrying about finding better algorithms or implementing things like
edit distance myself. Judging by the comments in other related issues,
I'm far from being the only one.

Therefore, I think that you should just admit that your excellent
module became useful for more purposes than you originally intended it
for :-) !! I completely respect your desire to keep the "intended
purposes" as fast as possible, but there are solutions (some of which
were presented by Terry) that can make it more useful without any harm
to the performance of the intended purpose.

As Terry noted, I will be very happy to submit a patch with tests for
whatever decision python-dev reaches on this matter.

Eli
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Mercurial migration readiness

2010-07-07 Thread Éric Araujo

>> Because it's not a transition. It's a mirror. It was put in place
>> before the hg migration plan was accepted, IIRC.
> Where is this migration plan then if it is not in PEP?

The “hg migration plan” is PEP 385. It means moving from svn.python.org
to hg.python.org.

It is not possible to make code.python.org/hg a mirror of the future
official hg repo (a quick search on the hgbook or the Mercurial wiki
will tell you how changeset hashes work). The code.p.o/hg repo was a
simple read-only mirror provided as a service to the community. Now that
the official repo will be a new, clean Mercurial repo, the current
mirror will become redundant and will probably be closed. People using
that mirror to follow CPython development and/or propose patches will
just need to make a fresh clone and rebase their patches.
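The reason the old mirror cannot simply become the new repo is that a Mercurial changeset ID is a hash over the changeset's content and its parents' IDs. The toy model below is a simplification: it hashes an invented text format, not Mercurial's real changeset encoding. It does follow the same shape as revlog node IDs, though, which are SHA-1 over the two parent IDs in sorted order followed by the revision text. Any difference in conversion metadata changes an ID, and that change propagates to every descendant.

```python
import hashlib

NULL_ID = b"\0" * 20  # Mercurial's null revision id: 20 zero bytes

def toy_nodeid(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> bytes:
    """Simplified revlog-style id: SHA-1 of the sorted parent ids plus the text."""
    first, second = sorted((p1, p2))
    return hashlib.sha1(first + second + text).digest()

# Two hypothetical conversions of the same svn revision that differ only in
# how the author name was mapped.
root_a = toy_nodeid(b"user: guido\n\nInitial revision")
root_b = toy_nodeid(b"user: Guido van Rossum\n\nInitial revision")

# Identical child content still gets different ids because the parents differ.
child_a = toy_nodeid(b"user: fred\n\nFix typo", p1=root_a)
child_b = toy_nodeid(b"user: fred\n\nFix typo", p1=root_b)

print(root_a.hex(), root_b.hex())
print(child_a.hex(), child_b.hex())
```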

>> The two sets of repositories use different conversion tools and rules.
>> They have nothing in common (different changeset IDs, different
>> metadata, different branch/clone layout).
> That would be nice to hear about in more detail. As I understand there
> is no place where it is described.

The layout of the future official repo is in PEP 385. The layout of the
old mirror can be understood by looking at the Web interface (one repo
per svn maintenance branch, etc.)

> The transition can be complete in a few minutes. The question is how
> good it will be. As there are no plan, no roadmap, no status

The transition is not just a command to be run, it includes discussion
and writing tools before that, and feedback and support after that. What
we call transition is the whole process. People have worked and are
still working on the PEP, on the repo, on the policy and on the
documentation, and it is unfair not to acknowledge that work and say
there is no plan.

Regards


Re: [Python-Dev] thoughts on the bytes/string discussion

2010-07-07 Thread M.-A. Lemburg

Ronald Oussoren wrote:
> 
> On 27 Jun, 2010, at 11:48, Greg Ewing wrote:
> 
>> Stefan Behnel wrote:
>>> Greg Ewing, 26.06.2010 09:58:
>>>> Would there be any sanity in having an option to compile
>>>> Python with UTF-8 as the internal string representation?
>>> It would break Py_UNICODE, because the internal size of a unicode character 
>>> would no longer be fixed.
>>
>> It's not fixed anyway with the 2-char build -- some
>> characters are represented using a pair of surrogates.
> 
> It is for practical purposes not even fixed in 4-char builds. In 4-char
> builds every Unicode code point corresponds to one item in a Python unicode
> string, but a base character with combining characters is still a sequence
> of characters and should IMHO almost always be treated as a single object. As
> an example, given s="be\N{COMBINING DIAERESIS}", s[:2] or s[2:] is almost
> certainly semantically invalid.

Just to clarify: Python uses code units for Unicode storage.

Whether those code units map to code points or glyphs depends
on the Python build used and the code points in question.

See
http://www.egenix.com/library/presentations/#PythonAndUnicode
for more background information (esp. page 8).
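To make the distinction concrete, the snippet below assumes Python 3.3 or later. From that version on, PEP 393 made `str` indexing work on code points in every build, so the 2010 narrow/wide-build split no longer shows up in `len()`. UTF-16 and UTF-8 code units are counted here by encoding. The combining-character string is the one from Ronald's message.

```python
import unicodedata

s = "be\N{COMBINING DIAERESIS}"              # 'b', 'e', U+0308: displays as "bë"
code_points = len(s)                         # str indexing counts code points
head = s[:2]                                 # slicing strands the combining mark
composed = unicodedata.normalize("NFC", s)   # 'b' + U+00EB: two code points

emoji = "\N{GRINNING FACE}"                  # U+1F600, outside the BMP
utf16_units = len(emoji.encode("utf-16-le")) // 2  # stored as a surrogate pair
utf8_units = len(emoji.encode("utf-8"))            # four bytes in UTF-8

print(code_points, repr(head), len(composed), utf16_units, utf8_units)
```

The same user-visible character can therefore be one, two or four code units long, and a glyph can span several code points.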

Note that using UTF-8 as internal storage format would not work
in Python, since Python is a Unicode producer, i.e. it needs to
be able to generate and work with code points that are not allowed
in UTF-8, e.g. lone surrogates.
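Lone surrogates are a concrete case. Python can hold one in a `str`, but strict UTF-8 refuses to encode it. The sketch assumes Python 3, where the non-standard `surrogatepass` error handler exists for the UTF codecs.

```python
lone = "\ud800"  # a high surrogate with no low surrogate after it

try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False  # strict UTF-8 rejects surrogate code points

# "surrogatepass" writes the code point anyway, producing bytes that a
# strict UTF-8 decoder would in turn reject.
raw = lone.encode("utf-8", "surrogatepass")
round_trip = raw.decode("utf-8", "surrogatepass")
```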

Another reason not to use UTF-8 encoded code units is that slicing
based on code units could easily create invalid UTF-8 which would
then render the data unusable. This is a lot less likely to happen
with UCS2 or UCS4.

And finally: RAM is cheap and today's CPUs work better with 16- or
32-bit values than 8-bit characters.

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Mercurial migration readiness

2010-07-07 Thread Dirkjan Ochtman

On Wed, Jul 7, 2010 at 01:47, anatoly techtonik  wrote:
> That would be nice to hear about in more detail. As I understand there
> is no place where it is described. I already see +1 from Fred Drake
> and another +1 from Steve Holden down the thread.
>
> However, Antoine Pitrou, Dirkjan Ochtman and Jesse Noller object. They
> afraid that contributors won't survive low-level details about
> Mercurial migration.

I'm happy to answer direct questions about the transition process I
have in mind (and mostly written down in the PEP), or the resulting hg
repository. I think the PEP has some details about what I think
constitutes a good conversion. If there are things that Fred or Steve
miss from that discussion, I'd be happy to add to the PEP.

Cheers,

Dirkjan

Re: [Python-Dev] thoughts on the bytes/string discussion

2010-07-07 Thread Greg Ewing


M.-A. Lemburg wrote:

> Note that using UTF-8 as internal storage format would not work
> in Python, since Python is a Unicode producer, i.e. it needs to
> be able to generate and work with code points that are not allowed
> in UTF-8, e.g. lone surrogates.

Well, it wouldn't strictly be UTF-8, any more than the
2-byte build is strictly UTF-16, in the sense that lone
surrogates can be produced.

> Another reason not to use UTF-8 encoded code units is that slicing
> based on code units could easily create invalid UTF-8 which would
> then render the data unusable. This is a lot less likely to happen
> with UCS2 or UCS4.

The use cases I had in mind for a 1-byte build are those for
which the alternative would be keeping everything in bytes.
Applications using a 1-byte build would need to be aware of
the fact and take care to slice strings at valid places. If
they were using bytes, they would have to face exactly the
same issues.
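With a UTF-8 representation (or with bytes), "slicing at valid places" means never cutting inside a multi-byte sequence. A hypothetical helper can back up over continuation bytes, which have the bit pattern `10xxxxxx`. It assumes the input is already valid UTF-8.

```python
def safe_utf8_cut(data: bytes, limit: int) -> int:
    """Largest index <= limit that does not split a UTF-8 sequence.

    Hypothetical helper; assumes `data` is valid UTF-8.
    """
    if limit >= len(data):
        return len(data)
    i = max(limit, 0)
    # Continuation bytes look like 0b10xxxxxx; step back to a lead byte.
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i

data = "h\u00e9llo".encode("utf-8")    # b'h\xc3\xa9llo' ('é' is two bytes)
naive = data[:2]                       # cuts the two-byte 'é' in half
safe = data[:safe_utf8_cut(data, 2)]   # backs up to just b'h'
```

The naive slice no longer decodes cleanly, while the adjusted cut always lands on a character boundary.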


> And finally: RAM is cheap and today's CPUs work better with 16- or
> 32-bit values than 8-bit characters.


Yet some people have reported significant performance benefits
for some applications from using a 2-byte build instead of a
4-byte build. I was just speculating whether a 1-byte build
might be of further advantage in a few specialised cases.

No matter how much RAM or processing speed you have, it's always
possible to find an application that stresses the limits.

--
Greg


Re: [Python-Dev] thoughts on the bytes/string discussion

2010-07-07 Thread Antoine Pitrou

On Wed, 07 Jul 2010 11:13:09 +0200
"M.-A. Lemburg"  wrote:
> 
> And finally: RAM is cheap and today's CPUs work better with 16- or
> 32-bit values than 8-bit characters.

The latter is wrong. There is no cost in accessing bytes
rather than words on modern CPUs.
(Actually, bytes are cheaper overall, since they take up less cache.)
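The cache argument comes down to footprint: the same text takes one, two or four bytes per character depending on the code unit size. The illustration below uses codecs as stand-ins for 1-, 2- and 4-byte storage. It measures encoded size, not CPython's internal layout.

```python
text = "Python-Dev mailing list " * 1000  # 24,000 ASCII characters

# Encoded size as a stand-in for 1-, 2- and 4-byte code units.
bytes_per_layout = {
    codec: len(text.encode(codec))
    for codec in ("latin-1", "utf-16-le", "utf-32-le")
}
for codec, nbytes in bytes_per_layout.items():
    print(f"{codec:>10}: {nbytes:>6} bytes ({nbytes // len(text)} per character)")
```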

Regards

Antoine.


Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

2010-07-07 Thread Antoine Pitrou

On Tue, 06 Jul 2010 19:18:09 -0400
Terry Reedy  wrote:
> 
> Version A: Modify the heuristic to only eliminate common items when 
> there are more than, say, 100 items (when len(b2j)> 100 where b2j is 
> first calculated without popularity deletions).
[...]
> 
> Version B: add a parameter to .__init__ to make the heuristic optional. 
[...]
> 
> Version C: A more radical alternative would be to make one or more of 
> the tuning parameters user settable, with one setting turning it off.

Version B would have my favour (but please make the default be True). 
Version A can lead to regressions (including performance regressions
such as described by Tim), and version C looks far more complicated to
use.
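For comparison, Terry's Version A could look roughly like the hypothetical function below. The popularity pruning only kicks in when `b` has more than a threshold of distinct items (100 in his example), so a four-letter alphabet is never pruned. This is a sketch of the proposal as described, not code that was committed. What later shipped in 2.7.1 and 3.2 was the `autojunk` flag, essentially Version B with a default of True.

```python
from collections import Counter

def version_a_popular(b, min_distinct=100):
    """Popular elements under the proposed Version A (hypothetical sketch).

    Same 200-item / 1% rule as the existing heuristic, but only applied
    when b contains more than `min_distinct` distinct elements.
    """
    counts = Counter(b)
    n = len(b)
    if n < 200 or len(counts) <= min_distinct:
        return set()
    ntest = n // 100 + 1
    return {elt for elt, count in counts.items() if count > ntest}

dna = "ACGT" * 60                                            # 4 distinct items
c_lines = [f"x{i} = {i};" for i in range(300)] + ["{"] * 20  # 301 distinct items
```

The catch Antoine points out is that the threshold itself is another tuning knob: inputs just above or below it behave very differently.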

Regards

Antoine.



Re: [Python-Dev] Licensing

2010-07-07 Thread Nick Coghlan

On Wed, Jul 7, 2010 at 2:59 PM, Guido van Rossum  wrote:
> On Tue, Jul 6, 2010 at 11:27 PM, Nick Coghlan  wrote:
>> For example, if you look at some of the code that even Guido has
>> submitted (e.g. pgen2), that's actually come in under Google's
>> contributor agreement, rather than Guido's personal one. Presumably
>> that was work he did on company time, so the copyright actually rests
>> with Google rather than Guido.
>
> I hope you are misremembering some details. I did that work while at
> Elemental Security (i.e. before I joined Google). It should have
> Elemental Security's contributor agreement. I developed that code
> initially for inclusion in Elemental's product line (as part of a
> parser for a domain-specific language named "Fuel" which did not get
> open-sourced -- probably for the better.

Whoops, I got my timeline wrong (it did seem a little off when I wrote
it - I think part of my brain was trying to tell me the dates didn't
match up). I must have been thinking of something else I was working
on recently that had Google's name in the header, most likely the abc
module.

So apologies for the confusion - just s/pgen2/abc/ in my example to
make it line up with my intent :)

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

2010-07-07 Thread Nick Coghlan

On Wed, Jul 7, 2010 at 9:18 AM, Terry Reedy  wrote:
> In the commit message for revision 26661, which added the heuristic, Tim
> Peters wrote "While I like what I've seen of the effects so far, I still
> consider this experimental.  Please give it a try!" Several people who have
> tried it discovered the problem with small alphabets and posted to the
> tracker. Issues #1528074, #1678339, #1678345, and #4622 are now-closed
> duplicates of #2986. The heuristic needs revision.

Python 2.3 you say...

Hmm, I've been using difflib.SequenceMatcher for years in a serial bit
error rate tester (with typical message sizes ranging from tens of
bytes to tens of thousands of bytes) that occasionally gives
unexpected results. I'd been blaming hardware glitches (and, to be
fair, all of the odd results I can recall off the top of my head were
definitively traced to problems in the hardware under test), but I
should probably check I'm not running afoul of this bug.

And Tim, the algorithm may not be optimal as a general purpose binary
diff algorithm, but it's still a hell of a lot faster than the
hardware I use it to test. Compared to the equipment configuration
times, the data comparison time is trivial.

There's another possibility here - perhaps the heuristic should be off
by default in SequenceMatcher, with a TextMatcher subclass that
enables it (and Differ and HtmlDiff then inheriting from the latter)?
There's currently barely anything in the SequenceMatcher documentation
to indicate that it is designed primarily for comparing text rather
than arbitrary sequences (the closest it gets is the reference to
Ratcliff/Obershelp gestalt pattern matching and then the link to the
Ratcliff/Metzener Dr Dobb's article - and until this thread, I'd never
followed the article link). Rather than reverting to Tim's
undocumented vision, perhaps we should better articulate it by
separating the general purpose matcher from an optimised text matcher.
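Nick's split could be sketched on top of the `autojunk` switch that later versions of difflib provide (Python 3.2+). `PlainSequenceMatcher` and `TextMatcher` are hypothetical names; neither class exists in difflib. The data mimics a bit-error-rate test pattern with a single flipped bit.

```python
import difflib

class PlainSequenceMatcher(difflib.SequenceMatcher):
    """General-purpose matcher: popularity heuristic off unless requested."""

    def __init__(self, isjunk=None, a="", b="", autojunk=False):
        super().__init__(isjunk, a, b, autojunk)

class TextMatcher(PlainSequenceMatcher):
    """Line-oriented text matcher: heuristic on, as difflib's text tools expect."""

    def __init__(self, isjunk=None, a="", b="", autojunk=True):
        super().__init__(isjunk, a, b, autojunk)

# A 0x55/0xAA test pattern of 1000 bytes with one flipped bit.
sent = bytes([0x55, 0xAA]) * 500
received = bytearray(sent)
received[500] ^= 0x01
received = bytes(received)

general = PlainSequenceMatcher(None, sent, received).ratio()
textual = TextMatcher(None, sent, received).ratio()
```

The general matcher reports the near-perfect match a tester would expect. The text-tuned one treats both pattern bytes as popular and underreports it.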

Cheers,
Nick.


[Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Alexander Belopolsky

On Tue, Jul 6, 2010 at 11:54 PM, Terry Reedy  wrote:
> On 7/6/2010 3:59 PM, Alexander Belopolsky wrote:
>
> I am more interested in Brett's overall vision than this particular module.
> I understand that to be one of a stdlib that is separate from CPython and is
> indeed the standard Python library.
>

I am also very much interested in the overall vision, but I would like
to keep the datetime.py thread focused, so I am going to reply to broad
questions under a separate subject.

> Questions:
>
> 1. Would the other distributions use a standard stdlib rather than current
> individual versions?

I certainly hope they will. In an ideal world, passing test.regrtest
with unmodified Lib should be *the* definition of what is called
Python.  I understand that there is already some work underway in this
direction such as marking implementation specific tests with
appropriate decorators.
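Marking implementation-specific tests can be as simple as a skip decorator. CPython's own test suite has a `cpython_only` decorator in `test.support`. The sketch below defines a hypothetical stand-in with plain `unittest` and `platform.python_implementation()` so that it runs anywhere. Immediate reclamation of an object is used as the example of a refcounting detail that a tracing GC need not share.

```python
import io
import platform
import unittest
import weakref

def cpython_only(test):
    """Hypothetical decorator: skip a test on implementations other than CPython."""
    return unittest.skipUnless(
        platform.python_implementation() == "CPython",
        "relies on CPython implementation details",
    )(test)

class SharedAndSpecificTests(unittest.TestCase):
    def test_append(self):
        # Shared behaviour: every implementation must pass this.
        items = []
        items.append(1)
        self.assertEqual(items, [1])

    @cpython_only
    def test_freed_immediately(self):
        # Refcounting frees the object as soon as the last reference goes;
        # a tracing GC (PyPy, Jython, IronPython) may keep it alive longer.
        class Thing:
            pass
        obj = Thing()
        ref = weakref.ref(obj)
        del obj
        self.assertIsNone(ref())

def run():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SharedAndSpecificTests)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```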

>
> 2. Would the other distributions pool their currently separate stdlib
> efforts to help maintain one standard stdlib?

I believe that making stdlib and test.regrtest more friendly to
alternative implementations will go a long way towards this goal. It
will, of course, be a decision that each project will have to make.

>
> 3. What version of Python would be allowed for use in the stdlib? I would
> like the stdlib for 3.x to be able to use 3.x code. This would be only a
> minor concern for CPython as long as 2.7 is maintained, but a major concern
> for the other implementation currently 'stuck' in 2.x only. A good 3to2
> would be needed.

Availability of python equivalents will hopefully help  "other
implementation currently 'stuck' in 2.x only" to get "unstuck" and
move to 3.x.   I understand that this is a somewhat sensitive issue at
the moment, but I believe a decision has been made that supporting new
features for 2.x is outside of python-dev's focus.

> 4. Does not ctypes make it possible to replace a method of a Python-coded
> class with a faster C version, with something like
>  try:
>    connect to methods.dll
>    check that function xyx exists
>    replace Someclass.xyy with ctypes wrapper
>  except: pass
> For instance, the SequenceMatcher heuristic was added to speedup the
> matching process that I believe is encapsulated in one O(n**2) or so
> bottleneck method. I believe most everything else is O(n) bookkeeping.
>
The ctypes module is very CPython-centric as far as I know. For new
modules, this may be a valid way to rapidly develop accelerated
versions. For modules that are already written in C, I don't see
much benefit in replacing them with ctypes wrappers.

> [.. datetime specific discussion skipped ..]
> From scanning that and the posts here, it seems like a pep or other doc on
> dual version modules would be a good idea. It should at least document how
> to code the switch from python version to the x coded version and how to
> test both, as discussed.
>
I am certainly not ready to write such a PEP. I may be in a better
position to contribute to it after I gain more experience with
test_datetime.py.   At the moment I have more questions than answers.

For example, the established practice appears to be:

modulename.py:

# Python code

try:
    from _modulename import *
except ImportError:
    pass

This is supposed to leave the module with no Python definitions in it
if _modulename is available, because the imported C names replace the
Python ones. The problem with datetime.py is that it has several helper
functions like _ymd2ord() that will still stay in the module. Should an
"else:" clause be added to clean these up? Should these functions become
class or static methods as appropriate?
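One answer to the `else:` question, sketched with hypothetical names (`mymod`, `_mymod`, `_helper`), is to delete the helper only when the accelerator import succeeds. To show both paths without a real C extension, the sketch executes the module source twice. The second run has a fake `_mymod` injected into `sys.modules`.

```python
import sys
import types

MODULE_SOURCE = '''
def _helper(y, m, d):
    # Private helper used only by the pure-Python implementation.
    return (y, m, d)

def make_date(y, m, d):
    return _helper(y, m, d)

try:
    from _mymod import *      # hypothetical C accelerator
except ImportError:
    pass
else:
    del _helper               # the accelerated version does not need it
'''

def load(source, name="mymod"):
    """Execute `source` as the body of a new module object."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module

pure = load(MODULE_SOURCE)    # no _mymod available: Python definitions kept

fake = types.ModuleType("_mymod")
fake.make_date = lambda y, m, d: ("accelerated", y, m, d)
sys.modules["_mymod"] = fake
try:
    accelerated = load(MODULE_SOURCE)
finally:
    del sys.modules["_mymod"]
```

The trade-off is that the pure-Python code path can no longer be reached from the accelerated module, so tests for it need a separately imported copy.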

The established practice for testing is:

import unittest
from test import support

py_module = support.import_fresh_module('modulename', blocked=['_modulename'])
c_module = support.import_fresh_module('modulename', fresh=['_modulename'])

class TestDefinitions:  # not a unittest.TestCase subclass
    def test_foo(self):
        self.module.foo(...)
    ...

class C_Test(TestDefinitions, unittest.TestCase):
    module = c_module

class Py_Test(TestDefinitions, unittest.TestCase):
    module = py_module
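The mixin pattern can be made runnable against a module that really has both versions: `heapq` and, on CPython, its C accelerator `_heapq`. `fresh_import` is a simplified, hypothetical stand-in for `support.import_fresh_module`. It blocks an accelerator by setting its `sys.modules` entry to `None`, which makes importing it raise `ImportError`.

```python
import importlib
import io
import sys
import unittest

def fresh_import(name, blocked=()):
    """Import a fresh copy of `name` with the modules in `blocked` hidden."""
    names = (name, *blocked)
    saved = {n: sys.modules[n] for n in names if n in sys.modules}
    try:
        sys.modules.pop(name, None)
        for n in blocked:
            sys.modules[n] = None      # a None entry makes the import fail
        return importlib.import_module(name)
    finally:
        for n in names:
            sys.modules.pop(n, None)
        sys.modules.update(saved)      # leave the interpreter as we found it

py_heapq = fresh_import("heapq", blocked=["_heapq"])
c_heapq = fresh_import("heapq")

class HeapTests:  # mixin: not a TestCase, so it is not collected on its own
    module = None

    def test_pops_in_order(self):
        heap = []
        for x in [5, 1, 4, 2]:
            self.module.heappush(heap, x)
        popped = [self.module.heappop(heap) for _ in range(4)]
        self.assertEqual(popped, [1, 2, 4, 5])

class C_HeapTest(HeapTests, unittest.TestCase):
    module = c_heapq

class Py_HeapTest(HeapTests, unittest.TestCase):
    module = py_heapq

def run():
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(cls) for cls in (C_HeapTest, Py_HeapTest)
    )
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```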

For datetime.py this approach presents several problems:

1. replacing datetime with self.module.datetime everywhere can get
messy quickly.
2. There are test classes defined at the test_datetime module level
that subclass datetime classes, but self.module is not available
at the module level. These should probably be moved into setUp()
methods and attached to the test case instance.
3. If #2 is resolved by moving definitions inside functions, the
classes will become unpickleable and pickle tests will break.  Some
hackery involving injecting these classes into __main__ or module
globals may be required.
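Point 3 can be illustrated with a small sketch. A `datetime.date` subclass defined inside a function cannot be pickled by reference, and one possible hack re-homes the class in a module that pickle can find. The module name `hypothetical_test_helpers` and both helper functions are made up for illustration.

```python
import datetime
import pickle
import sys
import types

def make_subclass():
    # Defined inside a function (as in setUp), so __qualname__ contains "<locals>".
    class MyDate(datetime.date):
        pass
    return MyDate

def make_picklable(cls, module_name="hypothetical_test_helpers"):
    """Expose a locally defined class at module level so pickle can look it up."""
    module = sys.modules.setdefault(module_name, types.ModuleType(module_name))
    cls.__module__ = module_name
    cls.__qualname__ = cls.__name__
    setattr(module, cls.__name__, cls)
    return cls

Local = make_subclass()
try:
    pickle.dumps(Local(2010, 7, 7))
    local_pickles = True
except (pickle.PicklingError, AttributeError):
    local_pickles = False  # pickle cannot find "make_subclass.<locals>.MyDate"

Fixed = make_picklable(make_subclass())
restored = pickle.loads(pickle.dumps(Fixed(2010, 7, 7)))
```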

These challenges make datetime.py an interesting showcase for other
modules, so rather than writing a PEP based on abstract ideas, I think
it is better to get datetime.py integrated first and try to establish
the best practices on the way.

Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Michael Foord


On 07/07/2010 16:29, Alexander Belopolsky wrote:

[snip...]

>> 4. Does not ctypes make it possible to replace a method of a Python-coded
>> class with a faster C version, with something like
>>  try:
>>    connect to methods.dll
>>    check that function xyx exists
>>    replace Someclass.xyy with ctypes wrapper
>>  except: pass
>> For instance, the SequenceMatcher heuristic was added to speedup the
>> matching process that I believe is encapsulated in one O(n**2) or so
>> bottleneck method. I believe most everything else is O(n) bookkeeping.
>>
> The ctypes module is very CPython-centric as far as I know. For new
> modules, this may be a valid way to rapidly develop accelerated
> versions. For modules that are already written in C, I don't see
> much benefit in replacing them with ctypes wrappers.

Nope, both IronPython and PyPy have ctypes implementations and Jython is 
in the process of "growing" one. Using ctypes for C extensions is the 
most portable way of providing C extensions for Python (other than 
providing a pure-Python implementation of course).


Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog




Re: [Python-Dev] thoughts on the bytes/string discussion

2010-07-07 Thread Stephen J. Turnbull

Greg Ewing writes:

 > The use cases I had in mind for a 1-byte build are those for
 > which the alternative would be keeping everything in bytes.
 > Applications using a 1-byte build would need to be aware of
 > the fact and take care to slice strings at valid places. If
 > they were using bytes, they would have to face exactly the
 > same issues.

In other words, the people who want to use bytes have no less pain,
and the people who want to use characters suffer much greater pain.
How can this be a win?  If you live in an ASCII-only world, there are
a few APIs where bytes aren't allowed, and indeed it would be a win to
use those APIs on ASCII-encoded bytestrings.  And I don't mean
ISO-8859-1-only, either; UTF-8 is not compatible with ISO-8859-1 at
the byte level.
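The last point is easy to check. Outside ASCII the same character has different byte representations, so bytes decoded under the wrong assumption turn into mojibake.

```python
text = "caf\u00e9"                      # "café"

utf8 = text.encode("utf-8")             # 'é' becomes two bytes: C3 A9
latin1 = text.encode("latin-1")         # 'é' is the single byte E9
ascii_identical = "cafe".encode("utf-8") == "cafe".encode("latin-1")

# Reading UTF-8 bytes as Latin-1 silently produces the wrong characters.
misread = utf8.decode("latin-1")
```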

But the proposal Guido supports would address that by making those
APIs polymorphic.

 > > And finally: RAM is cheap and today's CPUs work better with 16- or
 > > 32-bit values than 8-bit characters.
 > 
 > Yet some people have reported significant performance benefits
 > for some applications from using a 2-byte build instead of a
 > 4-byte build. I was just speculating whether a 1-byte build
 > might be of further advantage in a few specialised cases.

Of course it would be.  But as soon as you want to do *any* I/O in
text mode with non-ASCII characters, you're in real pain.  What do you
do if a user cuts and pastes some text containing proper quotation marks or
an en-dash at a prompt in a terminal? So polymorphism is a far better
way to optimize those special cases, as it allows a byte string in any
encoding to be treated as text, not just UTF-8.
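Several stdlib APIs are already polymorphic in this sense. `os.path.join` and `re`, for instance, accept either `str` or `bytes` as long as the two are not mixed. A hypothetical user-level function can follow the same approach by deriving its constants from the type of its input.

```python
import os.path
import re

def first_field(line, sep=","):
    """Return the first field of a str or bytes line (hypothetical helper).

    The separator is converted to match the input type, so callers can
    pass ASCII-compatible bytes without decoding them first.
    """
    if isinstance(line, bytes) and isinstance(sep, str):
        sep = sep.encode("ascii")
    return line.split(sep, 1)[0]

# Stdlib functions that accept both types, as long as they are not mixed:
text_path = os.path.join("logs", "today.txt")
byte_path = os.path.join(b"logs", b"today.txt")
text_words = re.findall(r"\w+", "spam, eggs")
byte_words = re.findall(rb"\w+", b"spam, eggs")
```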

Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Jesse Noller

On Wed, Jul 7, 2010 at 11:29 AM, Alexander Belopolsky
 wrote:
> On Tue, Jul 6, 2010 at 11:54 PM, Terry Reedy  wrote:
>> On 7/6/2010 3:59 PM, Alexander Belopolsky wrote:
>>
>> I am more interested in Brett's overall vision than this particular module.
>> I understand that to be one of a stdlib that is separate from CPython and is
>> indeed the standard Python library.
>>
>
> I am also very much interested in the overall vision, but I would like
> to keep the datetime.py thread focused, so I am going to reply to broad
> questions under a separate subject.
>
>> Questions:
>>
>> 1. Would the other distributions use a standard stdlib rather than current
>> individual versions?
>
> I certainly hope they will. In an ideal world, passing test.regrtest
> with unmodified Lib should be *the* definition of what is called
> Python.  I understand that there is already some work underway in this
> direction such as marking implementation specific tests with
> appropriate decorators.
>
>>
>> 2. Would the other distributions pool their currently separate stdlib
>> efforts to help maintain one standard stdlib?
>
> I believe that making stdlib and test.regrtest more friendly to
> alternative implementations will go a long way towards this goal. It
> will, of course, be a decision that each project will have to make.
>
>>
>> 3. What version of Python would be allowed for use in the stdlib? I would
>> like the stdlib for 3.x to be able to use 3.x code. This would be only a
>> minor concern for CPython as long as 2.7 is maintained, but a major concern
>> for the other implementation currently 'stuck' in 2.x only. A good 3to2
>> would be needed.
>
> Availability of python equivalents will hopefully help  "other
> implementation currently 'stuck' in 2.x only" to get "unstuck" and
> move to 3.x.   I understand that this is a somewhat sensitive issue at
> the moment, but I believe a decision has been made that supporting new
> features for 2.x is outside of python-dev's focus.

[the rest snipped for now]

I agree with Alexander's responses. Brett can chime in here too, and
so can Frank W. or any of the other people who were involved in the
conversation. Essentially, many of us agreed that "one stdlib to bind
them", from a canonical repository, would help everyone involved. Any
modules specific to one implementation, such as
multiprocessing, would either be flagged as such or not included in the
shared repo (TBD).

This effort has been on hold largely because we're waiting for
the Mercurial migration. It's not something I think any of us would
want to do prior to that, and requires a fair amount of scaffolding /
build tools /etc to make it a net win.

Below, you will find the partially completed draft PEP (from a private
mercurial repo) Brett/Frank and I had worked on (but again, paused due
to mercurial/etc). Now that we're edging closer to 3.2 (this would not
happen before then) and mercurial, I think we might need to find the
time to finish the PEP:

PEP: 
Title: Making the Standard Library a Separate Project
Version: $Revision: 65628 $
Last-Modified: $Date: 2008-08-10 06:59:20 -0700 (Sun, 10 Aug 2008) $
Author: XXX
Status: Draft
Type: Process
Content-Type: text/x-rst
Created: 14-Aug-2009
Post-History:

.. warning::
   This PEP will not be submitted until the migration of
   CPython to Mercurial occurs.

Abstract
========

XXX

Rationale
=========

Although the C implementation of Python (CPython) is the original and reference
implementation of the Python language, there are now a number of additional
implementations that are widely used and reasonably complete.
Among these implementations are Jython_, IronPython_, and PyPy_.

At `PyCon 2009`_, representatives of multiple implementations of Python agreed
that it would be a good idea to divide the Python Standard Library into two
logical components, the first being a shared library that is
essential for an implementation of Python to be considered a full
implementation.  All Python implementations would share this library on equal
terms. The second library would be an implementation-specific standard library
for things that are either implementation details for a specific VM or
that depend on internals of each implementation (for example, if part
of the implementation must be written in C for CPython or written in
Java for Jython).

The test suite should be similarly exposed and shared between all
implementations on equal terms: one set of tests that must pass to be
considered a full implementation, and one set of implementation-specific tests
layered on top of the shared test suite (think garbage collection vs
refcounting, etc). The same pattern should apply to documentation as well.

The idea is to put CPython on a more equal footing with the other
implementations, and to remove the need to have Jython, IronPython or PyPy
specific cases in the CPython standard library.

Criteria for Inclusion/Exclusion of Code
========================================

[Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread C. Titus Brown

Hi all,

over on the fellowship o' the packaging mailing list, one of our GSoC students
(merwok) asked about how much formatting info should go into Python stdlib
docstrings.  Right now the stdlib docstrings are primarily text, AFAIK; but
with the switch to Sphinx for the official Python docs, should we permit
ReST-general and/or Sphinx-specific markup in docstrings?

Hmm, I don't actually see that the stdlib docstrings are imported into the
Python documentation anywhere, so maybe the use of Sphinx isn't that
relevant.  But how about ReST in general?

See

http://sphinx.pocoo.org/markup/index.html

for sphinx-specific markup constructs.

thanks,
--titus
-- 
C. Titus Brown, [email protected]

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Shashwat Anand

On Wed, Jul 7, 2010 at 9:24 PM, C. Titus Brown  wrote:

> Hi all,
>
> over on the fellowship o' the packaging mailing list, one of our GSoC
> students
> (merwok) asked about how much formatting info should go into Python stdlib
> docstrings.  Right now the stdlib docstrings are primarily text, AFAIK; but
> with the switch to Sphinx for the official Python docs, should we permit
> ReST-general and/or Sphinx-specific markup in docstrings?
>
> Hmm, I don't actually see that the stdlib docstrings are imported into the
> Python documentation anywhere, so maybe the use of Sphinx isn't that
> relevant.  But how about ReST in general?
>

Will we still be able to use .__doc__ within the Python interpreter? It is
quite a handy feature:

>>> print(os.getcwd.__doc__)
getcwd() -> path

Return a string representing the current working directory.

Also, some Python interpreters like bpython use it; a snapshot is here:
http://cl.ly/c5bb3be4a01d9d44732f
So will it be OK to break them?


>
> See
>
>http://sphinx.pocoo.org/markup/index.html
>
> for sphinx-specific markup constructs.
>
> thanks,
> --titus
> --
> C. Titus Brown, [email protected]
>
>

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread C. Titus Brown

On Wed, Jul 07, 2010 at 09:36:10PM +0530, Shashwat Anand wrote:
> On Wed, Jul 7, 2010 at 9:24 PM, C. Titus Brown  wrote:
> 
> > Hi all,
> >
> > over on the fellowship o' the packaging mailing list, one of our GSoC students
> > (merwok) asked about how much formatting info should go into Python stdlib
> > docstrings.  Right now the stdlib docstrings are primarily text, AFAIK; but
> > with the switch to Sphinx for the official Python docs, should we permit
> > ReST-general and/or Sphinx-specific markup in docstrings?
> >
> > Hmm, I don't actually see that the stdlib docstrings are imported into the
> > Python documentation anywhere, so maybe the use of Sphinx isn't that
> > relevant.  But how about ReST in general?
> 
> Will we still be able to use .__doc__ within the Python interpreter? It is
> quite a handy feature:
> >>> print(os.getcwd.__doc__)
> getcwd() -> path
> 
> Return a string representing the current working directory.
> Also, some Python interpreters like bpython use it; a snapshot is here:
> http://cl.ly/c5bb3be4a01d9d44732f
> So will it be OK to break them?

I don't understand...

First, you can already use

help(os.getcwd)

to get the same result.

Second, what would we be breaking?  We'd be making the straight text
representation a bit more cluttered in return for adding certain kinds
of meta-information into the markup.  I think it's a judgement call...
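To make Titus's first point concrete, here is a small sketch (Python 3, stdlib only) showing that the page `help()` prints is built from the same `__doc__` string that `print(os.getcwd.__doc__)` shows, which is also why any markup put into a docstring appears verbatim at the interactive prompt. `pydoc.render_doc` and `pydoc.plaintext` are used as convenient stdlib hooks; their exact output format varies between Python versions, so only the docstring text itself should be relied on.

```python
import inspect
import os
import pydoc

# The raw attribute, exactly as print(os.getcwd.__doc__) shows it.
raw = os.getcwd.__doc__

# inspect.getdoc() returns the same text with indentation cleaned up.
cleaned = inspect.getdoc(os.getcwd)

# pydoc.render_doc() builds the page that help(os.getcwd) displays;
# pydoc.plaintext is the text renderer without terminal bold escapes.
page = pydoc.render_doc(os.getcwd, renderer=pydoc.plaintext)

print(page)
```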

cheers,
--titus
-- 
C. Titus Brown, [email protected]

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Michael Foord


On 07/07/2010 17:06, Shashwat Anand wrote:



> On Wed, Jul 7, 2010 at 9:24 PM, C. Titus Brown wrote:
>
>> Hi all,
>>
>> over on the fellowship o' the packaging mailing list, one of our GSoC students
>> (merwok) asked about how much formatting info should go into Python stdlib
>> docstrings.  Right now the stdlib docstrings are primarily text, AFAIK; but
>> with the switch to Sphinx for the official Python docs, should we permit
>> ReST-general and/or Sphinx-specific markup in docstrings?
>>
>> Hmm, I don't actually see that the stdlib docstrings are imported into the
>> Python documentation anywhere, so maybe the use of Sphinx isn't that
>> relevant.  But how about ReST in general?
>
> Will we still be able to use .__doc__ within the Python interpreter? It is
> quite a handy feature:
>
> >>> print(os.getcwd.__doc__)
> getcwd() -> path
>
> Return a string representing the current working directory.
>
> Also, some Python interpreters like bpython use it; a snapshot is here:
> http://cl.ly/c5bb3be4a01d9d44732f
>
> So will it be OK to break them?


Using ReST won't *break* these tools, but may make the output less 
readable.


I would say that the major use of docstrings is for interactive help - 
so interactive readability should be *the most important* (but perhaps 
not only) factor when considering how to format standard library docstrings.


Michael



>> See
>>
>> http://sphinx.pocoo.org/markup/index.html
>>
>> for sphinx-specific markup constructs.
>>
>> thanks,
>> --titus
>> --
>> C. Titus Brown, [email protected]






--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your 
employer, to release me from all obligations and waivers arising from any and all 
NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, 
confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS 
AGREEMENTS") that I have entered into with your employer, its partners, licensors, 
agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. 
You further represent that you have the authority to release me from any BOGUS AGREEMENTS 
on behalf of your employer.



Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread C. Titus Brown

On Wed, Jul 07, 2010 at 05:09:40PM +0100, Michael Foord wrote:
> On 07/07/2010 17:06, Shashwat Anand wrote:
>> On Wed, Jul 7, 2010 at 9:24 PM, C. Titus Brown wrote:
>>
>> Hi all,
>>
>> over on the fellowship o' the packaging mailing list, one of our GSoC students
>> (merwok) asked about how much formatting info should go into Python stdlib
>> docstrings.  Right now the stdlib docstrings are primarily text, AFAIK; but
>> with the switch to Sphinx for the official Python docs, should we permit
>> ReST-general and/or Sphinx-specific markup in docstrings?
>>
>> Hmm, I don't actually see that the stdlib docstrings are imported into the
>> Python documentation anywhere, so maybe the use of Sphinx isn't that
>> relevant.  But how about ReST in general?
>>
>>
>> Will we still be able to use .__doc__ within the Python interpreter? It is
>> quite a handy feature:
>> >>> print(os.getcwd.__doc__)
>> getcwd() -> path
>>
>> Return a string representing the current working directory.
>> Also, some Python interpreters like bpython use it; a snapshot is here:
>> http://cl.ly/c5bb3be4a01d9d44732f
>> So will it be OK to break them?
>
> Using ReST won't *break* these tools, but may make the output less  
> readable.
>
> I would say that the major use of docstrings is for interactive help -  
> so interactive readability should be *the most important* (but perhaps  
> not only) factor when considering how to format standard library 
> docstrings.

OK.

I guess docutils isn't in the stdlib (should it be?); otherwise we could modify
'help' to use it to produce a plain-text rendering.
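Without docutils, the idea can at least be approximated for the light inline markup discussed in this thread (``code``, `identifier`, *var*). The sketch below is hypothetical: `plain_doc` is not part of `pydoc`, `inspect`, or docutils, and its regular expressions only handle those three constructs; a real implementation would hand the docstring to a proper reST parser.

```python
import inspect
import re

# Hypothetical helper, not part of pydoc, inspect or docutils.  It strips only
# three kinds of light inline reST markup and leaves everything else alone.
_LITERAL = re.compile(r"``(.+?)``")                               # ``code``
_INTERPRETED = re.compile(r"`([^`]+)`")                           # `identifier`
_EMPHASIS = re.compile(r"(?<![\w*])\*([^*\s][^*]*?)\*(?![\w*])")  # *var*


def plain_doc(obj):
    """Return obj's cleaned docstring with light reST inline markup removed."""
    text = inspect.getdoc(obj) or ""
    text = _LITERAL.sub(r"\1", text)   # literals first, so `` is not read as `
    text = _INTERPRETED.sub(r"\1", text)
    return _EMPHASIS.sub(r"\1", text)


def area(width, height):
    """Return the area of a rectangle.

    *width* and *height* must be non-negative; see `math.prod` or
    ``width * height``.
    """
    return width * height


print(plain_doc(area))
```

The emphasis pattern deliberately refuses a `*` followed by whitespace, so ordinary arithmetic such as `width * height` survives untouched.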

cheers,
--titus
-- 
C. Titus Brown, [email protected]

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Shashwat Anand

On Wed, Jul 7, 2010 at 9:39 PM, Michael Foord wrote:

>  On 07/07/2010 17:06, Shashwat Anand wrote:
>
>
>
> On Wed, Jul 7, 2010 at 9:24 PM, C. Titus Brown  wrote:
>
>> Hi all,
>>
>> over on the fellowship o' the packaging mailing list, one of our GSoC students
>> (merwok) asked about how much formatting info should go into Python stdlib
>> docstrings.  Right now the stdlib docstrings are primarily text, AFAIK; but
>> with the switch to Sphinx for the official Python docs, should we permit
>> ReST-general and/or Sphinx-specific markup in docstrings?
>>
>> Hmm, I don't actually see that the stdlib docstrings are imported into the
>> Python documentation anywhere, so maybe the use of Sphinx isn't that
>> relevant.  But how about ReST in general?
>>
>
> Will we still be able to use .__doc__ within the Python interpreter? It is
> quite a handy feature:
> >>> print(os.getcwd.__doc__)
> getcwd() -> path
>
> Return a string representing the current working directory.
> Also, some Python interpreters like bpython use it; a snapshot is here:
> http://cl.ly/c5bb3be4a01d9d44732f
> So will it be OK to break them?
>
>
> Using ReST won't *break* these tools, but may make the output less
> readable.
>

Oops, sorry for the wrong choice of word. What I meant is that the output will
be less readable: plain text is perhaps easier to read than ReST.

>
> I would say that the major use of docstrings is for interactive help - so
> interactive readability should be *the most important* (but perhaps not
> only) factor when considering how to format standard library docstrings.
>
> Michael
>
>
>
>>
>> See
>>
>>http://sphinx.pocoo.org/markup/index.html
>>
>> for sphinx-specific markup constructs.
>>
>> thanks,
>> --titus
>> --
>> C. Titus Brown, [email protected]
>>
>>

Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

2010-07-07 Thread Tim Peters

[Nick Coghlan]
> ...
> Hmm, I've been using difflib.SequenceMatcher for years in a serial bit
> error rate tester (with typical message sizes ranging from tens of
> bytes to tens of thousands of bytes) that occasionally gives
> unexpected results. I'd been blaming hardware glitches (and, to be
> fair, all of the odd results I can recall off the top of my head were
> definitively traced to problems in the hardware under test), but I
> should probably check I'm not running afoul of this bug.

That would be prudent ;-)

> And Tim, the algorithm may not be optimal as a general purpose binary
> diff algorithm, but it's still a hell of a lot faster than the
> hardware I use it to test. Compared to the equipment configuration
> times, the data comparison time is trivial.

I'm all in favor of people using the class for any purpose they find
useful.  Just saying the overwhelmingly most common use is for
comparing text files, and that's important - most important, since
most widely used.

> There's another possibility here - perhaps the heuristic should be off
> by default in SequenceMatcher, with a TextMatcher subclass that
> enables it (and Differ and HtmlDiff then inheriting from the latter)?

Or, to make life easiest for the most common uses, create a subclass
that _didn't_ have any notion of "junk" whatsoever.  Or a new flag to
turn auto-junk heuristics off.  Unfortunately, no true solution exists
without changing _something_ in the API, and since the behavior
changed 8 years ago there's just no guessing how many uses rely on the
by-now-long-current behavior.
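For readers on later Python versions: a flag of the kind Tim describes did eventually appear as the `autojunk` parameter of `difflib.SequenceMatcher` (Python 2.7.1 and 3.2). The sketch below assumes Python 3.2 or newer and uses a DNA-like alphabet purely as an illustration. With the default heuristic, every letter of the 400-element `b` is "popular" and gets discarded, so no match is found at all; `autojunk=False` recovers the obvious alignment.

```python
from difflib import SequenceMatcher

# b is 400 elements long but uses only four distinct symbols, so each one
# occurs far more often than the popularity threshold (about 1% of len(b)).
b = "ACGT" * 100
a = "T" + b  # one extra symbol in front, so a[0] != b[0]

with_heuristic = SequenceMatcher(None, a, b)  # autojunk=True is the default
without_heuristic = SequenceMatcher(None, a, b, autojunk=False)

print(sorted(with_heuristic.bpopular))  # every symbol was marked popular
print(with_heuristic.ratio())           # no matching blocks were found
print(without_heuristic.ratio())        # a[1:] == b is found as one block
```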

> There's currently barely anything in the SequenceMatcher documentation
> to indicate that it is designed primarily for comparing text rather
> than arbitrary sequences (the closest it gets is the reference to
> Ratcliff/Obershelp gestalt pattern matching and then the link to the
> Ratcliff/Metzener Dr Dobb's article - and until this thread, I'd never
> followed the article link).

It's designed to compare sequences of hashable elements.  The use
cases that _drove_ the implementation were (1) viewing a file as a
sequence of lines; and (2), viewing a line as a sequence of
characters.  Use cases always drive implementation (although they're rarely
mentioned in the docs), but striving for natural generalization sometimes
pays off.  I expected it would in this case, and that others have
found unanticipated uses confirms that it did.  Unfortunately, I
screwed up by not finishing what I started 8 years ago (adding an
auto-junk heuristic that was very valuable in the primary use case,
but turned out to have very bad effects in some other cases).
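Tim's two driving use cases nest naturally: compare files as sequences of lines, then compare a changed pair of lines as sequences of characters (roughly what `difflib.Differ` does internally). A minimal sketch with made-up four-line inputs:

```python
from difflib import SequenceMatcher

old = ["int main()", "{", "  return 0;", "}"]
new = ["int main()", "{", "  return 1;", "}"]

# Level 1: each whole line is one hashable element.
for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes():
    print(tag, old[i1:i2], new[j1:j2])
    if tag == "replace":
        # Level 2: compare the replaced lines character by character.
        for a_line, b_line in zip(old[i1:i2], new[j1:j2]):
            print("  similarity:", SequenceMatcher(None, a_line, b_line).ratio())
```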

> Rather than reverting to Tim's undocumented vision, perhaps we should
> better articulate it by separating the general purpose matcher from
> an optimised text matcher.

Having a notion of junk can improve the quality of results (in the
sense of making them more intuitive to humans, which was an explicit
goal of the module), and can yield enormous speedups.  Is this
restricted solely to comparing text files?  I don't know that to be
the case, but offhand doubt it's true.  As always, if they exist, we
won't hear from people with other use cases who never noticed the
change (except perhaps to see improved speed and better results) - not
until we "turn off" the heuristic on them, and then they'll view
_that_ as "a bug".

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Georg Brandl

Am 07.07.2010 18:09, schrieb Michael Foord:
>> Hi all,
>>
>> over on the fellowship o' the packaging mailing list, one of our GSoC students
>> (merwok) asked about how much formatting info should go into Python stdlib
>> docstrings.  Right now the stdlib docstrings are primarily text, AFAIK; but
>> with the switch to Sphinx for the official Python docs, should we permit
>> ReST-general and/or Sphinx-specific markup in docstrings?

I promised to write a PEP about that some time in the future.  (Probably after
3.2 final.)

>> Hmm, I don't actually see that the stdlib docstrings are imported into the
>> Python documentation anywhere, so maybe the use of Sphinx isn't that
>> relevant.  But how about ReST in general?
>>
>>
>> Will we still be able to use .__doc__ within the Python interpreter? It is
>> quite a handy feature:
>> >>> print(os.getcwd.__doc__)
>> getcwd() -> path
>>
>> Return a string representing the current working directory.
>> Also, some Python interpreters like bpython use it; a snapshot is here:
>> http://cl.ly/c5bb3be4a01d9d44732f
>> So will it be OK to break them?
> 
> Using ReST won't *break* these tools, but may make the output less readable.
> 
> I would say that the major use of docstrings is for interactive help - so
> interactive readability should be *the most important* (but perhaps not only)
> factor when considering how to format standard library docstrings.

Agreed.  However, reST doesn't need to be less readable if the specific
inline markup is not used.  For example, using `identifier` to refer to a
function or *var* to refer to a variable (which is already done at quite a
few places) is very readable IMO.  Using ``code`` also isn't bad, considering
that double quotes are not much different and potentially ambiguous.

Overall, I think that we can make stdlib docstrings valid reST -- even if it's
reST without much markup -- but valid, so that people pulling in stdlib
docstrings into Sphinx docs won't get ugly warnings.

What I would *not* like to see is heavy markup and Sphinx specifics -- that
would only make sense if we included the docstrings in the docs, and I don't
see that coming.

cheers,
Georg


Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Senthil Kumaran

On Wed, Jul 7, 2010 at 11:00 PM, Georg Brandl  wrote:
> Agreed.  However, reST doesn't need to be less readable if the specific
> inline markup is not used.  For example, using `identifier` to refer to a
> function or *var* to refer to a variable (which is already done at quite a
> few places) is very readable IMO.  Using ``code`` also isn't bad, considering
> that double quotes are not much different and potentially ambiguous.

What are the specific advantages that you see?
Can it be more useful in some cases than the other?



-- 
Senthil

Re: [Python-Dev] Licensing

2010-07-07 Thread Guido van Rossum

On Wed, Jul 7, 2010 at 2:48 PM, Nick Coghlan  wrote:
> On Wed, Jul 7, 2010 at 2:59 PM, Guido van Rossum  wrote:
>> On Tue, Jul 6, 2010 at 11:27 PM, Nick Coghlan  wrote:
>>> For example, if you look at some of the code that even Guido has
>>> submitted (e.g. pgen2), that's actually come in under Google's
>>> contributor agreement, rather than Guido's personal one. Presumably
>>> that was work he did on company time, so the copyright actually rests
>>> with Google rather than Guido.
>>
>> I hope you are misremembering some details. I did that work while at
>> Elemental Security (i.e. before I joined Google). It should have
>> Elemental Security's contributor agreement. I developed that code
>> initially for inclusion in Elemental's product line (as part of a
>> parser for a domain-specific language named "Fuel" which did not get
>> open-sourced -- probably for the better).
>
> Whoops, I got my timeline wrong (it did seem a little off when I wrote
> it - I think part of my brain was trying to tell me the dates didn't
> match up). I must have been thinking of something else I was working
> on recently that had Google's name in the header, most likely the abc
> module.

Yeah, that, and anything I contributed *after* pgen2.

> So apologies for the confusion - just s/pgen2/abc/ in my example to
> make it line up with my intent :)

No problem! Just setting the record straight.

-- 
--Guido van Rossum (python.org/~guido)

Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

2010-07-07 Thread Eli Bendersky

> Rather than reverting to Tim's
> undocumented vision, perhaps we should better articulate it by
> separating the general purpose matcher from an optimised text matcher.
>

For what it's worth, my benchmarking showed that modifying the
heuristic to only kick in when there are more than 100 kinds of
elements (Terry's option A) didn't affect the runtime of matching
whatsoever, even when the heuristic *does* kick in. All it adds,
really, is the overhead of a single 'if' statement. So it wouldn't be
right to assume that somehow modifying the heuristic or allowing it to be
turned off will negatively affect performance in the special case Tim
originally optimized for.
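Terry's option A, as Eli describes it, amounts to a guard in front of the existing popularity rule. The function below is a hypothetical sketch, not difflib code: it mirrors the rule CPython's difflib applies (for a `b` of at least 200 items, an element is popular when it occurs more than `len(b) // 100 + 1` times) and adds the proposed condition that the rule only fires when `b` has more than 100 distinct elements.

```python
from collections import Counter


def popular_elements(b, min_kinds=100):
    """Hypothetical sketch of the auto-junk rule with an "option A" guard.

    Returns the set of elements of b that the heuristic would discard.
    """
    n = len(b)
    if n < 200:
        return set()          # the heuristic never applies to short sequences
    counts = Counter(b)
    if len(counts) <= min_kinds:
        return set()          # option A: small alphabet, discard nothing
    threshold = n // 100 + 1
    return {elt for elt, c in counts.items() if c > threshold}


print(popular_elements("ACGT" * 100))               # guarded: nothing discarded
print(popular_elements("ACGT" * 100, min_kinds=0))  # unguarded rule
```

With `min_kinds=0` the guard never triggers and the function reduces to the unconditional rule, which makes the single extra comparison Eli mentions easy to see.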

Eli

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Éric Araujo

> I promised to write a PEP about that some time in the future.  (Probably after
> 3.2 final.)

Nice.

It seems that projects putting Sphinxy reST in their doc are using
automatic doc generation. This is however not always the best way to
make good doc, as demonstrated by Python’s hand-written,
very-high-quality documentation.

> Agreed.  However, reST doesn't need to be less readable if the specific
> inline markup is not used.  For example, using `identifier` to refer to a
> function or *var* to refer to a variable (which is already done at quite a
> few places) is very readable IMO.  Using ``code`` also isn't bad, considering
> that double quotes are not much different and potentially ambiguous.
> 
> Overall, I think that we can make stdlib docstrings valid reST -- even if it's
> reST without much markup -- but valid, so that people pulling in stdlib
> docstrings into Sphinx docs won't get ugly warnings.
> 
> What I would *not* like to see is heavy markup and Sphinx specifics -- that
> would only make sense if we included the docstrings in the docs, and I don't
> see that coming.

Clear answer, thanks! We have backported some modules in distutils2, and
some docstrings already contain Sphinxy reST (e.g. :param: and :pep:);
it’s good to know now that we shouldn’t put hours into that only to see it
reverted later.

Regards


Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Barry Warsaw

On Jul 07, 2010, at 07:30 PM, Georg Brandl wrote:

>Overall, I think that we can make stdlib docstrings valid reST -- even
>if it's reST without much markup -- but valid, so that people pulling
>in stdlib docstrings into Sphinx docs won't get ugly warnings.
>
>What I would *not* like to see is heavy markup and Sphinx specifics --
>that would only make sense if we included the docstrings in the docs,
>and I don't see that coming.

Does it make sense to add (reST-style) epydoc markup for API signatures?
E.g.

def create_foo(name, parent=None):
    """Create the named foo.

    The named foo must not already exist, but if optional `parent` is given,
    it must exist.

    :param name: The name of the new foo.
    :type name: string
    :param parent: The new foo's parent.  If given, this must exist.
    :type parent: string
    :return: The new foo.
    :rtype: `Foo`
    :raises BadFooNameError: when `name` is illegal.
    :raises FooAlreadyExistsError: when a foo with `name` already exists.
    :raises BadParentError: when the foo's parent does not exist.
    """

We could then generate automatic API docs from this, a la:

http://www.blender.org/documentation/248PythonDoc/
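If docstrings carried Barry's field lists, pulling them out for generated API docs is mechanical. The sketch below is a hypothetical helper, not epydoc's or Sphinx's parser: it only recognises single-line fields of the form `:kind [argument]: text`, which covers the example above but not multi-line field bodies. The docstring is a shortened copy of Barry's.

```python
import inspect
import re

# One single-line reST field, e.g. ":param name: The name of the new foo."
_FIELD = re.compile(r"^:(\w+)(?:\s+([\w.]+))?:\s*(.*)$")


def doc_fields(obj):
    """Return (kind, argument, text) triples for the fields in obj's docstring."""
    fields = []
    for line in (inspect.getdoc(obj) or "").splitlines():
        match = _FIELD.match(line.strip())
        if match:
            fields.append(match.groups())  # argument is None for e.g. :rtype:
    return fields


def create_foo(name, parent=None):
    """Create the named foo.

    :param name: The name of the new foo.
    :type name: string
    :param parent: The new foo's parent.  If given, this must exist.
    :rtype: `Foo`
    :raises BadFooNameError: when `name` is illegal.
    """


for kind, arg, text in doc_fields(create_foo):
    print(kind, arg, text)
```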

-Barry


Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Robert Kern


On 7/7/10 1:53 PM, Éric Araujo wrote:

>> I promised to write a PEP about that some time in the future.  (Probably after
>> 3.2 final.)
>
> Nice.
>
> It seems that projects putting Sphinxy reST in their doc are using
> automatic doc generation. This is however not always the best way to
> make good doc, as demonstrated by Python’s hand-written,
> very-high-quality documentation.


This is a false dichotomy. Many of those projects using Sphinxy reST in their 
docstrings are using the semi-automatic[1] doc generation provided by Sphinx to 
construct *part* of their documentation. Namely, the reference of functions, 
classes and methods. A large part of Python's library reference consists of 
exactly this. Having a function's docstring provide the content for its entry in 
the library reference has the substantial DRY benefit of having exactly one 
source for the comprehensive documentation of that function available from both 
help() and the manual. As someone who uses the interactive prompt to play with 
things and read docstrings intensively, I would really like to see docstrings 
providing the same information as the manual.


Of course, opinions differ about how comprehensive docstrings should be compared 
to the reference manual's entries. And there are technical reasons for not 
wanting to try to extract docstrings from code (e.g. platform-specific modules). 
But one should not fear that the quality of the manual would decline.


[1] That's the really nice thing about Sphinx's docstring extraction features in 
contrast with other such tools. It doesn't generate a manual from the 
docstrings; it makes you explicitly reference the docstrings into the manual's 
text. This would fit in very naturally with Python's library reference.


--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco


Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Georg Brandl

Am 07.07.2010 19:53, schrieb Éric Araujo:
>> I promised to write a PEP about that some time in the future.  (Probably after
>> 3.2 final.)
> 
> Nice.
> 
> It seems that projects putting Sphinxy reST in their doc are using
> automatic doc generation. This is however not always the best way to
> make good doc, as demonstrated by Python’s hand-written,
> very-high-quality documentation.

I know, and this is what I originally intended for Sphinx.  However, the calls
for automatic doc generation are very loud, and it's understandable that most
projects can't afford to write their documentation twice.

>> Agreed.  However, reST doesn't need to be less readable if the specific
>> inline markup is not used.  For example, using `identifier` to refer to a
>> function or *var* to refer to a variable (which is already done at quite a
>> few places) is very readable IMO.  Using ``code`` also isn't bad, considering
>> that double quotes are not much different and potentially ambiguous.
>> 
>> Overall, I think that we can make stdlib docstrings valid reST -- even if it's
>> reST without much markup -- but valid, so that people pulling in stdlib
>> docstrings into Sphinx docs won't get ugly warnings.
>> 
>> What I would *not* like to see is heavy markup and Sphinx specifics -- that
>> would only make sense if we included the docstrings in the docs, and I don't
>> see that coming.
> 
> Clear answer, thanks! We have backported some modules in distutils2, and
> some docstrings already contain Sphinxy reST (e.g. :param: and :pep:),
> it’s good to know now that we shouldn’t put hours into that to see it
> reverted later.

:pep: isn't Sphinxy, but is provided by docutils, and the :param foo: field
lists are also valid reST, if rendered a bit awkwardly without the transforms
that Sphinx does to display them in epydoc style.

cheers,
Georg


Re: [Python-Dev] Include datetime.py in stdlib or not?

2010-07-07 Thread Alexander Belopolsky

On Tue, Jul 6, 2010 at 11:54 PM, Terry Reedy  wrote:
> On 7/6/2010 3:59 PM, Alexander Belopolsky wrote:
[... skipping the more general stdlib discussion; see the "Python equivalents
in stdlib" thread ...]

>> 2. There are other areas of stdlib that can benefit more from pure
>> python equivalents.
>
> Possibly true, but developers do what they do, and this seems mostly done.
>

The reason I want to do datetime module is that there are some
long-standing bugs/RFEs that would require some experimentation to get
it right.  Such experimentation is unfeasible in C where more effort
goes into thinking about integer overflow and reference counting than
into actual design.  Here are some of those issues:

http://bugs.python.org/issue5516 - equality not symmetric for
    subclasses of datetime.date and datetime.datetime
http://bugs.python.org/issue2736 - datetime needs an "epoch" method
http://bugs.python.org/issue7584 - datetime.rfcformat() for Date and
    Time on the Internet
http://bugs.python.org/issue1100942 - Add datetime.time.strptime and
    datetime.date.strptime
http://bugs.python.org/issue8860 - Rounding in timedelta constructor
    is inconsistent with that in timedelta arithmetics
http://bugs.python.org/issue1647654 - No obvious and correct way to
    get the time zone offset
http://bugs.python.org/issue5288 - tzinfo objects with sub-minute
    offsets are not supported (e.g. UTC+05:53:28)
http://bugs.python.org/issue1982 - Feature: extend strftime to accept
    milliseconds

>> 3. Reference implementations should be written by a senior CPython
>> developer and not scraped from external projects like PyPy.
>
> I did not see that in my reading of the thread.
>
This POV was brought up in the #python-dev IRC channel.

>  In any case, what matters is quality, not authorship.

I completely agree and I think the sooner the python code gets into
the main tree the more reviews it will get before the next release.

Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

2010-07-07 Thread Antoine Pitrou

On Wed, 7 Jul 2010 19:44:31 +0200
Eli Bendersky  wrote:
> 
> For what it's worth, my benchmarking showed that modifying the
> heuristic to only kick in when there are more than 100 kinds of
> elements (Terry's option A) didn't affect the runtime of matching
> whatsoever, even when the heuristic *does* kick in. All it adds,
> really, is the overhead of a single 'if' statement. So it wouldn't be
> right to assume that somehow modifying the heuristic or allowing to
> turn it off will negatively affect performance in the special case Tim
> originally optimized for.

Just because it doesn't affect performance in your tests doesn't mean it
won't do so in the general case. Consider a case where Tim's junk
optimization kicked in and helped improve performance a lot, but where
there are still fewer than 100 alphabet symbols. The new heuristic will
ruin this use case.

That's why I'm advocating a dedicated flag instead.
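Antoine's counterexample is easy to construct. In the sketch below (assuming Python 3.2+ for the `bpopular` attribute), a hypothetical C-like file in the style Tim described, with statements wrapped in lines holding only a brace, has just 42 distinct lines. An option A guard at 100 kinds of element would therefore switch the heuristic off, yet the current rule marks the brace lines as popular, which is precisely the case the heuristic was added for.

```python
from difflib import SequenceMatcher

# A hypothetical C-like file: every statement sits alone between a line
# holding only "{" and a line holding only "}".
lines = []
for i in range(80):
    lines += ["{", f"    x{i % 40} = f();", "}"]

distinct = len(set(lines))  # far fewer than 100 kinds of element
sm = SequenceMatcher(None, lines, lines)

print(len(lines), distinct, sorted(sm.bpopular))
```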


Re: [Python-Dev] versioned .so files for Python 3.2

2010-07-07 Thread Barry Warsaw

On Jul 01, 2010, at 07:02 AM, Scott Dial wrote:

>I decided to prove to myself that it was not a significant issue to
>have parallel directory structures in a .tar.bz2, and I was surprised
>to find it much worse at that than I had imagined. For example,
>
># cd /usr/lib/python2.6/site-packages
># tar --exclude="*.pyc" --exclude="*.pyo" \
>  -cjf mercurial.tar.bz2 mercurial
># du -h mercurial.tar.bz2
>640K    mercurial.tar.bz2
>
># cp -a mercurial mercurial2
># tar --exclude="*.pyc" --exclude="*.pyo" \
>  -cjf mercurial2.tar.bz2 mercurial mercurial2
># du -h mercurial2.tar.bz2
>1.3M    mercurial2.tar.bz2
>
>So, I was definitely wrong in saying that you do better than doubling.
[...]
>I appreciate all your replies. I am not sure a PEP is really needed
>here, but having had all of this discussed and explained on the
>mailing list is certainly useful. I trust that yourself and the debuntu
>python group will end up chasing down and taking care of any quirks
>that this change might cause, so I am not worried about it. :D
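Scott's measurement can be reproduced without a mercurial checkout. The sketch below substitutes 1.2 MB of pseudo-random bytes for the source tree (an assumption chosen so the result is predictable) and compresses one copy and two back-to-back copies with Python's `bz2` module. A likely explanation for the doubling is that bzip2 compresses in independent blocks of at most 900 kB, so a second copy that lands in a different block gains nothing from the first.

```python
import bz2
import random

# Incompressible stand-in for a source tree, larger than bzip2's
# maximum block size of 900 kB.
size = 1_200_000
payload = random.Random(0).getrandbits(8 * size).to_bytes(size, "little")

one_copy = len(bz2.compress(payload, compresslevel=9))
two_copies = len(bz2.compress(payload + payload, compresslevel=9))

print(one_copy, two_copies, two_copies / one_copy)
```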

Getting back to this after the US holiday.  Thanks for running these numbers,
Scott.  I've opened a bug in the Python tracker and attached my latest patch:

http://bugs.python.org/issue9193

The one difference from previous versions of the patch is that the .so tag is
now settable via "./configure --with-so-abi-tag=foo".  This would generate
shared libs like _multiprocessing.foo.so.
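For comparison, the tagging idea did land in CPython 3.2 as PEP 3149, though not necessarily with the configure spelling above. On a current interpreter the standard sysconfig and importlib.machinery modules report the tag and the suffixes the importer tries; the values differ per platform and build, so none are shown here.

```python
import sysconfig
from importlib.machinery import EXTENSION_SUFFIXES

# ABI tag compiled into this interpreter, e.g. a "cpython-3XX-<platform>"
# string on Linux builds; it may be None on platforms that do not use it.
print("SOABI:", sysconfig.get_config_var("SOABI"))

# Suffixes tried for compiled extension modules, in the order the importer
# checks them (on Linux the tagged names come before the bare ".so").
for suffix in EXTENSION_SUFFIXES:
    print(suffix)
```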

I'd like to get consensus as to whether folks feel that a PEP is needed.  My
own thought is that I'd rather not do a PEP specific to this change, but I
would update PEP 384 with the implications on .so versioning.  Please also
feel free to review the patch in that issue.

Thanks,
-Barry


Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Antoine Pitrou

On Wed, 07 Jul 2010 16:39:38 +0100
Michael Foord  wrote:
> On 07/07/2010 16:29, Alexander Belopolsky wrote:
> > [snip...]
> >
> >> 4. Does not ctypes make it possible to replace a method of a Python-coded
> >> class with a faster C version, with something like
> >>   try:
> >>     connect to methods.dll
> >>     check that function xyx exists
> >>     replace Someclass.xyy with ctypes wrapper
> >>   except: pass
> >> For instance, the SequenceMatcher heuristic was added to speedup the
> >> matching process that I believe is encapsulated in one O(n**2) or so
> >> bottleneck method. I believe most everything else is O(n) bookkeeping.
> >>
> >>  
> > The ctypes module is very CPython-centric as far as I know.  For the
> > new modules, this may be a valid way to rapidly develop accelerated
> > versions.   For modules that are already written in C, I don't see
> > much benefit in replacing them with ctypes wrappers.
> 
> Nope, both IronPython and PyPy have ctypes implementations and Jython is 
> in the process of "growing" one. Using ctypes for C extensions is the 
> most portable way of providing C extensions for Python (other than 
> providing a pure-Python implementation of course).

Except that ctypes doesn't help provide C extensions at all. It only
helps provide wrappers around existing C libraries, which is quite a
different thing.
Which, in the end, makes the original suggestion meaningless.

Regards

Antoine.



Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Alexander Belopolsky

On Wed, Jul 7, 2010 at 2:12 PM, Barry Warsaw  wrote:
..
> Does it make sense to add (reST-style) epydoc markup for API signatures?
> E.g.
>
> def create_foo(name, parent=None):
>    """Create the named foo.
>
>    The named foo must not already exist, but if optional `parent` is given,
>    it must exist.
>
>    :param name: The name of the new foo.
>    :type name: string
..

-1.  Repeating the function signature in the docstring only adds
clutter and Java-style formal type/exception specifications are
rarely appropriate in Python.  I think marking arguments up with * as
in *name*, *parent*, should be enough in most cases.
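A sketch of the lighter style Alexander suggests, reusing the hypothetical create_foo from Barry's example: arguments are marked only as *name* and *parent*, and the signature is left to the interpreter, which can already report it through inspect.signature.

```python
import inspect


def create_foo(name, parent=None):
    """Create the named foo.

    The named foo must not already exist, but if the optional *parent*
    is given, it must exist.  Raises an error if *name* is illegal or a
    foo called *name* already exists.
    """
    raise NotImplementedError  # placeholder body: only the docstring matters here


# The signature does not need repeating in the docstring:
print(inspect.signature(create_foo))  # (name, parent=None)
print(inspect.getdoc(create_foo))
```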

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Antoine Pitrou

On Wed, 7 Jul 2010 14:12:17 -0400
Barry Warsaw  wrote:
> On Jul 07, 2010, at 07:30 PM, Georg Brandl wrote:
> 
> >Overall, I think that we can make stdlib docstrings valid reST -- even
> >if it's reST without much markup -- but valid, so that people pulling
> >in stdlib doc- strings into Sphinx docs won't get ugly warnings.
> >
> >What I would *not* like to see is heavy markup and Sphinx specifics --
> >that would only make sense if we included the docstrings in the docs,
> >and I don't see that coming.
> 
> Does it make sense to add (reST-style) epydoc markup for API signatures?
> E.g.

It really looks ugly (and annoying to decipher) when viewed in plain
text.

Regards

Antoine.


> 
> def create_foo(name, parent=None):
>     """Create the named foo.
>
>     The named foo must not already exist, but if optional `parent` is given,
>     it must exist.
>
>     :param name: The name of the new foo.
>     :type name: string
>     :param parent: The new foo's parent.  If given, this must exist.
>     :type parent: string
>     :return: The new foo.
>     :rtype: `Foo`
>     :raises BadFooNameError: when `name` is illegal.
>     :raises FooAlreadyExistsError: when a foo with `name` already exists.
>     :raises BadParentError: when the foo's parent does not exist.
>     """
> 
> We could then generate automatic API docs from this, a la:
> 
> http://www.blender.org/documentation/248PythonDoc/
> 
> -Barry
> 



Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Alexander Belopolsky

On Wed, Jul 7, 2010 at 2:42 PM, Antoine Pitrou  wrote:
..
> Except that ctypes doesn't help provide C extensions at all. It only
> helps provide wrappers around existing C libraries, which is quite a
> different thing.

Yet it may allow writing an equivalent of a C extension in pure
Python.  For example, the posix or time modules could easily be
reimplemented that way if libc were less platform-dependent.  Such a
reimplementation, however, is unlikely to be very useful.
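To make Alexander's point concrete, here is a sketch that re-expresses one function of the time module through ctypes. The assumptions it has to make (where libc lives, and that time_t is a C long) are exactly the platform dependence he mentions: they hold on common 64-bit Linux and macOS builds but are not guaranteed elsewhere, and CDLL(None) is POSIX-only.

```python
import ctypes
import ctypes.util
import time


def _load_libc():
    # Where the C library lives is itself platform-specific: find_library
    # returns e.g. "libc.so.6" on glibc Linux; CDLL(None) instead exposes the
    # symbols already loaded into the interpreter process (POSIX only).
    path = ctypes.util.find_library("c")
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


_libc = _load_libc()
# Assumption: time_t is a C long, true on common 64-bit Linux and macOS
# builds but not guaranteed by the C standard.
_libc.time.restype = ctypes.c_long
_libc.time.argtypes = [ctypes.c_void_p]


def c_time():
    """Seconds since the epoch via the C library's time(NULL)."""
    return _libc.time(None)


print(c_time(), int(time.time()))
```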

> Which, in the end, makes the original suggestion meaningless.

It is not meaningless, but would require effectively exposing pyport.h
in pure Python and using it to select the correct signature for a
given library function.  This would be a tremendous effort and is
hardly justified.

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Fred Drake

On Wed, Jul 7, 2010 at 2:27 PM, Georg Brandl  wrote:
> I know, and this is what I originally intended for Sphinx.  However, the calls
> for automatic doc generation are very loud, and it's understandable that most
> projects can't afford writing their documentation twice.

The ability to provide extended content beyond what's provided in the
docstring using the auto* constructs may make it feasible to start
avoiding some of those DRY violations for Python's standard library;
I'm enjoying those for another project.

I hope we don't end up in a position where we can't use the auto*
constructs in the Python documentation.

  -Fred

-- 
Fred L. Drake, Jr.
"A storm broke loose in my mind."  --Albert Einstein

Re: [Python-Dev] Include datetime.py in stdlib or not?

2010-07-07 Thread Brett Cannon

On Tue, Jul 6, 2010 at 20:54, Terry Reedy  wrote:
> On 7/6/2010 3:59 PM, Alexander Belopolsky wrote:
>
> I am more interested in Brett's overall vision than this particular module.
> I understand that to be one of a stdlib that is separate from CPython and is
> indeed the standard Python library.
>
> Questions:
>
> 1. Would the other distributions use a standard stdlib rather than current
> individual versions?

The idea is that the stdlib just becomes a subrepo that the other VMs
simply pull in to gain their version of the stdlib.

> If so, and if at least one used the Python version of
> each module, this would alleviate the concern that non-use == non-testing.
> (Test improvement would also help this.)
>
> 2. Would the other distributions pool their currently separate stdlib
> efforts to help maintain one standard stdlib. If so, this would alleviate
> the concern about the extra effort to maintain both a C and Python version.
> (Test improvement would also help this also.)

That's the idea. We already have contributors from the various VMs who
have commit privileges, but they all work in their own repos for
convenience. My hope is that if we break the stdlib out into its own
repository that people simply pull in, then other VM contributors will
work directly off of the stdlib repo instead of their own, magnifying
the usefulness of their work.

>
> 3. What version of Python would be allowed for use in the stdlib? I would
> like the stdlib for 3.x to be able to use 3.x code. This would be only a
> minor concern for CPython as long as 2.7 is maintained, but a major concern
> for the other implementation currently 'stuck' in 2.x only. A good 3to2
> would be needed.

This will only affect py3k.

>
> I generally favor having Python versions of modules available. My current
> post on difflib.SequenceMatcher is based on experiments with an altered
> version. I copied difflib.py to my test directory, renamed it diff2lib.py,
> so I could import both versions, found and edited the appropriate method,
> and off I went. If difflib were in C, my post would have been based on
> speculation about how a fixed version would operate, rather than on data.
>

The effect upon CPython would be the extension modules become just
performance improvements, nothing more (unless they have to be in C as
in the case for sqlite3).

> 4. Does not ctypes make it possible to replace a method of a Python-coded
> class with a faster C version, with something like
>  try:
>    connect to methods.dll
>    check that function xyx exists
>    replace Someclass.xyy with ctypes wrapper
>  except: pass
> For instance, the SequenceMatcher heuristic was added to speedup the
> matching process that I believe is encapsulated in one O(n**2) or so
> bottleneck method. I believe most everything else is O(n) bookkeeping.
>

There is no need to go that far. All one needs to do is structure the
extension code such that when the extension module is imported, it
overrides key objects in the Python version. Using ctypes is just
added complexity.
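A sketch of what Brett describes, with made-up names (Matcher, longest_run and the _matcher_speedups extension are all hypothetical): the pure-Python class is complete on its own, and an optional compiled module, if importable, simply rebinds the hot method. No ctypes is involved.

```python
class Matcher:
    """Pure-Python reference implementation (hypothetical example class)."""

    def longest_run(self, a, b):
        """Length of the longest common contiguous run of a and b, O(len(a)*len(b))."""
        best = 0
        prev = [0] * (len(b) + 1)
        for i in range(1, len(a) + 1):
            cur = [0] * (len(b) + 1)
            for j in range(1, len(b) + 1):
                if a[i - 1] == b[j - 1]:
                    # Extend the run that ended at a[i-2], b[j-2].
                    cur[j] = prev[j - 1] + 1
                    best = max(best, cur[j])
            prev = cur
        return best


try:
    # Hypothetical accelerator module; assumed to export longest_run(a, b).
    from _matcher_speedups import longest_run as _fast_longest_run
except ImportError:
    pass  # no accelerator: the Python method above stays in place
else:
    # Built-in (C) functions do not bind as methods, so wrap it in
    # staticmethod; it then receives (a, b) without self.
    Matcher.longest_run = staticmethod(_fast_longest_run)
```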

>
>>  This proposal has brought mostly positive feedback on the tracker [2]
>> with only a few objections being raised.
>>
>> 1. Since this does not bring any new functionality and datetime module
>> is not expected to evolve, there is no need for pure python version.
>
> see above
>
>> 2. There are other areas of stdlib that can benefit more from pure
>> python equivalents.
>
> Possibly true, but developers do what they do, and this seems mostly done.
>
>> 3. Reference implementations should be written by a senior CPython
>> developer and not scraped from external projects like PyPy.
>
> I did not see that in my reading of the thread. In any case, what matters is
> quality, not authorship.
>
>> What do you think?  Please reply here or add a comment at
>> http://bugs.python.org/issue7989.
>
> From scanning that and the posts here, it seems like a pep or other doc on
> dual version modules would be a good idea. It should at least document how
> to code the switch from python version to the x coded version and how to
> test both, as discussed.

Frank Wierzbicki and I started such a PEP, but we both got busy with
other stuff. And since I am most likely going to be the one
spearheading this on the CPython side, this will most likely not move
forward until I have time to get to it (which might be quite a while).

-Brett

>
> --
> Terry Jan Reedy

Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Brett Cannon

On Wed, Jul 7, 2010 at 08:29, Alexander Belopolsky
 wrote:
> On Tue, Jul 6, 2010 at 11:54 PM, Terry Reedy  wrote:
>> On 7/6/2010 3:59 PM, Alexander Belopolsky wrote:
>>
>> I am more interested in Brett's overall vision than this particular module.
>> I understand that to be one of a stdlib that is separate from CPython and is
>> indeed the standard Python library.
>>
>
> I am also very much interested in the overall vision, but I would like
> to keep the datetime.py thread focused, so I am going to reply to broad
> questions under a separate subject.
>
>> Questions:
>>
>> 1. Would the other distributions use a standard stdlib rather than current
>> individual versions?
>
> I certainly hope they will.  In the ideal world, passing test.regrtest
> with unmodified Lib should be *the* definition of what is called
> Python.  I understand that there is already some work underway in this
> direction such as marking implementation specific tests with
> appropriate decorators.
>
>>
>> 2. Would the other distributions pool their currently separate stdlib
>> efforts to help maintain one standard stdlib?
>
> I believe that making stdlib and test.regrtest more friendly to
> alternative implementations will go a long way towards this goal.  It
> will, of course, be a decision that each project will have to make.
>
>>
>> 3. What version of Python would be allowed for use in the stdlib? I would
>> like the stdlib for 3.x to be able to use 3.x code. This would be only a
>> minor concern for CPython as long as 2.7 is maintained, but a major concern
>> for the other implementation currently 'stuck' in 2.x only. A good 3to2
>> would be needed.
>
> Availability of Python equivalents will hopefully help "other
> implementation currently 'stuck' in 2.x only" to get "unstuck" and
> move to 3.x.  I understand that this is a somewhat sensitive issue at
> the moment, but I believe a decision has been made that supporting new
> features for 2.x is outside of python-dev's focus.
>
>
>> 4. Does not ctypes make it possible to replace a method of a Python-coded
>> class with a faster C version, with something like
>>  try:
>>    connect to methods.dll
>>    check that function xyx exists
>>    replace Someclass.xyy with ctypes wrapper
>>  except: pass
>> For instance, the SequenceMatcher heuristic was added to speedup the
>> matching process that I believe is encapsulated in one O(n**2) or so
>> bottleneck method. I believe most everything else is O(n) bookkeeping.
>>
> The ctypes module is very CPython-centric as far as I know.  For the
> new modules, this may be a valid way to rapidly develop accelerated
> versions.   For modules that are already written in C, I don't see
> much benefit in replacing them with ctypes wrappers.
>
>
>> [.. datetime specific discussion skipped ..]
>> From scanning that and the posts here, it seems like a pep or other doc on
>> dual version modules would be a good idea. It should at least document how
>> to code the switch from python version to the x coded version and how to
>> test both, as discussed.
>>
> I am certainly not ready to write such a PEP.  I may be in a better
> position to contribute to it after I gain more experience with
> test_datetime.py.   At the moment I have more questions than answers.
>
> For example, the established practice appears to be:
>
> # modulename.py
>
> # Python code
>
> try:
>     from _modulename import *
> except ImportError:
>     pass
>
> This is supposed to leave no Python definitions in the module namespace
> if _modulename is available (the C versions replace them).  The problem
> with datetime.py is that it has several helper functions like _ymd2ord()
> that will still stay in the module.  Should an "else:" clause be added to
> clean these up?  Should these become class or static methods as appropriate?
>
> The established practice for testing is
>
> py_module = support.import_fresh_module('modulename', blocked=['_modulename'])
> c_module = support.import_fresh_module('modulename', fresh=['_modulename'])
>
> class TestDefinitions:  # not a unittest.TestCase subclass
>     def test_foo(self):
>         self.module.foo(...)  # exercise the module under test
>     ...
>
> class C_Test(TestDefinitions, unittest.TestCase):
>     module = c_module
>
> class Py_Test(TestDefinitions, unittest.TestCase):
>     module = py_module
>
>
> For datetime.py this approach presents several problems:
>
> 1. replacing datetime with self.module.datetime everywhere can get
> messy quickly.
> 2. There are test classes defined at the test_datetime module level
> that subclass from datetime classes.  The self.module is not available
> at the module level.  These should probably be moved to setUp()
> methods and attached to test case self.
> 3. If #2 is resolved by moving definitions inside functions, the
> classes will become unpickleable and pickle tests will break.  Some
> hackery involving injecting these classes into __main__ or module
> globals may be required.

So I have been thinking about this about how to possibly make this
standard test scaffolding
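For Alexander's "else:" question, one possible layout, sketched with hypothetical names (the _modulename_accel module and the Date class are made up; _days_before_year mirrors the helper of that name in datetime.py): keep the helpers when falling back to pure Python, and drop them only once the compiled versions have actually been imported.

```python
# modulename.py -- hypothetical module layout, not datetime.py itself.

def _days_before_year(year):
    """Pure-Python helper: days in the years before *year* (proleptic Gregorian)."""
    y = year - 1
    return y * 365 + y // 4 - y // 100 + y // 400


class Date:
    """Pure-Python class that relies on the helper above."""

    def __init__(self, year):
        self.year = year

    def days_before(self):
        return _days_before_year(self.year)


try:
    # Hypothetical C accelerator; it does not exist here, so the except branch runs.
    from _modulename_accel import *
except ImportError:
    pass  # keep the pure-Python definitions
else:
    # The C versions replaced the public names (assumed to include Date), so
    # the Python-only helper is dead weight: remove it from the namespace.
    del _days_before_year
```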

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Brett Cannon

On Wed, Jul 7, 2010 at 11:46, Antoine Pitrou  wrote:
> On Wed, 7 Jul 2010 14:12:17 -0400
> Barry Warsaw  wrote:
>> On Jul 07, 2010, at 07:30 PM, Georg Brandl wrote:
>>
>> >Overall, I think that we can make stdlib docstrings valid reST -- even
>> >if it's reST without much markup -- but valid, so that people pulling
>> >in stdlib doc- strings into Sphinx docs won't get ugly warnings.
>> >
>> >What I would *not* like to see is heavy markup and Sphinx specifics --
>> >that would only make sense if we included the docstrings in the docs,
>> >and I don't see that coming.
>>
>> Does it make sense to add (reST-style) epydoc markup for API signatures?
>> E.g.
>
> It really looks ugly (and annoying to decipher) when viewed in plain
> text.

I agree. And it is highly repetitive since the signature information
is right there already. All of that info in those annotations can
easily be written in paragraph form if needed and honestly would read
better to my eyes.

-Brett

>
> Regards
>
> Antoine.
>
>
>>
>> def create_foo(name, parent=None):
>>     """Create the named foo.
>>
>>     The named foo must not already exist, but if optional `parent` is given,
>>     it must exist.
>>
>>     :param name: The name of the new foo.
>>     :type name: string
>>     :param parent: The new foo's parent.  If given, this must exist.
>>     :type parent: string
>>     :return: The new foo.
>>     :rtype: `Foo`
>>     :raises BadFooNameError: when `name` is illegal.
>>     :raises FooAlreadyExistsError: when a foo with `name` already exists.
>>     :raises BadParentError: when the foo's parent does not exist.
>>     """
>>
>> We could then generate automatic API docs from this, a la:
>>
>> http://www.blender.org/documentation/248PythonDoc/
>>
>> -Barry

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Brett Cannon

On Wed, Jul 7, 2010 at 10:30, Georg Brandl  wrote:
> Am 07.07.2010 18:09, schrieb Michael Foord:
>>>     Hi all,
>>>
>>>     over on the fellowship o' the packaging mailing list, one of our GSoC students
>>>     (merwok) asked about how much formatting info should go into Python stdlib
>>>     docstrings.  Right now the stdlib docstrings are primarily text, AFAIK; but
>>>     with the switch to Sphinx for the official Python docs, should we permit
>>>     ReST-general and/or Sphinx-specific markup in docstrings?
>
> I promised to write a PEP about that some time in the future.  (Probably after
> 3.2 final.)

For those of you who aren't aware, there actually already is a PEP on
using reST in docstrings: http://python.org/dev/peps/pep-0287/ .

But it could stand to be updated by Georg to reflect our current
internal doc practices, as it dates from 2002, back when we were still
using LaTeX.

-Brett

>
>>>     Hmm, I don't actually see that the stdlib docstrings are imported into the
>>>     Python documentation anywhere, so maybe the use of Sphinx isn't that
>>>     relevant.  But how about ReST in general?
>>>
>>>
>>> Will we still be able to use __doc__ within the Python interpreter? It is
>>> quite a handy feature.
>>> >>> print(os.getcwd.__doc__)
>>> getcwd() -> path
>>>
>>> Return a string representing the current working directory.
>>> Also, some Python interpreters like bpython use it; a snapshot here -
>>>  http://cl.ly/c5bb3be4a01d9d44732f
>>> So will it be OK to break them?
>>
>> Using ReST won't *break* these tools, but may make the output less readable.
>>
>> I would say that the major use of docstrings is for interactive help - so
>> interactive readability should be *the most important* (but perhaps not only)
>> factor when considering how to format standard library docstrings.
>
> Agreed.  However, reST doesn't need to be less readable if the specific
> inline markup is not used.  For example, using `identifier` to refer to a
> function or *var* to refer to a variable (which is already done at quite a
> few places) is very readable IMO.  Using ``code`` also isn't bad, considering
> that double quotes are not much different and potentially ambiguous.
>
> Overall, I think that we can make stdlib docstrings valid reST -- even if it's
> reST without much markup -- but valid, so that people pulling in stdlib doc-
> strings into Sphinx docs won't get ugly warnings.
>
> What I would *not* like to see is heavy markup and Sphinx specifics -- that
> would only make sense if we included the docstrings in the docs, and I don't
> see that coming.
>
> cheers,
> Georg

Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

2010-07-07 Thread Tres Seaver

Antoine Pitrou wrote:
> On Wed, 7 Jul 2010 19:44:31 +0200
> Eli Bendersky  wrote:
>> For what it's worth, my benchmarking showed that modifying the
>> heuristic to only kick in when there are more than 100 kinds of
>> elements (Terry's option A) didn't affect the runtime of matching
>> whatsoever, even when the heuristic *does* kick in. All it adds,
>> really, is the overhead of a single 'if' statement. So it wouldn't be
>> right to assume that somehow modifying the heuristic or allowing users to
>> turn it off will negatively affect performance in the special case Tim
>> originally optimized for.
> 
> Just because it doesn't affect performance in your tests doesn't mean it
> won't do so in the general case. Consider a case where Tim's junk
> optimization kicked in and helped improve performance a lot, but where
> there are still fewer than 100 alphabet symbols. The new heuristic will
> ruin this use case.

That would describe pretty much every C program ever written, for
instance, and nearly as high a percentage of all Python modules /
scripts ever written:  the 'string.printable' set, less formfeed and
vertical tab, is 98 characters long.
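Tres's count is easy to check; the "more than 100 kinds of elements" threshold is Terry's option A from the quoted message.

```python
import string

# Printable ASCII minus form feed ("\f") and vertical tab ("\v").
text_alphabet = set(string.printable) - {"\f", "\v"}

print(len(string.printable))     # 100
print(len(text_alphabet))        # 98
# Under a "more than 100 kinds of elements" rule, ordinary source text
# would never qualify for the junk heuristic.
print(len(text_alphabet) > 100)  # False
```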

> That's why I'm advocating a dedicated flag instead.

+1.



Tres.
--
===
Tres Seaver  +1 540-429-0999  [email protected]
Palladion Software   "Excellence by Design"    http://palladion.com

Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Alexander Belopolsky

On Wed, Jul 7, 2010 at 3:45 PM, Brett Cannon  wrote:
> On Wed, Jul 7, 2010 at 08:29, Alexander Belopolsky
>  wrote:
..
>> For datetime.py this approach presents several problems:
>>
>> 1. replacing datetime with self.module.datetime everywhere can get
>> messy quickly.
>> 2. There are test classes defined at the test_datetime module level
>> that subclass from datetime classes.  The self.module is not available
>> at the module level.  These should probably be moved to setUp()
>> methods and attached to test case self.
>> 3. If #2 is resolved by moving definitions inside functions, the
>> classes will become unpickleable and pickle tests will break.  Some
>> hackery involving injecting these classes into __main__ or module
>> globals may be required.
>
> So I have been thinking about this about how to possibly make this
> standard test scaffolding a little cleaner. I think a class decorator
> might do the trick. If you had all test methods take a module argument
> you could pass in the module that should be used to test. Then you
> simply rename test_* to _test_*, create test_*_(c|py), and then have
> those methods call their _test_* equivalents with the proper module to
> test. You could even make this generic by having the keyword arguments
> to the decorator define what each test suffix is named.
>

Hmm, I've been playing with the idea of using a metaclass to do
essentially the same, but a class decorator may be a simpler solution.
I still don't see how this addresses #1, though.  In the ideal world,
I would like not to touch the body of test_* methods.  These methods,
however, are written assuming "from datetime import date, time,
datetime, tzinfo, ..." at the top of test_datetime.py.  Even if the
decorator calls _test_* with six additional arguments named date,
time, datetime, tzinfo, etc., it will not work, because by the time the
decorator (or even the metaclass machinery) gets to operate, these names
are already resolved as globals.

> The benefit of this is you don't have to define one base class and
> then two subclasses; you define a single test class and simply add a
> decorator.

I like this.

> This addresses #1.

Except it does not. :-(

> As for #3, that I can't answer and might
> simply require restructuring those specific pickle tests.

What about #2?
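A rough sketch of Brett's class-decorator idea, with every name hypothetical (dual_test, fresh_import, HeapqTest) and heapq with its _heapq accelerator standing in for datetime. fresh_import is a simplified stand-in for test.support's import_fresh_module, not a copy of it. As Alexander notes, this does nothing for #1: the test bodies must use the module argument rather than names imported at the top of the file.

```python
import importlib
import sys
import types
import unittest


def fresh_import(name, blocked=()):
    """Import *name* anew with the modules in *blocked* hidden.

    A None entry in sys.modules makes any import of that name raise ImportError.
    """
    saved = {key: sys.modules.get(key) for key in (name, *blocked)}
    try:
        sys.modules.pop(name, None)
        for mod in blocked:
            sys.modules[mod] = None
        return importlib.import_module(name)
    finally:
        # Put sys.modules back the way we found it.
        for key, value in saved.items():
            if value is None:
                sys.modules.pop(key, None)
            else:
                sys.modules[key] = value


def dual_test(**modules):
    """Class decorator: turn each _test_<x>(self, module) method into one
    test_<x>_<suffix> method per keyword argument (suffix=module)."""
    def decorate(cls):
        for attr, func in list(vars(cls).items()):
            if not (attr.startswith("_test_") and callable(func)):
                continue
            for suffix, module in modules.items():
                # Default arguments freeze the current func/module values.
                def test(self, _func=func, _module=module):
                    return _func(self, _module)
                test.__name__ = f"{attr[1:]}_{suffix}"
                setattr(cls, test.__name__, test)
        return cls
    return decorate


PY_HEAPQ = fresh_import("heapq", blocked=["_heapq"])  # pure-Python heapq
C_HEAPQ = importlib.import_module("heapq")            # normally C-accelerated


@dual_test(py=PY_HEAPQ, c=C_HEAPQ)
class HeapqTest(unittest.TestCase):
    def _test_pops_in_order(self, module):
        heap = []
        for x in [5, 1, 4, 2, 3]:
            module.heappush(heap, x)
        self.assertEqual([module.heappop(heap) for _ in range(5)], [1, 2, 3, 4, 5])
```

The decorator generates test_pops_in_order_py and test_pops_in_order_c; only a single test class is written by hand.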

Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Brett Cannon

On Wed, Jul 7, 2010 at 13:16, Alexander Belopolsky
 wrote:
> On Wed, Jul 7, 2010 at 3:45 PM, Brett Cannon  wrote:
>> On Wed, Jul 7, 2010 at 08:29, Alexander Belopolsky
>>  wrote:
> ..
>>> For datetime.py this approach presents several problems:
>>>
>>> 1. replacing datetime with self.module.datetime everywhere can get
>>> messy quickly.
>>> 2. There are test classes defined at the test_datetime module level
>>> that subclass from datetime classes.  The self.module is not available
>>> at the module level.  These should probably be moved to setUp()
>>> methods and attached to test case self.
>>> 3. If #2 is resolved by moving definitions inside functions, the
>>> classes will become unpickleable and pickle tests will break.  Some
>>> hackery involving injecting these classes into __main__ or module
>>> globals may be required.
>>
>> So I have been thinking about this about how to possibly make this
>> standard test scaffolding a little cleaner. I think a class decorator
>> might do the trick. If you had all test methods take a module argument
>> you could pass in the module that should be used to test. Then you
>> simply rename test_* to _test_*, create test_*_(c|py), and then have
>> those methods call their _test_* equivalents with the proper module to
>> test. You could even make this generic by having the keyword arguments
>> to the decorator define what each test suffix is named.
>>
>
> Hmm, I've been playing with the idea of using a metaclass to do
> essentially the same, but a class decorator may be a simpler solution.
> I still don't see how this addresses #1, though.  In the ideal world,
> I would like not to touch the body of test_* methods.  These methods,
> however, are written assuming "from datetime import date, time,
> datetime, tzinfo, ..." at the top of test_datetime.py.  Even if the
> decorator calls _test_* with six additional arguments named date,
> time, datetime, tzinfo, etc., it will not work, because by the time the
> decorator (or even the metaclass machinery) gets to operate, these names
> are already resolved as globals.

Well, I personally would call that bad form to import those classes
explicitly, but that's just me. You will simply need to make them work
off of the module object. There is nothing wrong with "cleaning up"
the tests as part of your work; the test code should not be enshrined
as perfect.

>
>> The benefit of this is you don't have to define one base class and
>> then two subclasses; you define a single test class and simply add a
>> decorator.
>
> I like this.
>
>> This addresses #1.
>
> Except it does not. :-(
>
>> As for #3, that I can't answer and might
>> simply require restructuring those specific pickle tests.
>
> What about #2?
>

Either define two different subclasses or write a function that
returns the class using the superclass that you want.
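Brett's second option for #2, as a sketch (make_date_subclass, SubDate and SubclassTest are hypothetical names): the subclass is built per test from whichever module is under test, so nothing at module level needs self.module. Classes created inside a function like this are local and will not pickle by reference, which is Alexander's point #3.

```python
import datetime
import unittest


def make_date_subclass(module):
    """Build a date subclass on top of *module*.date (hypothetical helper)."""
    class SubDate(module.date):
        def next_ordinal(self):
            return self.toordinal() + 1
    return SubDate


class SubclassTest(unittest.TestCase):
    # A parameterized variant would set this to the pure-Python module instead.
    module = datetime

    def setUp(self):
        # Built per test from whichever module is under test.
        self.SubDate = make_date_subclass(self.module)

    def test_subclass_behaves_like_date(self):
        d = self.SubDate(2010, 7, 7)
        self.assertIsInstance(d, self.module.date)
        self.assertEqual(d.next_ordinal(), self.module.date(2010, 7, 8).toordinal())
```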

Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Alexander Belopolsky

On Wed, Jul 7, 2010 at 4:33 PM, Brett Cannon  wrote:

>>>> 2. There are test classes defined at the test_datetime module level
>>>> that subclass from datetime classes.  The self.module is not available
>>>> at the module level.  These should probably be moved to setUp()
>>>> methods and attached to test case self.
..
>> What about #2?

> Either define two different subclasses or write a function that
> returns the class using the superclass that you want.
>

Selecting one of two different globally defined subclasses will be
ugly in parameterized tests.  And in the other approach, the class
definitions will have to be moved away from the module level and
inside a scope where module variable is present.  Yes, it looks like
some refactoring is unavoidable.

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Georg Brandl

Am 07.07.2010 20:12, schrieb Barry Warsaw:
> On Jul 07, 2010, at 07:30 PM, Georg Brandl wrote:
> 
>>Overall, I think that we can make stdlib docstrings valid reST -- even
>>if it's reST without much markup -- but valid, so that people pulling
>>in stdlib doc- strings into Sphinx docs won't get ugly warnings.
>>
>>What I would *not* like to see is heavy markup and Sphinx specifics --
>>that would only make sense if we included the docstrings in the docs,
>>and I don't see that coming.
> 
> Does it make sense to add (reST-style) epydoc markup for API signatures?
> E.g.
> 
> def create_foo(name, parent=None):
>     """Create the named foo.
>
>     The named foo must not already exist, but if optional `parent` is given,
>     it must exist.
>
>     :param name: The name of the new foo.
>     :type name: string
>     :param parent: The new foo's parent.  If given, this must exist.
>     :type parent: string
>     :return: The new foo.
>     :rtype: `Foo`
>     :raises BadFooNameError: when `name` is illegal.
>     :raises FooAlreadyExistsError: when a foo with `name` already exists.
>     :raises BadParentError: when the foo's parent does not exist.
>     """
> 
> We could then generate automatic API docs from this, a la:
> 
> http://www.blender.org/documentation/248PythonDoc/

Yes, but: do we want this?
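For readers unfamiliar with the field-list style Barry quotes, the entries are regular enough that a few lines of the standard `re` module can pull them out, which is roughly the first thing an API-doc generator has to do. `parse_fields` is a hypothetical helper, not epydoc's or Sphinx's parser, and it ignores continuation lines of multi-line fields.

```python
import re

# Matches ":param name: text", ":type name: text", ":return: text",
# ":raises SomeError: text" (one line each).
FIELD_RE = re.compile(r"^\s*:(\w+)(?:\s+(\w+))?:\s*(.*)$")


def parse_fields(docstring):
    """Return (field, argument, text) tuples for one-line field-list entries."""
    fields = []
    for line in docstring.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields.append(match.groups())
    return fields


sample = '''Create the named foo.

    :param name: The name of the new foo.
    :type name: string
    :return: The new foo.
    :raises BadFooNameError: when `name` is illegal.
    '''

for field, argument, text in parse_fields(sample):
    print(field, argument, text)
```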

Georg


Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Martin Geisler

"C. Titus Brown"  writes:

> I guess docutils isn't in the stdlib (should it be?) or else we could
> modify 'help' to use it to prepare a straight text formatting.

We're using light-weight ReST markup in the Mercurial help texts and
transform it into straight text upon display in the terminal.

We want no external dependencies for Mercurial, so I wrote a "mini ReST"
parser in about 400 lines of code. It cheats a lot and can only handle
simple constructs... but maybe it would be interesting for Python's
help? You can find it here:

  http://selenic.com/hg/file/tip/mercurial/minirst.py

Its test and the corresponding output show the markup it can parse:

  http://selenic.com/hg/file/tip/tests/test-minirst.py
  http://selenic.com/hg/file/tip/tests/test-minirst.py.out

It would of course be much nicer to have Docutils in the standard
library. I'm not a Docutils developer, but to me it seems that Docutils
is now a very stable and widely used package, so it would IMHO make
sense to include it.
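To give a flavour of the transformation Martin describes (light reST in, plain terminal text out), here is a deliberately tiny sketch using only `re` and `textwrap`. It is not Mercurial's minirst code and not Docutils; `rst_to_text` is a hypothetical helper that only strips inline markup and rewraps paragraphs, so it would mangle literal blocks, lists and tables, which is exactly the part a real "mini ReST" parser has to handle.

```python
import re
import textwrap


def rst_to_text(source, width=70):
    """Render a very small subset of reST as plain text."""
    out = []
    for para in source.strip().split("\n\n"):
        # Inline literals ``x`` and emphasis **x** / *x* lose their markers.
        para = re.sub(r"``(.+?)``", r"\1", para)
        para = re.sub(r"\*\*(.+?)\*\*", r"\1", para)
        para = re.sub(r"\*(.+?)\*", r"\1", para)
        # Collapse whitespace, then rewrap to the terminal width.
        out.append(textwrap.fill(" ".join(para.split()), width))
    return "\n\n".join(out)


help_text = """Show the *revision* history.

Use ``hg log -v`` for   more detail."""
print(rst_to_text(help_text))
```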

-- 
Martin Geisler

Mercurial links: http://mercurial.ch/


Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Georg Brandl

Am 07.07.2010 21:11, schrieb Fred Drake:
> On Wed, Jul 7, 2010 at 2:27 PM, Georg Brandl  wrote:
>> I know, and this is what I originally intended for Sphinx.  However, the calls
>> for automatic doc generation are very loud, and it's understandable that most
>> projects can't afford writing their documentation twice.
> 
> The ability to provide extended content beyond what's provided in the
> docstring using the auto* constructs may make it feasible to start
> avoiding some of those DRY violations for Python's standard library;
> I'm enjoying those for another project.
> 
> I hope we don't end up in a position where we can't use the auto*
> constructs in the Python documentation.

Let's say we were okay with giving up single-source docs, one potential
problem is that autodoc needs to import the modules in question, which
can become a problem, on one hand for platform-specific modules, on the
other because the Python building the docs is not necessarily the Python
that is documented.
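Georg's import concern can be made concrete: an autodoc-style build imports each documented module in the interpreter that runs the build, so platform-specific modules simply fail there. A minimal sketch with the standard `importlib` module; `importable` is a hypothetical helper, not part of Sphinx, and `winreg` is used only as an example of a Windows-only module.

```python
import importlib
import sys


def importable(names):
    """Split module names into (importable, {name: ImportError}) for *this* interpreter."""
    ok, failed = [], {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            failed[name] = exc
        else:
            ok.append(name)
    return ok, failed


# "winreg" exists only on Windows; "no_such_module_xyz" exists nowhere.
ok, failed = importable(["json", "winreg", "no_such_module_xyz"])
print(sys.version_info[:2], "can document:", ok, "cannot:", sorted(failed))
```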

Georg


Re: [Python-Dev] versioned .so files for Python 3.2

2010-07-07 Thread Georg Brandl

Am 07.07.2010 20:40, schrieb Barry Warsaw:

> Getting back to this after the US holiday.  Thanks for running these numbers
> Scott.  I've opened a bug in the Python tracker and attached my latest patch:
> 
> http://bugs.python.org/issue9193
> 
> The one difference from previous versions of the patch is that the .so tag is
> now settable via "./configure --with-so-abi-tag=foo".  This would generate
> shared libs like _multiprocessing.foo.so.
> 
> I'd like to get consensus as to whether folks feel that a PEP is needed.  My
> own thought is that I'd rather not do a PEP specific to this change, but I
> would update PEP 384 with the implications on .so versioning.  Please also
> feel free to review the patch in that issue.

I can see where this is going... writing it into PEP 384 would automatically get
the change accepted?


Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Georg Brandl

Am 07.07.2010 21:52, schrieb Brett Cannon:
> On Wed, Jul 7, 2010 at 10:30, Georg Brandl  wrote:
>> Am 07.07.2010 18:09, schrieb Michael Foord:
>>>> Hi all,
>>>>
>>>> over on the fellowship o' the packaging mailing list, one of our GSoC students
>>>> (merwok) asked about how much formatting info should go into Python stdlib
>>>> docstrings.  Right now the stdlib docstrings are primarily text, AFAIK; but
>>>> with the switch to Sphinx for the official Python docs, should we permit
>>>> ReST-general and/or Sphinx-specific markup in docstrings?
>>
>> I promised to write a PEP about that some time in the future.  (Probably after
>> 3.2 final.)
> 
> For those of you who aren't aware, there actually already is a PEP on
> using reST in docstrings: http://python.org/dev/peps/pep-0287/ .
> 
> But it could stand to be updated by Georg with current practice with
> our internal doc practices as 2002 was back when we were still using
> LaTeX.

Thanks for the reference, Brett.  I do not intend to do my work in PEP 287
(though it probably could be given a few updates, have to look in detail),
but rather refer to it in the new one, since my proposed PEP was less about
actual syntax of reST docstrings than about their use in the standard
library.  As far as I could see, most recommendations from PEP 287 would
apply to them.

Georg


Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Antoine Pitrou

On Wed, 07 Jul 2010 22:58:47 +0200
Martin Geisler  wrote:

> "C. Titus Brown"  writes:
> 
> > I guess docutils isn't in the stdlib (should it be?) or else we could
> > modify 'help' to use it to prepare a straight text formatting.
> 
> We're using light-weight ReST markup in the Mercurial help texts and
> transform it into straight text upon display in the terminal.
> 
> We want no external dependencies for Mercurial, so I wrote a "mini ReST"
> parser in about 400 lines of code. It cheats a lot and can only handle
> simple constructs... but maybe it would be interesting for Python's
> help? You can find it here:
> 
>   http://selenic.com/hg/file/tip/mercurial/minirst.py

Given that Mercurial is GPL, this is probably of no use to us,
unfortunately.

Regards

Antoine.



Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Brett Cannon

On Wed, Jul 7, 2010 at 13:53, Alexander Belopolsky
 wrote:
> On Wed, Jul 7, 2010 at 4:33 PM, Brett Cannon  wrote:
>
> 2. There are test classes defined at the test_datetime module level
> that subclass from datetime classes.  The self.module is not available
> at the module level.  These should probably be moved to setUp()
> methods and attached to test case self.
> ..
>>> What about #2?
>
>> Either define two different subclasses or write a function that
>> returns the class using the superclass that you want.
>>
>
> Selecting one of two globally defined different subclasses will be
> ugly in parameterized tests.

Didn't say it was a pretty solution. =)

>  And in the other approach, the class
> definitions will have to be moved away from the module level and
> inside a scope where module variable is present.

Yep, which is not a big deal.

>  Yes, it looks like
> some refactoring is unavoidable.
>

=)

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Michael Foord


On 07/07/2010 21:54, Georg Brandl wrote:

> Am 07.07.2010 20:12, schrieb Barry Warsaw:
>> On Jul 07, 2010, at 07:30 PM, Georg Brandl wrote:
>>
>>> Overall, I think that we can make stdlib docstrings valid reST -- even
>>> if it's reST without much markup -- but valid, so that people pulling
>>> in stdlib docstrings into Sphinx docs won't get ugly warnings.
>>>
>>> What I would *not* like to see is heavy markup and Sphinx specifics --
>>> that would only make sense if we included the docstrings in the docs,
>>> and I don't see that coming.
>>
>> Does it make sense to add (reST-style) epydoc markup for API signatures?
>> E.g.
>>
>> def create_foo(name, parent=None):
>>     """Create the named foo.
>>
>>     The named foo must not already exist, but if optional `parent` is given,
>>     it must exist.
>>
>>     :param name: The name of the new foo.
>>     :type name: string
>>     :param parent: The new foo's parent.  If given, this must exist.
>>     :type parent: string
>>     :return: The new foo.
>>     :rtype: `Foo`
>>     :raises BadFooNameError: when `name` is illegal.
>>     :raises FooAlreadyExistsError: when a foo with `name` already exists.
>>     :raises BadParentError: when the foo's parent does not exist.
>>     """
>>
>> We could then generate automatic API docs from this, a la:
>>
>> http://www.blender.org/documentation/248PythonDoc/
>
> Yes, but: do we want this?
>


-1 :-)

I find those epydoc style framed API docs very hard to read and 
navigate. On the other hand autogenerated API docs *can* be good looking 
and usable.


Michael


--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of 
your employer, to release me from all obligations and waivers arising from any 
and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, 
clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and 
acceptable use policies (”BOGUS AGREEMENTS”) that I have entered into with your 
employer, its partners, licensors, agents and assigns, in perpetuity, without 
prejudice to my ongoing rights and privileges. You further represent that you 
have the authority to release me from any BOGUS AGREEMENTS on behalf of your 
employer.



Re: [Python-Dev] versioned .so files for Python 3.2

2010-07-07 Thread John Arbash Meinel

Scott Dial wrote:
> On 6/30/2010 2:53 PM, Barry Warsaw wrote:
>> It might be amazing, but it's still a significant overhead.  As I've
>> described, multiply that by all the py files in all the distro packages
>> containing Python source code, and then still try to fit it on a CDROM.
> 
> I decided to prove to myself that it was not a significant issue to have
> parallel directory structures in a .tar.bz2, and I was surprised to find
> it much worse at that than I had imagined. For example,
> 
> # cd /usr/lib/python2.6/site-packages
> # tar --exclude="*.pyc" --exclude="*.pyo" \
>   -cjf mercurial.tar.bz2 mercurial
> # du -h mercurial.tar.bz2
> 640K    mercurial.tar.bz2
> 
> # cp -a mercurial mercurial2
> # tar --exclude="*.pyc" --exclude="*.pyo" \
>   -cjf mercurial2.tar.bz2 mercurial mercurial2
> # du -h mercurial2.tar.bz2
> 1.3M    mercurial2.tar.bz2
> 

I believe the standard (and largest) block size for .bz2 is 900kB, and I
*think* that is uncompressed. Though I know that bz2 can chain, since it
can compress all NULL bytes extremely well (multiple GB down to kB, IIRC).

There was a question as to whether LZMA would do better here. I'm using
7zip, but .xz should perform similarly.

$ du -sh mercurial*
2.6M    mercurial
2.6M    mercurial2

366K mercurial.tar.bz2
734K mercurial2.tar.bz2

303K mercurial.7z
310K mercurial2.7z

So LZMA with the 'normal' compression has a big enough window to find
almost all of the redundancy, and 310kB is certainly a very small
increase over the 303kB. And clearly bz2 does not, since 734kB is
actually slightly more than 2x 366kB.
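The effect John describes (bzip2 only finds redundancy inside one compression block, while LZMA's much larger dictionary spans both copies) can be reproduced with Python's own `bz2` and `lzma` modules (`lzma` needs Python 3.3+). A sketch under stated assumptions: random bytes replace the Mercurial tree, so any saving must come from the duplicate copy, and `compresslevel=1` gives bzip2 100 kB blocks so a small input shows the effect. `duplication_ratios` is a hypothetical helper.

```python
import bz2
import lzma
import os


def duplication_ratios(data):
    """Compressed size of data+data divided by compressed size of data."""
    # compresslevel=1 -> 100 kB bzip2 blocks; a second copy that starts in a
    # different block compresses like brand-new data.
    bz_one = len(bz2.compress(data, compresslevel=1))
    bz_two = len(bz2.compress(data + data, compresslevel=1))
    # lzma's default preset (6) uses an 8 MiB dictionary, so the second copy
    # is encoded as back-references to the first.
    xz_one = len(lzma.compress(data))
    xz_two = len(lzma.compress(data + data))
    return bz_two / bz_one, xz_two / xz_one


payload = os.urandom(200_000)  # incompressible on its own
bz2_ratio, lzma_ratio = duplication_ratios(payload)
print("bz2: %.2fx  lzma: %.2fx" % (bz2_ratio, lzma_ratio))
```

Expect bz2 close to 2x and lzma only slightly above 1x, the same pattern as the 366K/734K and 303K/310K figures above.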

John
=:->

Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Michael Foord


On 07/07/2010 21:33, Brett Cannon wrote:

> On Wed, Jul 7, 2010 at 13:16, Alexander Belopolsky
>   wrote:
>> On Wed, Jul 7, 2010 at 3:45 PM, Brett Cannon  wrote:
>>> On Wed, Jul 7, 2010 at 08:29, Alexander Belopolsky
>>>   wrote:
>> ..
>>>> For datetime.py this approach presents several problems:
>>>>
>>>> 1. replacing datetime with self.module.datetime everywhere can get
>>>> messy quickly.
>>>> 2. There are test classes defined at the test_datetime module level
>>>> that subclass from datetime classes.  The self.module is not available
>>>> at the module level.  These should probably be moved to setUp()
>>>> methods and attached to test case self.
>>>> 3. If #2 is resolved by moving definitions inside functions, the
>>>> classes will become unpickleable and pickle tests will break.  Some
>>>> hackery involving injecting these classes into __main__ or module
>>>> globals may be required.
>>>
>>> So I have been thinking about this about how to possibly make this
>>> standard test scaffolding a little cleaner. I think a class decorator
>>> might do the trick. If you had all test methods take a module argument
>>> you could pass in the module that should be used to test. Then you
>>> simply rename test_* to _test_*, create test_*_(c|py), and then have
>>> those methods call their _test_* equivalents with the proper module to
>>> test. You could even make this generic by having the keyword arguments
>>> to the decorator be what the test suffix is named.
>>
>> Hmm, I've been playing with the idea of using a metaclass to do
>> essentially the same, but a class decorator may be a simpler solution.
>> I still don't see how this addresses #1, though.  In the ideal world,
>> I would like not to touch the body of test_* methods.  These methods,
>> however are written assuming from datetime import date, time,
>> datetime, tzinfo, etc at the top of test_datetime.py.  Even if the
>> decorator will call _test_* with six additional arguments named date,
>> time, datetime, tzinfo, etc, it will not work because by the time
>> decorator (or even metaclass machinery) gets to operate, these names
>> are already resolved as globals.
>
> Well, I personally would call that bad form to import those classes
> explicitly, but that's just me. You will simply need to make them work
> off of the module object. There is nothing wrong with "cleaning up"
> the tests as part of your work; the tests code should not be enshrined
> as perfect.

Yep - each test should take the module under test (either in C or
Python) as the parameter and use classes / functions as attributes off
the module object.

Using a class decorator to duplicate each _test_ into two test_* methods
sounds like a good approach.
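Brett's decorator idea, as quoted above, fits in a few lines. `parameterize_modules` is a hypothetical name; the stdlib `datetime` module stands in for both implementations, and each generated method name carries the suffix, so results stay attributable to the module under test.

```python
import datetime
import unittest


def parameterize_modules(**modules):
    """Hypothetical class decorator.

    For every method named _test_<name>(self, module), add one method
    test_<name>_<suffix>(self) per suffix=module keyword argument.
    """

    def decorate(cls):
        for attr in list(vars(cls)):  # copy: attributes are added while looping
            if not attr.startswith("_test_"):
                continue
            template = getattr(cls, attr)
            for suffix, module in modules.items():
                # Default arguments freeze the current template/module pair.
                def test(self, _template=template, _module=module):
                    return _template(self, _module)

                test.__name__ = "test_%s_%s" % (attr[len("_test_"):], suffix)
                setattr(cls, test.__name__, test)
        return cls

    return decorate


# `datetime` stands in for both the C and the pure-Python implementation.
@parameterize_modules(c=datetime, py=datetime)
class DateTests(unittest.TestCase):
    def _test_minyear(self, module):
        self.assertEqual(module.MINYEAR, 1)

    def _test_replace(self, module):
        d = module.date(2010, 7, 7)
        self.assertEqual(d.replace(day=8), module.date(2010, 7, 8))


suite = unittest.TestLoader().loadTestsFromTestCase(DateTests)
result = unittest.TestResult()
suite.run(result)
print(sorted(name for name in dir(DateTests) if name.startswith("test_")))
```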


Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog


Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Alexander Belopolsky

On Wed, Jul 7, 2010 at 5:56 PM, Michael Foord  wrote:
..
>> Well, I personally would call that bad form to import those classes
>> explicitly, but that's just me. You will simply need to make them work
>> off of the module object. There is nothing wrong with "cleaning up"
>> the tests as part of your work; the tests code should not be enshrined
>> as perfect.
>>
>>
>
> Yep - each test should take the module under test (either in C or Python) as
> the parameter and use classes / functions as attributes off the module
> object.
>

This is somewhat uncharted territory.  So far test_* methods had no
parameters except self and module was attached to the TestCase
subclass.  It would be accessed inside test_* methods as self.module.
I think changing test_* methods' signature is too much of a price to
pay for saving the "self." prefix.  I will still have to touch all the
date, time, datetime, etc. symbols throughout the test file.

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Fred Drake

On Wed, Jul 7, 2010 at 4:58 PM, Georg Brandl  wrote:
> Let's say we were okay with giving up single-source docs,

It's not clear that this is a goal.

> one potential
> problem is that autodoc needs to import the modules in question, which
> can become a problem, on one hand for platform-specific modules, on the
> other because the Python building the docs is not necessarily the Python
> that is documented.

This is an excellent point.  I'm less worried about the
platform-specific issues, since we can decide that those in particular
can't use the auto* support, but the need to build docs for different
versions/implementations of Python is an interesting use case.

  -Fred

-- 
Fred L. Drake, Jr.
"A storm broke loose in my mind."  --Albert Einstein

Re: [Python-Dev] versioned .so files for Python 3.2

2010-07-07 Thread Nick Coghlan

On Thu, Jul 8, 2010 at 4:40 AM, Barry Warsaw  wrote:
> I'd like to get consensus as to whether folks feel that a PEP is needed.  My
> own thought is that I'd rather not do a PEP specific to this change, but I
> would update PEP 384 with the implications on .so versioning.  Please also
> feel free to review the patch in that issue.

I suspect you could write a new PEP faster than you could convince
those suggesting the change needs a PEP (including me) that one isn't
necessary. Presumably you were going to do a summary email for the
mailing list anyway - just tidy up the formatting a bit and check it
in as a PEP instead :)

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] Include datetime.py in stdlib or not?

2010-07-07 Thread Terry Reedy


On 7/7/2010 3:32 PM, Brett Cannon wrote:


That's the idea. We already have contributors from the various VMs who
have commit privileges, but they all work in their own repos for
convenience. My hope is that if we break the stdlib out into its own
repository that people simply pull in then other VM contributors will
work directly off of the stdlib repo instead of their own, magnifying
the usefulness of their work.


I was wondering if you had more than 'hope', but thinking about it now, 
I think it premature to ask for commitments. Once a Python3 stdlib hg 
subrepository is set up and running, the logic of joining in should be 
obvious -- or not.


I am now seeing that a more complete common Python-level test suite is 
also important. Being able to move Python code that only uses the
stdlib between implementations and have it just work would be good for
all of them.



3. What version of Python would be allowed for use in the stdlib? I would
like the stdlib for 3.x to be able to use 3.x code. This would be only a
minor concern for CPython as long as 2.7 is maintained, but a major concern
for the other implementations currently 'stuck' in 2.x only. A good 3to2
would be needed.


This will only affect py3k.


Good. The Python3 stdlib should gradually become modern Python3 code. 
(An example archaism -- the use in difflib of dicts with arbitrary 
values used as sets -- which I plan to fix.)
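The "dicts used as sets" archaism Terry mentions looks like this in miniature. The snippet is illustrative only, not difflib's actual code.

```python
# Older idiom: a dict whose values are never read, used only for membership.
junk_dict = {}
for element in ["{", "}", ";", "{"]:
    junk_dict[element] = 1

# Modern idiom: a set states the intent and supports set algebra directly.
junk_set = {"{", "}", ";"}

line_elements = ["{", "x = 1;", "}"]
hits_old = [e for e in line_elements if e in junk_dict]
hits_new = [e for e in line_elements if e in junk_set]
print(hits_old, hits_new)
```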



I generally favor having Python versions of modules available. My current
post on difflib.SequenceMatcher is based on experiments with an altered
version. I copied difflib.py to my test directory, renamed it diff2lib.py,
so I could import both versions, found and edited the appropriate method,
and off I went. If difflib were in C, my post would have been based on
speculation about how a fixed version would operate, rather than on data.



The effect upon CPython would be the extension modules become just
performance improvements, nothing more (unless they have to be in C as
in the case for sqlite3).


As pre- and jit compilation improve, the need for hand-coded C will go 
down. For instance, annotate (in a branch, not trunk) and compile with 
Cython.



4. Does not ctypes make it possible to replace a method of a Python-coded
class with a faster C version, with something like
  try:
      connect to methods.dll
      check that function xyz exists
      replace Someclass.xyz with ctypes wrapper
  except: pass
For instance, the SequenceMatcher heuristic was added to speed up the
matching process that I believe is encapsulated in one O(n**2) or so
bottleneck method. I believe most everything else is O(n) bookkeeping.



There is no need to go that far. All one needs to do is structure the
extension code such that when the extension module is imported, it
overrides key objects in the Python version.


Is it possible to replace a python-coded function in a python-coded 
class with a C-coded function? I had the impression from the issue 
discussion that one would have to recode the entire class, even if only 
a single method really needed it.
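On Terry's question: a single method of a Python-coded class can be rebound after the class is defined, which is the pattern Brett describes (the extension overrides key objects when it is imported), so the whole class need not be recoded. A minimal sketch; `Matcher`, `_matcher_speedups` and its function are hypothetical stand-ins, not difflib names. One subtlety: a C-level builtin function is not a descriptor, so it has to be wrapped to receive the instance state.

```python
class Matcher:
    """Toy stand-in for a Python-coded class with one hot method."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def common_prefix_length(self):
        # Pure-Python version of the "bottleneck" method.
        n = 0
        for x, y in zip(self.a, self.b):
            if x != y:
                break
            n += 1
        return n


try:
    # Hypothetical C accelerator; no such module exists, so this normally fails.
    from _matcher_speedups import common_prefix_length as _c_common_prefix_length
except ImportError:
    pass  # keep the pure-Python method
else:
    # Assigning a builtin function to the class directly would not pass
    # `self`, so wrap it in a Python function that unpacks the state.
    def _fast_common_prefix_length(self):
        return _c_common_prefix_length(self.a, self.b)

    Matcher.common_prefix_length = _fast_common_prefix_length

print(Matcher("abcde", "abxde").common_prefix_length())
```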



Using ctypes is just added complexity.


Only to be used if easier than extra C coding.

--
Terry Jan Reedy


Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Nick Coghlan

On Thu, Jul 8, 2010 at 7:36 AM, Brett Cannon  wrote:
>> Selecting one of two globally defined different subclasses will be
>> ugly in parameterized tests.
>
> Didn't say it was a pretty solution. =)
>
>>  An in the other approach, the class
>> definitions will have to be moved away from the module level and
>> inside a scope where module variable is present.
>
> Yep, which is not a big deal.
>
>>  Yes, it looks like
>> some refactoring is unavoidable.

If you want to run the same module twice with different instances of
an imported module (or any other parameterised globals), creative use
of run_module() can provide module level scoping without completely
restructuring your tests.

1. Move the current tests aside into a new file that isn't
automatically invoked by regrtest (e.g. _test_datetime_inner.py).
2. In that code, remove any imports from datetime (instead, assume
datetime will be injected into the module's namespace)*
3. In test_datetime.py itself, use runpy.run_module() to import the
renamed module twice, once with the Python version of datetime in
init_globals and once with the C version.

*How the removals work:
"import datetime" is dropped entirely
"from datetime import x, y, z" becomes "x, y, z = datetime.x,
datetime.y, datetime.z"

There would be additional things to do to make the attribution of the
test results clearer in order to make this effective in practice
though.
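Nick's three steps can be sketched end to end with the standard `runpy.run_module` and its `init_globals` argument. Assumptions: the inner test module is written to a temporary directory so the example is self-contained, `_datetime_inner_sketch` and `run_inner_twice` are hypothetical names, and the stdlib `datetime` stands in for both implementations.

```python
import datetime
import os
import runpy
import sys
import tempfile

INNER_SOURCE = '''
# No "import datetime" here: run_module() injects it via init_globals.
date = datetime.date
timedelta = datetime.timedelta


def check():
    return (date(2010, 7, 7) + timedelta(days=1)).isoformat()
'''


def run_inner_twice(implementations, module_name="_datetime_inner_sketch"):
    """Execute the inner module once per implementation; collect check()."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, module_name + ".py"), "w") as f:
            f.write(INNER_SOURCE)
        sys.path.insert(0, tmp)
        try:
            for label, module in implementations.items():
                namespace = runpy.run_module(
                    module_name, init_globals={"datetime": module})
                results[label] = namespace["check"]()
        finally:
            sys.path.remove(tmp)
            sys.path_importer_cache.pop(tmp, None)
    return results


results = run_inner_twice({"c": datetime, "py": datetime})
print(results)
```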

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Nick Coghlan

On Thu, Jul 8, 2010 at 7:56 AM, Michael Foord  wrote:
> Using a class decorator to duplicate each _test_ into two test_* methods
> sounds like a good approach.

Note that parameterised methods have a similar problem to
parameterised modules - unittest results are reported in terms of
"testmodule.testclass.testfunction", so proper attribution of results
in the test output will require additional work. The separate
subclasses approach doesn't share this issue, since it changes the
value of the second item in accordance with the module under test.
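The separate-subclasses approach Nick contrasts with can be sketched as a mixin holding the shared test bodies plus one `TestCase` subclass per implementation; the class name (the second component of each test id) then records the module under test. Names are illustrative, and the stdlib `datetime` stands in for both implementations.

```python
import datetime
import unittest


class DateTestsMixin:
    """Shared test bodies; concrete subclasses choose the implementation."""

    module = None  # overridden by each concrete subclass

    def test_replace(self):
        d = self.module.date(2010, 7, 7)
        self.assertEqual(d.replace(day=8), self.module.date(2010, 7, 8))


# `datetime` stands in for both implementations in this sketch.
class CDateTests(DateTestsMixin, unittest.TestCase):
    module = datetime


class PyDateTests(DateTestsMixin, unittest.TestCase):
    module = datetime


loader = unittest.TestLoader()
classes = (CDateTests, PyDateTests)
suite = unittest.TestSuite([loader.loadTestsFromTestCase(cls) for cls in classes])
result = unittest.TestResult()
suite.run(result)
test_ids = [test.id() for cls in classes
            for test in loader.loadTestsFromTestCase(cls)]
print(test_ids)
```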

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

2010-07-07 Thread Terry Reedy


On 7/7/2010 4:11 PM, Tres Seaver wrote:


> Antoine Pitrou wrote:
>> On Wed, 7 Jul 2010 19:44:31 +0200
>> Eli Bendersky  wrote:
>>
>>> For what it's worth, my benchmarking showed that modifying the
>>> heuristic to only kick in when there are more than 100 kinds of
>>> elements (Terry's option A) didn't affect the runtime of matching
>>> whatsoever, even when the heuristic *does* kick in. All it adds,
>>> really, is the overhead of a single 'if' statement. So it wouldn't be
>>> right to assume that somehow modifying the heuristic or allowing to
>>> turn it off will negatively affect performance in the special case Tim
>>> originally optimized for.
>>
>> Just because it doesn't affect performance in your tests doesn't mean it
>> won't do so in the general case. Consider a case where Tim's junk
>> optimization kicked in and helped improve performance a lot, but where
>> there are still fewer than 100 alphabet symbols. The new heuristic will
>> ruin this use case.
>
> That would describe pretty much every C program ever written, for
> instance, and nearly as high a percentage of all Python modules /
> scripts ever written:  the 'string.printable' set, less formfeed and
> vertical tab, is 98 characters long.

In the primary use case, programs are compared line by line, not
character by character, and there are more than 100 different lines in
any sensible program of at least 200 lines.
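To make the numbers in this exchange concrete, here is a sketch of the popularity rule as I understand difflib's code of this period: once the second sequence has at least 200 elements, an element accounting for more than roughly 1% of it is dropped as junk (the exact off-by-one varies between versions). Terry's option A is modeled as an optional minimum number of distinct elements. `popular_elements` is a hypothetical helper, not difflib API.

```python
from collections import Counter


def popular_elements(b, min_kinds=None):
    """Return the elements of b that the popularity heuristic would discard.

    min_kinds=None applies the rule unconditionally; min_kinds=100 models
    "option A": apply it only when b has more than 100 distinct elements.
    """
    n = len(b)
    if n < 200:
        return set()
    counts = Counter(b)
    if min_kinds is not None and len(counts) <= min_kinds:
        return set()
    threshold = n // 100 + 1  # approximate "more than 1% of len(b)"
    return {elt for elt, count in counts.items() if count > threshold}


dna = "ACGT" * 100  # 400 elements drawn from only 4 kinds
c_lines = ["{"] * 50 + ["}"] * 50 + ["x%d = %d;" % (i, i) for i in range(150)]
print(popular_elements(dna), popular_elements(dna, min_kinds=100))
print(popular_elements(c_lines), popular_elements(c_lines, min_kinds=100))
```

In this toy data, option A leaves the brace-heavy, line-oriented case unchanged while no longer discarding every letter of a four-letter alphabet.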



--
Terry Jan Reedy


Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Brett Cannon

On Wed, Jul 7, 2010 at 15:31, Nick Coghlan  wrote:
> On Thu, Jul 8, 2010 at 7:56 AM, Michael Foord  
> wrote:
>> Using a class decorator to duplicate each _test_ into two test_* methods
>> sounds like a good approach.
>
> Note that parameterised methods have a similar problem to
> parameterised modules - unittest results are reported in terms of
> "testmodule.testclass.testfunction", so proper attribution of results
> in the test output will require additional work. The separate
> subclasses approach doesn't share this issue, since it changes the
> value of the second item in accordance with the module under test.

This is why a new method would need to be created with a special
suffix to delineate what module the test was called with. So instead
of testclass specifying what module was used, it would be
testfunction.

I guess it becomes a question of what boilerplate you prefer. One nice
benefit of the class decorator that I can think of is it could handle
the import trickery for you so you wouldn't even need to worry about
that issue. This could also allow the decorator to not bother running
the tests twice if the extension helper was not available.
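The last part of Brett's suggestion (skip the second run when the extension helper is missing) needs only a presence check. A sketch using the standard `importlib.util.find_spec`; `implementations_to_test` is a hypothetical helper, and `json` / `no_such_accelerator_xyz` are placeholders for the real pure-Python and C module names.

```python
import importlib.util


def implementations_to_test(pure_name, accelerator_name):
    """Hypothetical helper: map test suffix -> module name.

    The "c" entry is dropped when the accelerator cannot be found, so the
    parameterized tests run once instead of twice.
    """
    found = {"py": pure_name}
    if importlib.util.find_spec(accelerator_name) is not None:
        found["c"] = accelerator_name
    return found


plan = implementations_to_test("json", "no_such_accelerator_xyz")
print(plan)
```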

-Brett

>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   [email protected]   |   Brisbane, Australia
>

Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Alexander Belopolsky

On Wed, Jul 7, 2010 at 6:27 PM, Nick Coghlan  wrote:
..
> If you want to run the same module twice with different instances of
> an imported module (or any other parameterised globals), creative use
> of run_module() can provide module level scoping without completely
> restructuring your tests.
>

This is what the current patch at

http://bugs.python.org/file17848/issue7989.diff

does, but at expense of not exposing testcases to unittest correctly.

> 1. Move the current tests aside into a new file that isn't
> automatically invoked by regrtest (e.g. _test_datetime_inner.py).

Yes, I already have datetimetester.py.

> 2. In that code, remove any imports from datetime (instead, assume
> datetime will be injected into the module's namespace)*

Hmm.  That will make datetimetester not importable.

> 3. In test_datetime.py itself, use runpy.run_module() to import the
> renamed module twice, once with the Python version of datetime in
> init_globals and once with the C version.
>
> *How the removals work:
> "import datetime" is dropped entirely
> "from datetime import x, y, z" becomes "x, y, z = datetime.x,
> datetime.y, datetime.z"
>
I'll try that.


> There would be additional things to do to make the attribution of the
> test results clearer in order to make this effective in practice
> though.

Thanks.  I would really like to make it work first and improve later.
I hope this will do the trick.

Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Benjamin Peterson

2010/7/7 Nick Coghlan :
> On Thu, Jul 8, 2010 at 7:56 AM, Michael Foord  
> wrote:
>> Using a class decorator to duplicate each _test_ into two test_* methods
>> sounds like a good approach.
>
> Note that parameterised methods have a similar problem to
> parameterised modules - unittest results are reported in terms of
> "testmodule.testclass.testfunction", so proper attribution of results
> in the test output will require additional work. The separate
> subclasses approach doesn't share this issue, since it changes the
> value of the second item in accordance with the module under test.

A good parameterized implementation, though, gives the repr() of the
parameters in failure output.



-- 
Regards,
Benjamin

Re: [Python-Dev] versioned .so files for Python 3.2

2010-07-07 Thread Matthias Klose


On 07.07.2010 20:40, Barry Warsaw wrote:

> Getting back to this after the US holiday.  Thanks for running these numbers
> Scott.  I've opened a bug in the Python tracker and attached my latest patch:
>
> http://bugs.python.org/issue9193
>
> The one difference from previous versions of the patch is that the .so tag is
> now settable via "./configure --with-so-abi-tag=foo".  This would generate
> shared libs like _multiprocessing.foo.so.


 - imo, it's wrong to look up _multiprocessing.so first, before looking
   up _multiprocessing.foo.so (at least for the use case to put the
   extensions for multiple python versions into one directory).

 - why is the flexibility of specifying the "foo" needed?  The
   naming for the __pycache__ files is fixed, so why make it configurable
   for extensions?
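For comparison, later CPython releases expose the lookup order directly: importlib.machinery.EXTENSION_SUFFIXES (Python 3.3 and newer) lists the extension-module suffixes in the order the path-based finder tries them, and on typical POSIX builds the ABI-tagged suffix comes before plain .so, which is the order argued for above. A small sketch that lists the candidate file names for one module, assuming Python 3.3+:

```python
import importlib.machinery


def candidate_filenames(modname):
    """File names the import system would look for, in order, for an
    extension module named `modname` (Python 3.3+)."""
    return [modname + suffix
            for suffix in importlib.machinery.EXTENSION_SUFFIXES]


# The exact suffixes are build- and platform-specific; on a typical Linux
# build the tagged .cpython-... name comes first and plain .so last.
print(candidate_filenames("_multiprocessing"))
```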

Matthias

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Stephen J. Turnbull

Antoine Pitrou writes:

 > >   http://selenic.com/hg/file/tip/mercurial/minirst.py
 > 
 > Given that Mercurial is GPL, this is probably of no use to us,
 > unfortunately.

Given that Martin apparently is the only or main author, I don't see a
problem as long as he's willing.

Martin?

Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Benjamin Peterson

2010/7/7 Stephen J. Turnbull :
> Antoine Pitrou writes:
>
>  > >   http://selenic.com/hg/file/tip/mercurial/minirst.py
>  >
>  > Given that Mercurial is GPL, this is probably of no use to us,
>  > unfortunately.
>
> Given that Martin apparently is the only or main author, I don't see a
> problem as long as he's willing.

And he hasn't assigned the copyright away.


-- 
Regards,
Benjamin

Re: [Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

2010-07-07 Thread Terry Reedy



I had considered the possibility of option A for 2.7 and A & C for 3.2. 
But see below.


Since posting, I did an experiment with a 700-character paragraph of text 
(the summary from the post) compared to an 'edited' version. I did the 
comparison with and without the current heuristic. I did not notice 
much time difference (a couple of seconds either way) and the edit list 
was essentially the same. The heuristic lowered the reported match ratio 
from .96 to .88, which would be bad when one wanted the unaltered value.


I do not know which, if any, chars other than 'e' were junked as that 
info currently is not exposed. I propose below that it should be.


I intentionally did not list as options

D. Keep the status quo that is buggy for certain uses.

Good things often have more uses than the inventor intended or expected. 
They should not be prevented.


E. Completely remove the heuristic, which would restore 'buggy' 
performance for other uses.


One of the closed issues was E, rejected for that reason.

---
I also did not list one of my original ideas, but after the discussion 
it is looking better to me. It is based on the following statement of 
the current heuristic:


"Disregard as junk common items that occur in more than 1% of the 
positions in second sequence b, as long as b is long enough so that 
duplicates cannot be called common."


Tim, I do not know if you remember why you chose 200 as the cutoff, but 
the above assumes that the following is not just a coincidence:


(2 > 199*.01) == True
(2 > 200*.01) == False

In other words, 200 is the smallest length for b that prevents the 
junking of duplicates.
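The arithmetic can be checked directly. A small sketch, following the restatement above (an item is common if it fills more than k percent of the positions in b) and generalizing 1% to k%; the helper names are hypothetical. Multiplying integers instead of computing 199*.01 keeps floating-point rounding out of the boundary case.

```python
def is_common(count, n, k=1):
    """Hypothetical helper: True if an item seen `count` times fills more
    than k percent of a sequence of length n (pure integer arithmetic)."""
    return count * 100 > n * k


def smallest_safe_length(k=1):
    """Smallest len(b) at which an item occurring twice is not 'common'."""
    n = 1
    while is_common(2, n, k):
        n += 1
    return n


for k in (1, 2, 4, 5):
    print(k, smallest_safe_length(k))  # prints 1 200, 2 100, 4 50, 5 40
```

For k that does not divide 200, the smallest safe length is 200/k rounded up (67 for k = 3).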


F. Generalize the heuristic by replacing '1' with 'k', where k can be 
None (no heuristic) or 1-99. If not None, replace 200 by 200/k to 
minimally avoid junking of duplicates. If len(b) >= 200/k, then items 
whose counts are greater than (len(b)*k)//100 are treated as common; 
that threshold need only be calculated once.


Implementation: Add a new parameter named 'common' or 'threshold' or 
whatever that defaults to 1. After computing b2j without the heuristic, 
if 'common' is not None, prune b2j as described above.
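Option F could look roughly like the sketch below. The function name build_b2j is hypothetical, the parameter name common is taken from the proposal's wording, and this is not difflib's actual code. It builds b2j without the heuristic and then prunes it as described.

```python
from collections import defaultdict


def build_b2j(b, common=1):
    """Sketch of option F (hypothetical; not difflib's implementation).

    Map each item of b to the positions where it occurs. If `common` is not
    None and len(b) >= 200/common, drop items occurring more than
    (len(b) * common) // 100 times and return them as the 'popular' set.
    """
    b2j = defaultdict(list)
    for j, elt in enumerate(b):
        b2j[elt].append(j)

    popular = set()
    n = len(b)
    # n >= 200/common, written without division
    if common is not None and n * common >= 200:
        ntest = (n * common) // 100  # computed once
        popular = {elt for elt, idxs in b2j.items() if len(idxs) > ntest}
        for elt in popular:
            del b2j[elt]
    return dict(b2j), popular


b2j, popular = build_b2j("ACGT" * 100)             # default threshold, 1%
print(sorted(popular))                             # ['A', 'C', 'G', 'T']
b2j, popular = build_b2j("ACGT" * 100, common=None)
print(sorted(popular))                             # []
```

A caller who knows len(b) and len(set(b)) can pass common=None for small alphabets, or a larger k after some experimentation.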


My thinking here is that a user either knows or can easily find out the 
length of b and the size of the intended or actual alphabet of b (the 
latter is len(set(b))). So the user can conditionally call 
SequenceMatcher with 'common' set to None or an int as appropriate, 
perhaps after some experimentation. So the threshold is the only tuning 
parameter actually needed, and it also allows the heuristic to be turned 
off.


The reason I did not list this before is the problem with 2.7.1. F, 
unlike option A, overtly changes the api, and some would object to that 
for 2.7 even though it really is a bugfix. However, option F will not 
break code, while the covert change of option A could break code. So 
this may be the best option for a bad situation. It is a minimal change 
that gives the user complete flexibility.


In other words, I see three options for 2.7.1+:
D. keep the current sometimes buggy behavior
A. covertly change the heuristic to mostly fix it but possibly break 
some uses.
F. add a parameter, with a no-change default, that allows the user to 
select a change if and when wanted.


Another advantage of F is that then 2.7 and 3.2 would get the same change.

--
Other changes that apply regardless of the heuristic/api change:

Update the code to use sets (newer than difflib) instead of dicts with 
values set to 1.
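The set-for-dict change is mechanical. A minimal illustration with hypothetical names and an example isjunk predicate:

```python
def isjunk(ch):
    """Example junk predicate: treat blanks and tabs as junk."""
    return ch in " \t"


b = "a b\tc a"

# Older idiom: a dict used as a set, with every value set to 1.
junkdict = {}
for elt in b:
    if isjunk(elt):
        junkdict[elt] = 1

# Equivalent set; membership tests read the same way.
junk = {elt for elt in b if isjunk(elt)}

print(" " in junk, "a" in junk)  # True False
```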


Directly expose the set of 'common' items as an additional attribute of 
SequenceMatcher instances. Such instance attributes are currently 
undocumented, so adding one can hardly be a problem. Add documentation 
thereof. Being able to see the effect of the heuristic when it is not 
turned off might help people decide whether or not to use it, or how to 
tune the threshold for smallish alphabets where 1 is too small.
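For comparison, Python 3.2, released after this thread, gave SequenceMatcher an autojunk parameter that switches the heuristic off (autojunk was also added in 2.7.1) and a bpopular attribute holding the items the heuristic discarded. A short sketch against that later API, assuming Python 3.2 or newer:

```python
import difflib

b = "ACGT" * 100  # 400 items drawn from a 4-letter alphabet
a = b

with_heuristic = difflib.SequenceMatcher(None, a, b)  # autojunk=True by default
without_heuristic = difflib.SequenceMatcher(None, a, b, autojunk=False)

# Items the heuristic treated as "popular" and removed from b2j.
print(sorted(with_heuristic.bpopular))     # ['A', 'C', 'G', 'T']
print(sorted(without_heuristic.bpopular))  # []
print(len(with_heuristic.b2j), len(without_heuristic.b2j))  # 0 4
```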


--
Terry Jan Reedy


Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Terry Reedy


On 7/7/2010 2:42 PM, Antoine Pitrou wrote:

>> I wrote
>>
>> 4. Does not ctypes make it possible to replace a method of a Python-coded
>> class with a faster C version, with something like
>>    try:
>>      connect to methods.dll  (methods.dll to be written)
>>      check that function xyz exists
>>      replace Someclass.xyz with ctypes wrapper
>>    except: pass
>> For instance, the SequenceMatcher heuristic was added to speed up the
>> matching process that I believe is encapsulated in one O(n**2) or so
>> bottleneck method. I believe most everything else is O(n) bookkeeping.
>
> Except that ctypes doesn't help provide C extensions at all. It only
> helps provide wrappers around existing C libraries, which is quite a
> different thing.
> Which, in the end, makes the original suggestion meaningless.


To you, so let me restate it. It would be easier for many people to only 
rewrite, for instance, difflib.SequenceMatcher.find_longest_match in 
C than to rewrite the whole SequenceMatcher class, let alone the whole 
difflib module.


I got the impression from the datetime issue tracker discussion that it 
is not possible to replace a single method of a Python-coded class with 
a C version. I got this from a statement that seems to say that having 
parallel Python and C versions is a nuisance because one must replace 
large chunks of Python, at least a class if not the whole module. If 
that impression is wrong, and I hope it is, the suggestion is unnecessary.


If it is right, then replacing the Python-coded function with a 
Python-coded wrapper for a function in a miscellaneous shared library 
might be both possible and useful. But again, if the premise is wrong, 
skip the conclusion.
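Replacing a single method of a Python-coded class is ordinary attribute assignment, so the premise can be tested directly. A minimal sketch of the try/except pattern above; the class Matcher, the method common_prefix_len and the accelerator module _matcher_accel are all hypothetical. One subtlety: a plain module-level C function does not bind self when stored on a class, so the sketch installs a thin Python wrapper instead.

```python
class Matcher:
    """Hypothetical stand-in for a Python-coded class with one hot method."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def common_prefix_len(self):
        # Pure-Python version of the bottleneck method.
        n = 0
        for x, y in zip(self.a, self.b):
            if x != y:
                break
            n += 1
        return n


def install_accelerator(cls, name, fast_func):
    """Replace cls.<name> with a thin wrapper calling fast_func(self.a, self.b).

    The wrapper is needed because a plain module-level C function does not
    bind `self` when stored as a class attribute.
    """
    def method(self):
        return fast_func(self.a, self.b)
    method.__name__ = name
    setattr(cls, name, method)


try:
    from _matcher_accel import common_prefix_len as _fast  # hypothetical C module
except ImportError:
    pass  # keep the pure-Python method
else:
    install_accelerator(Matcher, "common_prefix_len", _fast)

print(Matcher("abcx", "abcy").common_prefix_len())  # 3
```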


--
Terry Jan Reedy


Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Michael Foord


On 08/07/2010 02:45, Terry Reedy wrote:

> [snip]

Would it be possible to provide a single method in C by providing a C 
base class with a single method and have the full implementation inherit 
from the C base class if it is available or otherwise a pure Python base 
class?
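Michael's suggestion can be sketched as a conditional base class. The module _fastmatcher and the class names are hypothetical. When the C module is missing, a pure-Python base supplies the same single method, and the full class inherits from whichever base was found.

```python
try:
    from _fastmatcher import MatcherBase  # hypothetical C base class
except ImportError:
    class MatcherBase:
        """Pure-Python fallback that provides only the hot method."""

        def common_prefix_len(self):
            n = 0
            for x, y in zip(self.a, self.b):
                if x != y:
                    break
                n += 1
            return n


class Matcher(MatcherBase):
    """Full implementation in Python; inherits the hot method from the base."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def is_prefix(self):
        return self.common_prefix_len() == len(self.a)


print(Matcher("ab", "abc").is_prefix(), Matcher("abcx", "abcy").common_prefix_len())
# True 3
```

How a C base class would reach a and b (ordinary attribute lookup, or owning that state itself) is a design decision this sketch leaves open.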


Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog


Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Robert Collins

On pypi - testscenarios; it's been discussed on TIP before.

It's a 'run a function to parameterise some tests' API, it changes the
id() of the test to include the parameters, and it can be hooked in
via load_tests quite trivially.
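The same idea, parameterising tests and folding the parameters into id(), can be sketched with unittest's load_tests protocol alone. This is not the testscenarios API; the names SCENARIOS, scenario and datetime_module are hypothetical, and the stdlib datetime stands in for both implementations.

```python
import datetime
import unittest

# Hypothetical scenarios; the stdlib datetime stands in for both the
# pure-Python and the C implementation.
SCENARIOS = [("py", datetime), ("c", datetime)]


class DateTests(unittest.TestCase):
    scenario = None  # set per instance by load_tests
    datetime_module = datetime

    def id(self):
        # Fold the scenario name into the test id for clear attribution.
        base = super().id()
        if self.scenario is None:
            return base
        return "%s(%s)" % (base, self.scenario)

    def test_add_day(self):
        dt = self.datetime_module
        self.assertEqual(dt.date(2010, 7, 7) + dt.timedelta(days=1),
                         dt.date(2010, 7, 8))


def load_tests(loader, standard_tests, pattern):
    """unittest's load_tests hook: one test instance per (method, scenario)."""
    suite = unittest.TestSuite()
    for name in loader.getTestCaseNames(DateTests):
        for label, module in SCENARIOS:
            case = DateTests(name)
            case.scenario = label
            case.datetime_module = module
            suite.addTest(case)
    return suite
```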

Cheers,
Rob

Re: [Python-Dev] Python equivalents in stdlib Was: Include datetime.py in stdlib or not?

2010-07-07 Thread Terry Reedy


On 7/7/2010 11:43 AM, Jesse Noller wrote:


> The idea is to put CPython on a more equal footing with the other
> implementations,


I would reverse this to "The idea is to put the other implementations on 
a more equal footing with CPython."


The subtle difference is the implication of whether the idea is to pull 
CPython down (the former) or raise the others up (the latter) ;-).


--
Terry Jan Reedy


Re: [Python-Dev] query: docstring formatting in python distutils code

2010-07-07 Thread Stephen J. Turnbull

Benjamin Peterson writes:
 > 2010/7/7 Stephen J. Turnbull :
 > > Antoine Pitrou writes:
 > >
 > >  > >   http://selenic.com/hg/file/tip/mercurial/minirst.py
 > >  >
 > >  > Given that Mercurial is GPL, this is probably of no use to us,
 > >  > unfortunately.
 > >
 > > Given that Martin apparently is the only or main author, I don't see a
 > > problem as long as he's willing.
 > 
 > And he hasn't assigned the copyright away.

(Or that the assignment has an automatic author-use-ok clause like the
standard FSF assignment does, etc.)

Just ask Martin, there are too many possibilities here to worry about.
If maybe we want it, and he is willing to contribute the parts he
wrote to Python under Python's license, then we can worry about
whether we really want it and about how much any required hoop-jumping
will cost.