Re: [Python-Dev] PEP 446 (make FD non inheritable) ready for a final review

2013-08-23 Thread Charles-François Natali
Hello,

A couple remarks:

> The following functions are modified to make newly created file descriptors 
> non-inheritable by default:
> [...]
> os.dup()

then

> os.dup2() has a new optional inheritable parameter: os.dup2(fd, fd2, 
> inheritable=True). fd2 is created inheritable by default, but non-inheritable 
> if inheritable is False.

Why does dup2() create inheritable FD, and not dup()?

I think a hint is given a little later:

> Applications using the subprocess module with the pass_fds parameter or using 
> os.dup2() to redirect standard streams should not be affected.

But that's overly-optimistic.

For example, a lot of code uses the guarantee that dup()/open()...
returns the lowest numbered file descriptor available, so code like
this:

r, w = os.pipe()
if os.fork() == 0:
# child
os.close(r)
os.close(1)
dup(w)

*will break*

And that's a lot of code (e.g. that's what _posixsubprocess.c uses,
but since it's implemented in C it's wouldn't be affected).

We've already had this discussion, and I stand by my claim that
changing the default *will break* user code.
Furthermore, many people use Python for system programming, and this
change would be highly surprising.

So no matter what the final decision on this PEP is, it must be kept in mind.

> The programming languages Go, Perl and Ruby make newly created file 
> descriptors non-inheritable by default: since Go 1.0 (2009), Perl 1.0 (1987) 
> and Ruby 2.0 (2013).

OK, but do they expose OS file descriptors?
I'm sure such a change would be fine for Java, which doesn't expose
FDs and fork(), but Python's another story.

Last time, I said that to me, the FD inheritance issue is solved on
POSIX by the subprocess module which passes close_fds. In my own code,
I use subprocess, which is the "official", portable and safe way to
create child processes in Python. Someone using fork() + exec() should
know what he's doing, and be able to deal with the consequences: I'm
not only talking about FD inheritance, but also about
async-signal/multi-threaded safety ;-)

As for Windows, since it doesn't have fork(), it would make sense to
make its FD non heritable by default. And then use what you describe
here to selectively inherit FDs (i.e. implement keep_fds):
"""
Since Windows Vista, CreateProcess() supports an extension of the
STARTUPINFO struture: the STARTUPINFOEX structure. Using this new
structure, it is possible to specify a list of handles to inherit:
PROC_THREAD_ATTRIBUTE_HANDLE_LIST. Read Programmatically controlling
which handles are inherited by new processes in Win32 (Raymond Chen,
Dec 2011) for more information.
"""

cf
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Pre-PEP: Redesigning extension modules

2013-08-23 Thread Stefan Behnel
Hi,

this has been subject to a couple of threads on python-dev already, for
example:

http://thread.gmane.org/gmane.comp.python.devel/135764/focus=140986

http://thread.gmane.org/gmane.comp.python.devel/141037/focus=141046

It originally came out of issues 13429 and 16392.

http://bugs.python.org/issue13429

http://bugs.python.org/issue16392

Here's an initial attempt at a PEP for it. It is based on the (unfinished)
ModuleSpec PEP, which is being discussed on the import-sig mailing list.

http://mail.python.org/pipermail/import-sig/2013-August/000688.html

Stefan



PEP: 4XX
Title: Redesigning extension modules
Version: $Revision$
Last-Modified: $Date$
Author: Stefan Behnel 
BDFL-Delegate: ???
Discussions-To: ???
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.4
Post-History: 23-Aug-2013
Resolution:


Abstract


This PEP proposes a redesign of the way in which extension modules interact
with the interpreter runtime. This was last revised for Python 3.0 in PEP
3121, but did not solve all problems at the time. The goal is to solve them
by bringing extension modules closer to the way Python modules behave.

An implication of this PEP is that extension modules can use arbitrary
types for their module implementation and are no longer restricted to
types.ModuleType. This makes it easy to support properties at the module
level and to safely store arbitrary global state in the module that is
covered by normal garbage collection and supports reloading and
sub-interpreters.

Motivation
==

Python modules and extension modules are not being set up in the same way.
For Python modules, the module is created and set up first, then the module
code is being executed. For extensions, i.e. shared libraries, the module
init function is executed straight away and does both the creation and
initialisation. This means that it knows neither the __file__ it is being
loaded from nor its package (i.e. its fully qualified module name, FQMN).
This hinders relative imports and resource loading. In Py3, it's also not
being added to sys.modules, which means that a (potentially transitive)
re-import of the module will really try to reimport it and thus run into an
infinite loop when it executes the module init function again. And without
the FQMN, it is not trivial to correctly add the module to sys.modules either.

This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of a FQMN and correct
file path hinders the compilation of __init__.py modules, i.e. packages,
especially when relative imports are being used at module init time.

Furthermore, the majority of currently existing extension modules has
problems with sub-interpreter support and/or reloading and it is neither
easy nor efficient with the current infrastructure to support these
features. This PEP also addresses these issues.

The current process
===

Currently, extension modules export an initialisation function named
"PyInit_modulename", named after the file name of the shared library. This
function is executed by the import machinery and must return either NULL in
the case of an exception, or a fully initialised module object. The
function receives no arguments, so it has no way of knowing about its
import context.

During its execution, the module init function creates a module object
based on a PyModuleDef struct. It then continues to initialise it by adding
attributes to the module dict, creating types, etc.

In the back, the shared library loader keeps a note of the fully qualified
module name of the last module that it loaded, and when a module gets
created that has a matching name, this global variable is used to determine
the FQMN of the module object. This is not entirely safe as it relies on
the module init function creating its own module object first, but this
assumption usually holds in practice.

The main problem in this process is the missing support for passing state
into the module init function, and for safely passing state through to the
module creation code.

The proposal


The current extension module initialisation will be deprecated in favour of
a new initialisation scheme. Since the current scheme will continue to be
available, existing code will continue to work unchanged, including binary
compatibility.

Extension modules that support the new initialisation scheme must export a
new public symbol "PyModuleCreate_modulename", where "modulename" is the
name of the shared library. This mimics the previous naming convention for
the "PyInit_modulename" function.

This symbol must resolve to a C function with the following signature::

PyObject* (*PyModuleTypeCreateFunction)(PyObject* module_spec)

The "module_spec" argument receives a "ModuleSpec" instance, as defined in
PEP 4XX (FIXME). (All names are obviously

Re: [Python-Dev] Pre-PEP: Redesigning extension modules

2013-08-23 Thread Antoine Pitrou

Hi,

Le Fri, 23 Aug 2013 10:50:18 +0200,
Stefan Behnel  a écrit :
> 
> Here's an initial attempt at a PEP for it. It is based on the
> (unfinished) ModuleSpec PEP, which is being discussed on the
> import-sig mailing list.

Thanks for trying this. I think the PEP should contain working example
code for module initialization (and creation), to help gauge the
complexity for module writers.

Regards

Antoine.


___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Summary of Python tracker Issues

2013-08-23 Thread Python tracker

ACTIVITY SUMMARY (2013-08-16 - 2013-08-23)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open4168 (+16)
  closed 26426 (+49)
  total  30594 (+65)

Open issues with patches: 1914 


Issues opened (47)
==

#17702: os.environ converts key type from string to bytes in KeyError 
http://bugs.python.org/issue17702  reopened by haypo

#18757: Fix internal references for concurrent modules
http://bugs.python.org/issue18757  opened by serhiy.storchaka

#18758: Fix internal references in the documentation
http://bugs.python.org/issue18758  opened by serhiy.storchaka

#18760: Fix internal doc references for the xml package
http://bugs.python.org/issue18760  opened by serhiy.storchaka

#18763: subprocess: file descriptors should be closed after preexec_fn
http://bugs.python.org/issue18763  opened by neologix

#18764: The pdb print command prints repr instead of str in python3
http://bugs.python.org/issue18764  opened by r.david.murray

#18765: unittest needs a way to launch pdb.post_mortem or other debug 
http://bugs.python.org/issue18765  opened by gregory.p.smith

#18766: IDLE: Autocomplete in editor doesn't work for un-imported modu
http://bugs.python.org/issue18766  opened by philwebster

#18767: csv documentation does not note default quote constant
http://bugs.python.org/issue18767  opened by bemclaugh

#18769: argparse remove subparser
http://bugs.python.org/issue18769  opened by Michael.Bikovitsky

#18772: Fix gdb plugin for new sets dummy object
http://bugs.python.org/issue18772  opened by pitrou

#18774: There is still last bit of GNU Pth code in signalmodule.c
http://bugs.python.org/issue18774  opened by vajrasky

#18775: name attribute for HMAC
http://bugs.python.org/issue18775  opened by christian.heimes

#18776: atexit error display behavior changed in python 3
http://bugs.python.org/issue18776  opened by doughellmann

#18778: email docstrings and comments say about Unicode strings
http://bugs.python.org/issue18778  opened by serhiy.storchaka

#18779: Misleading documentations and comments in regular expression H
http://bugs.python.org/issue18779  opened by vajrasky

#18780: SystemError when formatting int subclass
http://bugs.python.org/issue18780  opened by serhiy.storchaka

#18783: No more refer to Python "long"
http://bugs.python.org/issue18783  opened by serhiy.storchaka

#18784: minor uuid.py loading optimization
http://bugs.python.org/issue18784  opened by eugals

#18785: Add get_body and iter_attachments to provisional email API
http://bugs.python.org/issue18785  opened by r.david.murray

#18787: Misleading error from getspnam function of spwd module
http://bugs.python.org/issue18787  opened by vajrasky

#18789: XML Vunerability Table Unclear
http://bugs.python.org/issue18789  opened by joe-tennies

#18790: incorrect text in argparse add_help example
http://bugs.python.org/issue18790  opened by purplezephyr

#18795: pstats - allow stats sorting by cumulative time per call and t
http://bugs.python.org/issue18795  opened by alexnvdias

#18796: Wrong documentation of show_code function from dis module
http://bugs.python.org/issue18796  opened by vajrasky

#18798: Typo and unused variables in test fcntl
http://bugs.python.org/issue18798  opened by vajrasky

#18799: Resurrect and fix test_404 in Lib/test/test_xmlrpc.py
http://bugs.python.org/issue18799  opened by vajrasky

#18800: Document Fraction's numerator and denominator properties
http://bugs.python.org/issue18800  opened by icedream91

#18801: inspect.classify_class_attrs() misclassifies object.__new__()
http://bugs.python.org/issue18801  opened by eric.snow

#18802: ipaddress documentation errors
http://bugs.python.org/issue18802  opened by jongfoster

#18803: Fix more typos in .py files
http://bugs.python.org/issue18803  opened by iwontbecreative

#18804: pythorun.c: is_valid_fd() should not duplicate the file descri
http://bugs.python.org/issue18804  opened by haypo

#18805: ipaddress netmask/hostmask parsing bugs
http://bugs.python.org/issue18805  opened by jongfoster

#18806: socketmodule: fix/improve setipaddr() numeric addresses handli
http://bugs.python.org/issue18806  opened by neologix

#18807: Allow venv to create copies, even when symlinks are supported
http://bugs.python.org/issue18807  opened by andrea.corbellini

#18808: Thread.join returns before PyThreadState is destroyed
http://bugs.python.org/issue18808  opened by Tamas.K

#18809: Expose mtime check in importlib.machinery.FileFinder
http://bugs.python.org/issue18809  opened by brett.cannon

#18810: Stop doing stat calls in importlib.machinery.FileFinder to see
http://bugs.python.org/issue18810  opened by brett.cannon

#18813: Speed up slice object processing
http://bugs.python.org/issue18813  opened by scoder

#18814: Add tools for "cleaning" surrogate escaped strings
http://bugs.python.org/issue18814  opened by ncoghlan

#188

Re: [Python-Dev] PEP 446 (make FD non inheritable) ready for a final review

2013-08-23 Thread Victor Stinner
Hi,

I will try to answer to your worries. Tell me if I should complete the
PEP with these answers.

2013/8/23 Charles-François Natali :
> Why does dup2() create inheritable FD, and not dup()?

Ah yes, there were as section explaining it. In a previous version of
the PEP (and its implementation), os.dup() and os.dup2() created
non-inheritable FD, but inheritable FD for standard steams (fd 0, 1,
2).

I did a research on http://code.ohloh.net/ to see how os.dup2() is
used in Python projects. 99.4% (169 projects on 170) uses os.dup2() to
replace standard streams: file descriptors 0, 1, 2 (stdin, stdout,
stderr); sometimes only stdout or stderr. I only found a short demo
script using dup2() with arbitrary file descriptor numbers (10 and 11)
to keep a copy of stdout and stderr before replacing them (also with
dup2).

I didn't find use cases of dup() to inherit file descriptors in the
Python standard library. It's the opposite: when os.dup() (or the C
function dup() is used), the FD must not be inherited. For example,
os.listdir(fd) duplicates fd: a child process must not inherit the
duplicated file descriptor, it may lead a security vulnerability (ex:
parent process running as root, whereas the child process is running
as a different used and not allowed to open the directory).


> For example, a lot of code uses the guarantee that dup()/open()...
> returns the lowest numbered file descriptor available, so code like
> this:
>
> r, w = os.pipe()
> if os.fork() == 0:
> # child
> os.close(r)
> os.close(1)
> dup(w)
>
> *will break*
>
> And that's a lot of code (e.g. that's what _posixsubprocess.c uses,
> but since it's implemented in C it's wouldn't be affected).

Yes, it will break. As I wrote in my previous email, we cannot solve
all issues listed in the Rationale section of the PEP without breaking
applications (or at least breaking backward compatibility). It is even
explicitly said in the "Backward Compatibility" section:
http://www.python.org/dev/peps/pep-0446/#backward-compatibility
"This PEP break applications relying on inheritance of file descriptors."

But I also added a hint to fix applications:
"Developers are encouraged to reuse the high-level Python module
subprocess which handles the inheritance of file descriptors in a
portable way."

If you don't want to use subprocess, yes, you will have to add
"os.set_inheritable(w)" in the child process.

About your example: I'm not sure that it is reliable/portable. I saw
daemon libraries closing *all* file descriptors and then expecting new
file descriptors to become 0, 1 and 2. Your example is different
because w is still open. On Windows, I have seen cases with only fd 0,
1, 2 open, and the next open() call gives the fd 10 or 13...

I'm optimistic and I expect that most Python applications and
libraries already use the subprocess module. The subprocess module
closes all file descriptors (except 0, 1, 2) since Python 3.2.
Developers relying on the FD inheritance and using the subprocess with
Python 3.2 or later already had to use the pass_fds parameter.


> Furthermore, many people use Python for system programming, and this
> change would be highly surprising.

Yes, it is a voluntary design choice (of the PEP). It is also said
explicitly in the "Backward Compatibility" section:
"Python does no more conform to POSIX, since file descriptors are now
made non-inheritable by default. Python was not designed to conform to
POSIX, but was designed to develop portable applications."

> So no matter what the final decision on this PEP is, it must be kept in mind.

The purpose of the PEP is to explain correctly the context and the
consequences of the changes, so Guido van Rossum can uses the PEP to
make its final decision.


>> The programming languages Go, Perl and Ruby make newly created file 
>> descriptors non-inheritable by default: since Go 1.0 (2009), Perl 1.0 (1987) 
>> and Ruby 2.0 (2013).
>
> OK, but do they expose OS file descriptors?

Yes:

- Perl: fileno() function
- Ruby: fileno() method of a file object
- Go: fd() method of a file object


> Last time, I said that to me, the FD inheritance issue is solved on
> POSIX by the subprocess module which passes close_fds. In my own code,
> I use subprocess, which is the "official", portable and safe way to
> create child processes in Python. Someone using fork() + exec() should
> know what he's doing, and be able to deal with the consequences: I'm
> not only talking about FD inheritance, but also about
> async-signal/multi-threaded safety ;-)

The subprocess module has still a (minor?) race condition in the child
process. Another C thread can create a new file descriptor after the
subprocess module closed all file descriptors and before exec(). I
hope that it is very unlikely, but it can happen. It's also explained
in the PEP (see "Closing All Open File Descriptors"). I suppose that
the race condition explains why Linux still has no closefrom() or
nextfd() system calls. IMO the kernel is the bes

Re: [Python-Dev] [Python-checkins] cpython (merge 3.3 -> default): merge for issue #18755

2013-08-23 Thread Victor Stinner
2013/8/23 brett.cannon :
> http://hg.python.org/cpython/rev/7d30ecf5c916
> changeset:   85339:7d30ecf5c916
> parent:  85336:391f36ef461a
> parent:  85337:ddd610cb65ef
> user:Brett Cannon 
> date:Fri Aug 23 11:52:19 2013 -0400
> summary:
>   merge for issue #18755
>
> files:
>   Lib/imp.py   |  9 +++--
>   Lib/test/test_imp.py |  9 +
>   2 files changed, 16 insertions(+), 2 deletions(-)

You didn't merge the log entry in Misc/NEWS. Is it voluntary?
Mercurial asked me to add the log entry when I merged 26c049dc1a4a
into default => 01f33959ddf6.

changeset:   85338:b107f7a8730d
branch:  3.3
user:Brett Cannon 
date:Fri Aug 23 11:47:26 2013 -0400
files:   Misc/NEWS
description:
NEW entry for issue #18755


diff -r ddd610cb65ef -r b107f7a8730d Misc/NEWS
--- a/Misc/NEWS Fri Aug 23 11:45:57 2013 -0400
+++ b/Misc/NEWS Fri Aug 23 11:47:26 2013 -0400
@@ -66,6 +66,9 @@ Core and Builtins
 Library
 ---

+- Issue #18755: Fixed the loader used in imp to allow get_data() to be called
+  multiple times.
+
 - Issue #16809: Fixed some tkinter incompabilities with Tcl/Tk 8.6.

 - Issue #16809: Tkinter's splitlist() and split() methods now accept Tcl_Obj

Victor
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Benchmarks now run unmodified under Python 3

2013-08-23 Thread Brett Cannon
I just committed a change to the benchmarks suite so that there is no
longer a translation step to allow running under Python 3. Should make it
much easier to just keep a checkout lying around to use for both Python 2
and 3 benchmarking.
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-checkins] cpython (merge 3.3 -> default): merge for issue #18755

2013-08-23 Thread Brett Cannon
I explicitly did a second merge, reverted Misc/NEWS and committed, so that
shouldn't have come up. It's not a big deal having the entry, but I didn't
worry about it either.


On Fri, Aug 23, 2013 at 1:26 PM, Victor Stinner wrote:

> 2013/8/23 brett.cannon :
> > http://hg.python.org/cpython/rev/7d30ecf5c916
> > changeset:   85339:7d30ecf5c916
> > parent:  85336:391f36ef461a
> > parent:  85337:ddd610cb65ef
> > user:Brett Cannon 
> > date:Fri Aug 23 11:52:19 2013 -0400
> > summary:
> >   merge for issue #18755
> >
> > files:
> >   Lib/imp.py   |  9 +++--
> >   Lib/test/test_imp.py |  9 +
> >   2 files changed, 16 insertions(+), 2 deletions(-)
>
> You didn't merge the log entry in Misc/NEWS. Is it voluntary?
> Mercurial asked me to add the log entry when I merged 26c049dc1a4a
> into default => 01f33959ddf6.
>
> changeset:   85338:b107f7a8730d
> branch:  3.3
> user:Brett Cannon 
> date:Fri Aug 23 11:47:26 2013 -0400
> files:   Misc/NEWS
> description:
> NEW entry for issue #18755
>
>
> diff -r ddd610cb65ef -r b107f7a8730d Misc/NEWS
> --- a/Misc/NEWS Fri Aug 23 11:45:57 2013 -0400
> +++ b/Misc/NEWS Fri Aug 23 11:47:26 2013 -0400
> @@ -66,6 +66,9 @@ Core and Builtins
>  Library
>  ---
>
> +- Issue #18755: Fixed the loader used in imp to allow get_data() to be
> called
> +  multiple times.
> +
>  - Issue #16809: Fixed some tkinter incompabilities with Tcl/Tk 8.6.
>
>  - Issue #16809: Tkinter's splitlist() and split() methods now accept
> Tcl_Obj
>
> Victor
> ___
> Python-Dev mailing list
> [email protected]
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/brett%40python.org
>
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 446 (make FD non inheritable) ready for a final review

2013-08-23 Thread Charles-François Natali
> About your example: I'm not sure that it is reliable/portable. I sa
> daemon libraries closing *all* file descriptors and then expecting new
> file descriptors to become 0, 1 and 2. Your example is different
> because w is still open. On Windows, I have seen cases with only fd 0,
> 1, 2 open, and the next open() call gives the fd 10 or 13...

Well, my example uses fork(), so obviously doesn't apply to Windows.
It's perfectly safe on Unix.

> I'm optimistic and I expect that most Python applications and
> libraries already use the subprocess module. The subprocess module
> closes all file descriptors (except 0, 1, 2) since Python 3.2.
> Developers relying on the FD inheritance and using the subprocess with
> Python 3.2 or later already had to use the pass_fds parameter.

As long as the PEP makes it clear that this breaks backward
compatibility, that's fine. IMO the risk of breakage outweights the
modicum benefit.

> The subprocess module has still a (minor?) race condition in the child
> process. Another C thread can create a new file descriptor after the
> subprocess module closed all file descriptors and before exec(). I
> hope that it is very unlikely, but it can happen.

No it can't, because after fork(), there's only one thread.
It's perfectly safe.

cf
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] please back out changeset f903cf864191 before alpha-2

2013-08-23 Thread Stefan Behnel
Hi,

ticket 17741 has introduced a new feature in the xml.etree.ElementTree
module that was added without any major review.

http://bugs.python.org/issue17741

http://hg.python.org/cpython/rev/f903cf864191

I only recently learned about this addition and after taking a couple of
closer looks, I found that it represents a rather seriously degradation of
the ET module API. Since I am not a core developer, it is rather easy for
the original committer to close the ticket and to sit and wait for the next
releases to come in order to get the implementation out. So I'm sorry for
having to ask publicly on this list for the code to be removed, before it
does any harm in the wild.

Let me explain why the addition is such a disaster. I'm personally involved
in this because I will eventually have to implement features that occur in
ElementTree also in the external lxml.etree package. And this is not an API
that I can implement with a clear conscience.

There are two parts to parsing in the ET API. The first is the parser, and
the second is the target object (similar to a SAX handler). The parser has
an incremental parsing API consisting of the functions .feed() and
.close(). When it receives data through the .feed() method, it parses it
and passes events on to the target. The target is commonly a TreeBuilder
that builds an in-memory tree, but is not limited to that. Calling the
.close() method tells the parser that the parsing is done and that it
should finish it up.

The class that was now added is called "IncrementalParser". It has two
methods for passing data in: "data_received()" and "eof_received()". So the
first thing to note is that this addition is only a copy of the existing
API and functionality, but under different names. It is hard to understand
for me how anyone could consider this a consistent design.

Then, the purpose of this addition was to provide a way to collect parse
events. That is the obvious role of the target object. In the new
implementation, the target object is being instantiated, but not actually
meant to collect the events. Instead, it's the parser collecting the
events, based on what the target object returns (which it doesn't currently
have to do at all). This is totally backwards. Instead, it should be up to
the target object to decide which events to collect, how to process them
and how to present them to the user. This is clearly how the API was
originally designed.

Also, the IncrementalParser doesn't directly collect the events itself but
gets them through a sort of backdoor in the underlying parser. That parser
object is actually being passed into the IncrementalParser as a parameter,
which means that user provided parser objects will also have to implement
that backdoor now, even though they may not actually be able to provide
that functionality.

My proposal for fixing these obvious design problems is to let each part of
the parsing chain do what it's there for. Use the existing XMLParser (or an
HTMLParser, as in lxml.etree) to feed in data incrementally, and let the
target object process and collect the events. So, instead of replacing the
parser interface with a new one, there should be a dedicated target object
(or maybe just a wrapper for a TreeBuilder) that collects the parse events
in this specific way.

Since the time is running in favour of the already added implementation,
I'm asking to back it out for the time being. I specifically do not want to
rush in a replacement. Once there is an implementation that matches the
established API, I'm all in favour of adding it, because the feature itself
is a good idea. But keeping a wrong design in, just because "it's there",
even before anyone has actually started using it, is just asking for future
deprecation hassle. It's not too late for removal now, but it will be in a
couple of weeks if it is not done now.

Stefan

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] please back out changeset f903cf864191 before alpha-2

2013-08-23 Thread Antoine Pitrou
On Sat, 24 Aug 2013 00:57:48 +0200
Stefan Behnel  wrote:
> Hi,
> 
> ticket 17741 has introduced a new feature in the xml.etree.ElementTree
> module that was added without any major review.
> 
> http://bugs.python.org/issue17741

As I've already indicated on the tracker, I'm completely saturated
with Stefan's qualms about a minor API addition and I'm not willing to
process anymore of them. Hence I won't respond to the bulk of his
e-mail.

But I still want to clarify that claiming that the feature was "added
without any major review" is outrageous and manipulative. Perhaps
Stefan thinks that an ElementTree code review not by him (but, for
example, by Eli, who currently maintains ElementTree) is not "major".
Well, good for him. His best chance to influence the development
process, though, is to contribute more, not harass active developers.

Regards

Antoine.


___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] please back out changeset f903cf864191 before alpha-2

2013-08-23 Thread Stefan Behnel
Antoine Pitrou, 24.08.2013 01:26:
> On Sat, 24 Aug 2013 00:57:48 +0200
> Stefan Behnel wrote:
>> ticket 17741 has introduced a new feature in the xml.etree.ElementTree
>> module that was added without any major review.
>>
>> http://bugs.python.org/issue17741
> 
> As I've already indicated on the tracker, I'm completely saturated
> with Stefan's qualms about a minor API addition and I'm not willing to
> process anymore of them. Hence I won't respond to the bulk of his
> e-mail.
> 
> But I still want to clarify that claiming that the feature was "added
> without any major review" is outrageous and manipulative. Perhaps
> Stefan thinks that an ElementTree code review not by him (but, for
> example, by Eli, who currently maintains ElementTree) is not "major".

The reason why I'm saying this is that the way the change came in is rather
- unorthodox. As Antoine noted in the ticket, he proposed the change on the
tulip mailing list. That is a completely wrong place to discuss a new XML
API, as can be seen from the replies.

http://thread.gmane.org/gmane.comp.python.tulip/171

Specifically, no-one noticed the major overlap with the existing API and
functionality at that point, nor the contradictions between the existing
API and the new one.

In the ticket, Eli stated that he didn't have time for the review ATM, and
then, two days later, commented that the patch looks good. To me (and I'm
really only interpreting here), this indicates that the review was mostly
at the patch level. Note that he didn't comment on the API overlap either,
so my guess is that he just didn't notice it. In my experience, reviewing
design and thinking about alternatives takes more time than that,
especially when you're "swamped", as he put it. I mean, it took *me* almost
a day to dig into the implications and into the patch (as can be seen by my
incremental comments), and I have the background of having written a
complete implementation of that library myself.

So, to put it more nicely, I think this feature was added without the
amount of review that it needs, and now that I've given it that review, I'm
asking for removal of the feature and a proper redesign that fits into the
existing library.

Stefan


___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Pre-PEP: Redesigning extension modules

2013-08-23 Thread Nick Coghlan
On 23 August 2013 19:18, Antoine Pitrou  wrote:
>
> Hi,
>
> Le Fri, 23 Aug 2013 10:50:18 +0200,
> Stefan Behnel  a écrit :
>>
>> Here's an initial attempt at a PEP for it. It is based on the
>> (unfinished) ModuleSpec PEP, which is being discussed on the
>> import-sig mailing list.
>
> Thanks for trying this. I think the PEP should contain working example
> code for module initialization (and creation), to help gauge the
> complexity for module writers.

I've been thinking a lot about this as part of reviewing PEP 451 (the
ModuleSpec PEP that Stefan's pre-PEP mentions). The relevant feedback
on import-sig hasn't made it into PEP 451 yet (Eric is still
considering the suggestion), but what I'm proposing is a new
relatively *stateless* API for loaders, which consists of two methods:

  def create_module(self, spec):
  """Given a ModuleSpec, return the object to be added to sys.modules"""

  def exec_module(self, mod):
  """Execute the given module, updating it for the current system state"""

create_module would be optional - if not defined, the import system
would automatically create a normal module object. If it is defined,
the import system would call it and then take care of setting all the
standard attributes (__name__, __spec__, etc) on the result if the
loader hadn't already set them.

exec_module would be required, and is the part that actually fully
initialises the module.

"imp.reload" would then translate to calling exec_module on an
existing module without recreating it.

For loaders that provide the new API, the global import state
manipulation would all be handled by the import system. Such loaders
would still be free to provide load_module() anyway for backwards
compatibility with earlier Python versions, since the new API would
take precedence.

In this context, the API I was considering for extension modules was
slightly different from that in Stefan's proto-PEP (although it was
based on some of Stefan's suggestions in the earlier threads).
Specifically, I'm thinking of an API like this that does a better job
of supporting reloading:

PyObject * PyImportCreate_(PyObject *spec); /* Optional */
int PyImportExec_(PyObject *mod);

Implementing PyImportCreate would only be needed if you had C level
state to store - if you're happy storing everything in the module
globals, then you would only need to implement PyImportExec.

My current plan is to create an experimental prototype of this
approach this weekend. That will include stdlib test cases, so it will
also show how it looks from the extension developer's point of view.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] please back out changeset f903cf864191 before alpha-2

2013-08-23 Thread Nick Coghlan
On 24 August 2013 15:32, Stefan Behnel  wrote:
> So, to put it more nicely, I think this feature was added without the
> amount of review that it needs, and now that I've given it that review, I'm
> asking for removal of the feature and a proper redesign that fits into the
> existing library.

FWIW, it seems to me that this is something that could live in *tulip*
as an adapter between the tulip data processing APIs and the existing
ElementTree incremental parsing APIs, without needing to be added
directly to xml.etree at all.

It certainly seems premature to be adding tulip-inspired APIs to other
parts of the standard library, when tulip itself hasn't been deemed
ready for inclusion.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com