Re: [Python-Dev] please back out changeset f903cf864191 before alpha-2

2013-08-25 Thread Stephen J. Turnbull
Eli Bendersky writes:
 > On Sat, Aug 24, 2013 at 5:55 PM, Stephen J. Turnbull wrote:

 >> FWIW, as somebody who can recall using ET exactly once,
 >> IncrementalParser is what I used.

 > Just to be on the safe side, I want to make sure that you indeed
 > mean IncrementalParser, which was committed 4 months ago into the
 > Mercurial default branch (3.4) and has only seen an alpha release?
 > Eli

Oops, and thank you for your courtesy.

No, actually looking at the code this time, I meant
xml.sax.xmlreader.IncrementalParser, which has the same API as the new
etree.ElementTree.IncrementalParser.  No wonder it seems familiar.
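
For readers who have not used the feed-style interface, here is a minimal sketch of it using the standard library's SAX parser. The object returned by xml.sax.make_parser() implements xml.sax.xmlreader.IncrementalParser, so it accepts data in pieces through feed() and is finished with close(). The TagCollector handler and the parse_in_chunks helper are names made up for this illustration and do not appear in the thread.

```python
import xml.sax


class TagCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records element names as they start."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def parse_in_chunks(chunks):
    """Feed XML to the parser piece by piece, as a network reader might."""
    parser = xml.sax.make_parser()
    handler = TagCollector()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)   # may be called with arbitrary fragments
    parser.close()           # signals end of input
    return handler.tags


# The chunk boundaries deliberately fall in the middle of tags.
tags = parse_in_chunks(["<root><a", "/><b>text</", "b></root>"])
```

The chunk boundaries do not have to line up with element boundaries, which is the property that makes this interface convenient for streaming input.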

As for the suggestion, AIUI, you proposed keeping the current layering
of iterparse on top of IncrementalParser, and then removing
IncrementalParser from the documentation.

My suggestion is to rename the current "IncrementalParser" class, and
then use the IncrementalParser interface for what is currently named
"iterparse".  This assumes that, as Stefan claims, data_received == feed,
and so on.

Steve


___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Pre-PEP: Redesigning extension modules

2013-08-25 Thread Stefan Behnel
Nick Coghlan, 24.08.2013 23:43:
> On 25 Aug 2013 01:44, "Stefan Behnel" wrote:
>> Nick Coghlan, 24.08.2013 16:22:
>>> The new _PyImport_CreateAndExecExtensionModule function does the heavy
>>> lifting:
>>>
>>> https://bitbucket.org/ncoghlan/cpython_sandbox/src/081f8f7e3ee27dc309463b48e6c67cf4880fca12/Python/importdl.c?at=new_extension_imports#cl-65
>>>
>>> One key point to note is that it *doesn't* call
>>> _PyImport_FixupExtensionObject, which is the API that handles all the
>>> PEP 3121 per-module state stuff. Instead, the idea will be for modules
>>> that don't need additional C level state to just implement
>>> PyImportExec_NAME, while those that *do* need C level state implement
>>> PyImportCreate_NAME and return a custom object (which may or may not
>>> be a module subtype).
>>
>> Is it really a common case for an extension module not to need any C level
>> state at all? I mean, this might work for very simple accelerator modules
>> with only a few stand-alone functions. But anything non-trivial will almost
>> certainly have some kind of global state, cache, external library, etc.,
>> and that state is best stored at the C level for safety reasons.
> 
> I'd prefer to encourage people to put that state on an exported *type*
> rather than directly in the module global state. So while I agree we need
> to *support* C level module globals, I'd prefer to provide a simpler
> alternative that avoids them.

But that has an impact on the API then. Why do you want the users of an
extension module to go through a separate object (even if it's just a
singleton, for example) instead of going through functions at the module
level? We don't currently encourage or propose this design for Python
modules either. Quite the contrary, it's extremely common for Python
modules to provide most of their functionality at the function level. And
IMHO that's a good thing.

Note that even global functions usually hold state, be it in the form of
globally imported modules, global caches, constants, ...


> We also need the create/exec split to properly support reloading. Reload
> *must* reinitialize the object already in sys.modules instead of inserting
> a different object or it completely misses the point of reloading modules
> over deleting and reimporting them (i.e. implicitly affecting the
> references from other modules that imported the original object).

Interesting. I never thought of it that way.

I'm not sure this can be done in general. What if the module has threads
running that access the global state? In that case, reinitialising the
module object itself would almost certainly lead to a crash.

And what if you do "from extmodule import some_function" in a Python
module? Then reloading couldn't replace that reference, just as for normal
Python modules. Meaning that you'd still have to keep both modules properly
alive in order to prevent crashes due to lost global state of the imported
function.
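
The from-import case can be reproduced with a plain Python module. The sketch below writes a throwaway module to a temporary directory, imports it both ways, rewrites the source and calls importlib.reload(). The module name reload_demo_mod and the helper demo_reload are invented for this illustration.

```python
import importlib
import pathlib
import sys
import tempfile


def demo_reload():
    """Show what reload() does and does not update for a pure Python module."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "reload_demo_mod.py"
        path.write_text("def some_function():\n    return 'first version'\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            import reload_demo_mod
            from reload_demo_mod import some_function  # early-bound reference

            # New source has a different size, so cached bytecode is not reused.
            path.write_text(
                "def some_function():\n    return 'second version, reloaded'\n"
            )
            reloaded = importlib.reload(reload_demo_mod)
            return (
                reloaded is reload_demo_mod,        # same module object is reused
                some_function(),                    # still the old function
                reload_demo_mod.some_function(),    # new function via the module
            )
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("reload_demo_mod", None)


same_object, via_from_import, via_module = demo_reload()
```

Here reload() re-executes the code in the existing module object, so attribute access through the module sees the new function, while the name bound by the from-import keeps pointing at the old one.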

The difference from Python modules here is that in Python code, you'll get
some kind of exception if state is lost during a reload. In C code, you'll
most likely get a crash.

How would you even make sure global state is properly cleaned up? Would you
call tp_clear() on the module object before re-running the init code? Or
how else would you enable the init code to do the right thing during both
the first run (where global state is uninitialised) and subsequent runs
(where global state may hold valid state and owned Python references)?
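
In Python terms, the first-run versus re-run question can be pictured with the create_module/exec_module hooks of importlib.abc.Loader. This is only a rough analogy under stated assumptions: the StatefulLoader class, the _state attribute and the "generation" counter are hypothetical, and nothing here touches the C-level state that the real problem is about. The point is merely that the exec step has to detect and release earlier state itself.

```python
import importlib.abc
import importlib.util


class StatefulLoader(importlib.abc.Loader):
    """Hypothetical loader whose exec step copes with being run twice."""

    def create_module(self, spec):
        return None  # ask the import machinery for a default module object

    def exec_module(self, module):
        previous = getattr(module, "_state", None)
        if previous is not None:
            # Re-run: release whatever the earlier run still holds.
            previous["cache"].clear()
        generation = 1 if previous is None else previous["generation"] + 1
        module._state = {"cache": {}, "generation": generation}


spec = importlib.util.spec_from_loader("stateful_demo", StatefulLoader())
module = importlib.util.module_from_spec(spec)   # the "create" step
spec.loader.exec_module(module)                  # first "exec" run

module._state["cache"]["key"] = "value"
old_state = module._state

spec.loader.exec_module(module)                  # what a reload would repeat

old_cache_cleared = old_state["cache"] == {}
generation = module._state["generation"]
```

Any code that still holds a reference to old_state sees an emptied cache after the second run, which in C would correspond to touching freed or reset memory.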

Even tp_clear() may not be enough, because it's only meant to clean up
Python references, not C-level state. Basically, for reloading to be
correct without changing the object reference, it would have to go all the
way through tp_dealloc(), catch the object at the very end, right before it
gets freed, and then re-initialise it.

This sounds like we need some kind of indirection (as you mentioned above),
but without the API impact that a separate type implies. Simply making
modules an arbitrary extension type, as I proposed, cannot solve this.

(Actually, my intuition tells me that if it can't really be made to work
100% for Python modules, e.g. due to the from-import case, why bother with
it for extension types?)


>>> Such modules can still support reloading (e.g.
>>> to pick up reloaded or removed module dependencies) by providing
>>> PyImportExec_NAME as well.
>>>
>>> (in a PEP 451 world, this would likely be split up as two separate
>>> functions, one for create, one for exec)
>>
>> Can't we just always require extension modules to implement their own type?
>> Sure, it's a lot of boilerplate code, but that could be handled by a
>> simple C code generator or maybe even a copy&paste example in the docs. I
>> would like to avoid making it too easy for users in the future to get
>> anything wrong with reloading or sub-interpreters. Most people won't test
>> these things for their own code and the harder it is to make them not work,
>> the more likely it is that a given set of dependen

Re: [Python-Dev] Pre-PEP: Redesigning extension modules

2013-08-25 Thread Stefan Behnel
Hi,

thanks for bringing this up. It clearly shows that there is more to this
problem than I initially thought.

Let me just add one idea that your post gave me.

PJ Eby, 25.08.2013 06:12:
> My "Importing" package offers lazy imports by creating module objects
> in sys.modules that are a subtype of ModuleType, and use a
> __getattribute__ hook so that trying to use them fires off a reload()
> of the module.

I wonder if this wouldn't be an approach to fix the reloading problem in
general. What if extension module loading, at least with the new scheme,
didn't return the module object itself and put it into sys.modules but
created a wrapper that redirects its __getattr__ and __setattr__ to the
actual module object? That would have a tiny performance impact on
attribute access, but I'd expect that to be negligible given that the usual
reason for the extension module to exist is that it does non-trivial stuff
in whatever its API provides. Reloading could then really create a
completely new module object and replace the reference inside of the wrapper.

That way, code that currently uses "from extmodule import xyz" would
continue to see the original version of the module as of the time of its
import, and code that just did "import extmodule" and then used attribute
access at need would always see the current content of the module as it was
last loaded. I think that, together with keeping module global state in the
module object itself, would nicely fix both cases.
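
A rough Python sketch of that wrapper idea follows. It assumes a ModuleType subclass that forwards attribute access to an inner module, with a helper that swaps the inner module on "reload". ReloadableModuleProxy, swap_target and make_module are names invented for this illustration and are not part of any proposed API.

```python
import types


def make_module(name, source):
    """Build a module object from source text (stands in for a real load)."""
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    return mod


class ReloadableModuleProxy(types.ModuleType):
    """Hypothetical wrapper that forwards attribute access to a target module."""

    def __init__(self, target):
        super().__init__(target.__name__)
        super().__setattr__("_target", target)

    def __getattribute__(self, name):
        target = super().__getattribute__("_target")
        return getattr(target, name)

    def __setattr__(self, name, value):
        setattr(super().__getattribute__("_target"), name, value)


def swap_target(proxy, new_target):
    """Replace the wrapped module, bypassing the forwarding __setattr__."""
    types.ModuleType.__setattr__(proxy, "_target", new_target)


v1 = make_module("extmodule", "def xyz():\n    return 'v1'\n")
proxy = ReloadableModuleProxy(v1)
xyz = proxy.xyz                      # like "from extmodule import xyz"

v2 = make_module("extmodule", "def xyz():\n    return 'v2'\n")
swap_target(proxy, v2)               # "reload" creates a fresh module

early_binding = xyz()                # bound before the swap
late_binding = proxy.xyz()           # looked up through the wrapper
proxy.flag = True                    # writes land on the current target
```

The early-bound function keeps the old module's globals alive, while attribute access through the wrapper sees the newly loaded module.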

Stefan




Re: [Python-Dev] Pre-PEP: Redesigning extension modules

2013-08-25 Thread Terry Reedy

On 8/25/2013 7:54 AM, Stefan Behnel wrote:

> And what if you do "from extmodule import some_function" in a Python
> module? Then reloading couldn't replace that reference, just as for normal
> Python modules. Meaning that you'd still have to keep both modules properly
> alive in order to prevent crashes due to lost global state of the imported
> function.


People who want to reload modules sometimes know before they start that 
they will want to. If so, they can just 'import' instead of 'from 
import' and access everything through the module. There is still the 
problem of persistent class instances directly accessing classes for 
attributes, but maybe that can be directed through the class also.
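
The remaining problem with persistent instances can be seen in a small sketch. It assumes a throwaway module widget_demo_mod with a Widget class, both invented for this illustration. An instance created before the reload keeps its reference to the old class object, even though all access went through the module.

```python
import importlib
import pathlib
import sys
import tempfile

SOURCE_V1 = "class Widget:\n    label = 'old'\n"
SOURCE_V2 = "class Widget:\n    label = 'new, after reload'\n"


def demo_instances():
    """Reload a module while an instance of one of its classes is alive."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "widget_demo_mod.py"
        path.write_text(SOURCE_V1)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            import widget_demo_mod          # plain import, no from-import

            survivor = widget_demo_mod.Widget()
            path.write_text(SOURCE_V2)      # different size, so it is recompiled
            importlib.reload(widget_demo_mod)
            return {
                "via_module": widget_demo_mod.Widget.label,
                "via_old_instance": survivor.label,
                "still_instance": isinstance(survivor, widget_demo_mod.Widget),
            }
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("widget_demo_mod", None)


result = demo_instances()
```

The surviving instance still resolves attributes through the class it was created from, and it is no longer an instance of the class now exposed by the module.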


--
Terry Jan Reedy
