[Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-10 Thread Eli Bendersky
Hello,

Recently as part of the effort of untangling the tests of ElementTree and
general code improvements (e.g. http://bugs.python.org/issue15651), I ran
into something strange about PEP 3121-compliant modules. I'll demonstrate
with csv, just as an example.

PEP 3121 mandates this function to look up the module-specific state in the
current sub-interpreter:

  PyObject* PyState_FindModule(struct PyModuleDef*);

This appears to make the following assumption: a given sub-interpreter only
imports any C extension *once*. If it happens more than once, the
assumption breaks in troubling ways. In normal code, it should never happen
more than once because of the caching in sys.modules; However, many of our
tests monkey-patch sys.modules (mainly by calling
test.support.import_fresh_module) and hell breaks use. Here's a simple
example:


import sys

csv = __import__('csv')
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)

print(csv.list_dialects())
# ==> ['unixpwd', 'excel-tab', 'excel', 'unix']

del sys.modules['csv']  # FUN
del sys.modules['_csv']
some_other_csv = __import__('csv')

print(csv.list_dialects())
# ==> ['excel-tab', 'excel', 'unix']


Note how doing some sys.modules acrobatics and re-importing suddenly
changes the internal state of a previously imported module. This happens
because:

1. The first import of 'csv' (which then imports `_csv) creates
module-specific state on the heap and associates it with the current
sub-interpreter. The list of dialects, amongst other things, is in that
state.
2. The 'del's wipe 'csv' and '_csv' from the cache.
3. The second import of 'csv' also creates/initializes a new '_csv' module
because it's not in sys.modules. This *replaces* the per-sub-interpreter
cached version of the module's state with the clean state of a new module

So essentially, while PEP 3121 moves state from C-file globals to
per-module state, the state is still global, and this fact can be exposed
from pure Python code.

The above is a toy example. Here's a more serious case I ran into with ET,
but once again is demonstrated with 'csv' for simplicity:



import io
from test.support import import_fresh_module

import csv

csv_other = import_fresh_module('csv', fresh=['_csv', 'csv'])

f = io.StringIO('foo\x00,bar\nbaz,42')
reader = csv.reader(f)

try:
for row in reader:
print(row)
except csv.Error as e:
print('Caught csv.error', e)
except Exception as e:
print('Caught Exception', e)


In the above, the reader throws 'csv.Error' (because of the NULL byte) but
the exception clause does not catch it where expected, because it's a
different exception class called `csv.Error`, due to the same problem
demonstrated above (if the seemingly innocent import_fresh_module is
removed, all is good).

Any ideas/suggestion regarding this are welcome. This is quite an esoteric
problem, but I believe it's serious. PEP 3121 is not used much (yet), but
recently there was talk again about committing some of the patches created
for converting Modules/*.c extensions to it during a GSoC project. I
believe that we should understand the implications first. There can be a
number of solutions; including modifying the PEP 3121 implementation
machinery to really create/keep state "per module" and not just "per kind
of module in a single sub-interpreter".

Eli
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Buildbot failure puzzle

2013-08-10 Thread Terry Reedy
At least the following 3.4 buildbots have failed today with an error I 
do not understand: AMD64 FreeBSD, PPC64, x86Ubuntu, x86 WinServer 2003.
Except for the Windows BB, it was the only failure and hence the only 
reason to not be green.


ERROR: test_xmlcharnamereplace (test.test_codeccallbacks.CodecCallbackTest)
--
Traceback (most recent call last):
  File 
"/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/test/test_codeccallbacks.py", 
line 112, in test_xmlcharnamereplace

self.assertEqual(sin.encode("ascii", "test.xmlcharnamereplace"), sout)
  File 
"/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/test/test_codeccallbacks.py", 
line 102, in xmlcharnamereplace

l.append("&%s;" % html.entities.codepoint2name[ord(c)])
AttributeError: 'module' object has no attribute 'entities'

test_codeccallbacks.py: lines from 2008-05-17
line 002: import html.entities
...
line 102:l.append("&%s;" % html.entities.codepoint2name[ord(c)])

I checked with an editor and these are the only two appearances of 
'html' (so it was not rebound to anything else) and the spellings in the 
file are the same. Indeed, the same code has worked on at least some of 
the same machines.


--
Terry Jan Reedy

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Buildbot failure puzzle

2013-08-10 Thread Antoine Pitrou
On Sat, 10 Aug 2013 20:25:04 -0400
Terry Reedy  wrote:
> At least the following 3.4 buildbots have failed today with an error I 
> do not understand: AMD64 FreeBSD, PPC64, x86Ubuntu, x86 WinServer 2003.

http://bugs.python.org/issue18706



___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-10 Thread Nick Coghlan
In a similar vein, Antoine recently noted that the fact the per-module
state isn't a real PyObject creates a variety of interesting lifecycle
management challenges.

I'm not seeing an easy solution, either, except to automatically skip
reinitialization when the module has already been imported.
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)

2013-08-10 Thread Eli Bendersky
n Sat, Aug 10, 2013 at 5:47 PM, Nick Coghlan  wrote:

> In a similar vein, Antoine recently noted that the fact the per-module
> state isn't a real PyObject creates a variety of interesting lifecycle
> management challenges.
>
> I'm not seeing an easy solution, either, except to automatically skip
> reinitialization when the module has already been imported.
>
This solution has problems. For example, in the case of ET it would
preclude testing what happens when pyexpat is disabled (remember we were
discussing this...). This is because there would be no real way to create
new instances of such modules (they would all cache themselves in the init
function - similarly to what ET now does in trunk, because otherwise some
of its global-dependent crazy tests fail).

A more radical solution would be to *really* have multiple instances of
state per sub-interpreter. Well, they already exist -- it's
PyState_FindModule which is the problematic one because it only remembers
the last one. But I see that it's only being used by extension modules
themselves, to efficiently find modules they belong to. It feels a bit like
a hack that was made to avoid rewriting lots of code, because in general a
module's objects *can* know which module instance they came from. E.g. it
can be saved as a private field in classes exported by the module.

So a more radical approach would be:

PyState_FindModule can be deprecated, but still exist and be documented to
return the state the *last* module created in this sub-interpreter. stdlib
extension modules that actually use this mechanism can be rewritten to just
remember the module for real, and not rely on PyState_FindModule to fetch
it from a global cache. I don't think this would be hard, and it would make
the good intention of PEP 3121 more real - actual intependent state per
module instance.

Eli
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Green buildbot failure.

2013-08-10 Thread Terry Reedy


This run recorded here shows a green test (it appears to have timed out)
http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/7017
but the corresponding log for this Windows bot
http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/7017/steps/test/logs/stdio
has the expected os.chown failure.

Are such green failures intended?

--
Terry Jan Reedy

___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com