[Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)
Hello,
Recently as part of the effort of untangling the tests of ElementTree and
general code improvements (e.g. http://bugs.python.org/issue15651), I ran
into something strange about PEP 3121-compliant modules. I'll demonstrate
with csv, just as an example.
PEP 3121 mandates this function to look up the module-specific state in the
current sub-interpreter:
PyObject* PyState_FindModule(struct PyModuleDef*);
This appears to make the following assumption: a given sub-interpreter only
imports any C extension *once*. If it happens more than once, the
assumption breaks in troubling ways. In normal code, it should never happen
more than once because of the caching in sys.modules. However, many of our
tests monkey-patch sys.modules (mainly by calling
test.support.import_fresh_module) and all hell breaks loose. Here's a simple
example:
import sys
csv = __import__('csv')
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
print(csv.list_dialects())
# ==> ['unixpwd', 'excel-tab', 'excel', 'unix']
del sys.modules['csv'] # FUN
del sys.modules['_csv']
some_other_csv = __import__('csv')
print(csv.list_dialects())
# ==> ['excel-tab', 'excel', 'unix']
Note how doing some sys.modules acrobatics and re-importing suddenly
changes the internal state of a previously imported module. This happens
because:
1. The first import of 'csv' (which then imports '_csv') creates
module-specific state on the heap and associates it with the current
sub-interpreter. The list of dialects, amongst other things, is in that
state.
2. The 'del's wipe 'csv' and '_csv' from the cache.
3. The second import of 'csv' also creates/initializes a new '_csv' module
because it's not in sys.modules. This *replaces* the per-sub-interpreter
cached version of the module's state with the clean state of a new module.
So essentially, while PEP 3121 moves state from C-file globals to
per-module state, the state is still global, and this fact can be exposed
from pure Python code.
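The three steps above can be modelled in pure Python. This is only a sketch, not CPython's implementation: ModuleDef, Interpreter, init_module and find_module are hypothetical stand-ins for PyModuleDef, the sub-interpreter, a module's init function and PyState_FindModule. It assumes, as described above, that the interpreter keeps one slot per module definition.

```python
import types


class ModuleDef:
    """Hypothetical stand-in for a C extension's static PyModuleDef."""

    def __init__(self, name):
        self.name = name


class Interpreter:
    """Hypothetical stand-in for one sub-interpreter."""

    def __init__(self):
        # One slot per module definition, not one per module instance.
        self.modules_by_def = {}

    def init_module(self, moddef):
        # Models the extension's init function: build a fresh module with
        # fresh state, then register it under its definition.
        module = types.ModuleType(moddef.name)
        module.state = {"dialects": ["excel-tab", "excel", "unix"]}
        self.modules_by_def[moddef] = module  # overwrites any earlier instance
        return module

    def find_module(self, moddef):
        # Models PyState_FindModule: only the most recently registered
        # instance can be found.
        return self.modules_by_def[moddef]


CSV_DEF = ModuleDef("_csv")
interp = Interpreter()

first = interp.init_module(CSV_DEF)
# Extension code reaches its state through the lookup, not through the
# module object it was created from.
interp.find_module(CSV_DEF).state["dialects"].append("unixpwd")
dialects_before = list(interp.find_module(CSV_DEF).state["dialects"])

second = interp.init_module(CSV_DEF)  # the "fresh" re-import
dialects_after = list(interp.find_module(CSV_DEF).state["dialects"])
```

In this model `first` still holds the 'unixpwd' entry, but nothing that goes through find_module can reach it any more, which mirrors the list_dialects() output in the example above.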
The above is a toy example. Here's a more serious case I ran into with ET,
but once again is demonstrated with 'csv' for simplicity:
import io
from test.support import import_fresh_module
import csv
csv_other = import_fresh_module('csv', fresh=['_csv', 'csv'])
f = io.StringIO('foo\x00,bar\nbaz,42')
reader = csv.reader(f)
try:
    for row in reader:
        print(row)
except csv.Error as e:
    print('Caught csv.error', e)
except Exception as e:
    print('Caught Exception', e)
In the above, the reader raises an error because of the NUL byte, but the
'except csv.Error' clause does not catch it as expected: the exception
raised is an instance of a different class that is also called `csv.Error`,
due to the same problem demonstrated above (if the seemingly innocent
import_fresh_module call is removed, all is good).
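The class mismatch can be reproduced without any C extension. The following is a minimal sketch under stated assumptions: SOURCE, load_instance and the module name 'fakecsv' are hypothetical, and executing the same source twice stands in for an init function that creates a new exception class on every run.

```python
import types

# Hypothetical module source; it stands in for an extension whose init
# function creates a new exception class each time it runs.
SOURCE = """
class Error(Exception):
    pass

def fail():
    raise Error("line contains NUL")
"""


def load_instance(name):
    """Execute SOURCE in a new module object, as a fresh import would."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module


first = load_instance("fakecsv")
second = load_instance("fakecsv")

try:
    second.fail()
except first.Error:
    caught_by = "first.Error"
except Exception as exc:
    # Reached because second.Error is a different class object, even
    # though both classes have the same name.
    caught_by = type(exc).__qualname__ + " via except Exception"
```

Both classes print as 'fakecsv.Error', which is what makes this kind of failure confusing to diagnose from a traceback alone.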
Any ideas/suggestions regarding this are welcome. This is quite an esoteric
problem, but I believe it's serious. PEP 3121 is not used much (yet), but
recently there was talk again about committing some of the patches created
for converting Modules/*.c extensions to it during a GSoC project. I
believe that we should understand the implications first. There can be a
number of solutions, including modifying the PEP 3121 implementation
machinery to really create/keep state "per module" and not just "per kind
of module in a single sub-interpreter".
Eli
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Buildbot failure puzzle
At least the following 3.4 buildbots have failed today with an error I
do not understand: AMD64 FreeBSD, PPC64, x86Ubuntu, x86 WinServer 2003.
Except for the Windows BB, it was the only failure and hence the only
reason to not be green.
ERROR: test_xmlcharnamereplace (test.test_codeccallbacks.CodecCallbackTest)
--
Traceback (most recent call last):
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/test/test_codeccallbacks.py", line 112, in test_xmlcharnamereplace
    self.assertEqual(sin.encode("ascii", "test.xmlcharnamereplace"), sout)
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/test/test_codeccallbacks.py", line 102, in xmlcharnamereplace
    l.append("&%s;" % html.entities.codepoint2name[ord(c)])
AttributeError: 'module' object has no attribute 'entities'
test_codeccallbacks.py: lines from 2008-05-17
line 002: import html.entities
...
line 102: l.append("&%s;" % html.entities.codepoint2name[ord(c)])
I checked with an editor and these are the only two appearances of
'html' (so it was not rebound to anything else) and the spellings in the
file are the same. Indeed, the same code has worked on at least some of
the same machines.
--
Terry Jan Reedy
Re: [Python-Dev] Buildbot failure puzzle
On Sat, 10 Aug 2013 20:25:04 -0400 Terry Reedy wrote:
> At least the following 3.4 buildbots have failed today with an error I
> do not understand: AMD64 FreeBSD, PPC64, x86Ubuntu, x86 WinServer 2003.

http://bugs.python.org/issue18706
Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)
In a similar vein, Antoine recently noted that the fact the per-module
state isn't a real PyObject creates a variety of interesting lifecycle
management challenges.

I'm not seeing an easy solution, either, except to automatically skip
reinitialization when the module has already been imported.
Re: [Python-Dev] Strange artifacts with PEP 3121 and monkey-patching sys.modules (in csv, ElementTree and others)
On Sat, Aug 10, 2013 at 5:47 PM, Nick Coghlan wrote:
> In a similar vein, Antoine recently noted that the fact the per-module
> state isn't a real PyObject creates a variety of interesting lifecycle
> management challenges.
>
> I'm not seeing an easy solution, either, except to automatically skip
> reinitialization when the module has already been imported.

This solution has problems. For example, in the case of ET it would
preclude testing what happens when pyexpat is disabled (remember we were
discussing this...). This is because there would be no real way to create
new instances of such modules (they would all cache themselves in the init
function - similarly to what ET now does in trunk, because otherwise some
of its global-dependent crazy tests fail).

A more radical solution would be to *really* have multiple instances of
state per sub-interpreter. Well, they already exist -- it's
PyState_FindModule which is the problematic one because it only remembers
the last one. But I see that it's only being used by extension modules
themselves, to efficiently find modules they belong to. It feels a bit like
a hack that was made to avoid rewriting lots of code, because in general a
module's objects *can* know which module instance they came from. E.g. it
can be saved as a private field in classes exported by the module.

So a more radical approach would be:

PyState_FindModule can be deprecated, but still exist and be documented to
return the state of the *last* module created in this sub-interpreter.
stdlib extension modules that actually use this mechanism can be rewritten
to just remember the module for real, and not rely on PyState_FindModule to
fetch it from a global cache.

I don't think this would be hard, and it would make the good intention of
PEP 3121 more real - actual independent state per module instance.

Eli
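The "remember the module for real" idea from the message above can be illustrated in Python rather than C. This is a rough sketch under stated assumptions: make_module, Reader and the module name 'fakecsv' are hypothetical, and an actual change would store a reference to the owning module in the C-level structs of the extension's types.

```python
import types


def make_module(name):
    """Hypothetical init function: each call yields independent state."""
    module = types.ModuleType(name)
    module.dialects = ["excel"]

    class Reader:
        def __init__(self):
            # Remember the owning module instance instead of asking a
            # global, per-interpreter cache for "the" module.
            self._module = module

        def known_dialects(self):
            return list(self._module.dialects)

    module.Reader = Reader
    return module


old = make_module("fakecsv")
old.dialects.append("unixpwd")
old_reader = old.Reader()

new = make_module("fakecsv")  # a second, fresh instance
new_reader = new.Reader()
```

Because each Reader holds its own module reference, creating the second instance leaves the state seen by old_reader untouched.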
[Python-Dev] Green buildbot failure.
This run recorded here shows a green test (it appears to have timed out)
http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/7017
but the corresponding log for this Windows bot
http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/7017/steps/test/logs/stdio
has the expected os.chown failure. Are such green failures intended?
--
Terry Jan Reedy
