[ python-Bugs-595601 ] file (& socket) I/O are not thread safe
Bugs item #595601, was opened at 2002-08-15 11:34
Message generated for change (Comment added) made by aegis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=595601&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Interpreter Core
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jeremy Hylton (jhylton)
Assigned to: Jeremy Hylton (jhylton)
Summary: file (& socket) I/O are not thread safe

Initial Comment:
We recently found an assertion failure in the universal newline support when running a multithreaded program where two threads used the same Python file object. The assert(stream != NULL) test in Py_UniversalNewlineFread() fails once in a blue moon, where stream is the stdio FILE * that the fileobject wraps.

Further analysis suggests that there is a race condition between checking FILE * and using FILE * that exists in at least Python 2.1 and up. I'll actually describe the problem as it exists in Python 2.2, because it is simpler to avoid the universal newline code. That code isn't the source of the problem, although its assert() uncovers it in a clear way.

In file_read() (rev 2.141.6.5), the first thing it does is check if f_fp (the FILE *) is NULL. If so it raises an IOError -- operation on closed file object. Later, file_read() enters a for loop that calls fread() until enough bytes have been read.

    for (;;) {
        Py_BEGIN_ALLOW_THREADS
        errno = 0;
        chunksize = fread(BUF(v) + bytesread, 1,
                          buffersize - bytesread, f->f_fp);
        Py_END_ALLOW_THREADS
        if (chunksize == 0) {
            if (!ferror(f->f_fp))
                break;
            PyErr_SetFromErrno(PyExc_IOError);
            clearerr(f->f_fp);
            Py_DECREF(v);
            return NULL;
        }

The problem is that fread() is called after the global interpreter lock is released. Since the lock is released, another Python thread could run and modify the file object, changing the value of f->f_fp. Under the current interpreter lock scheme, it isn't safe to use f->f_fp without holding the interpreter lock.

The current file_read() code can fail in a variety of ways. It's possible for a second thread to close the file, which will set f->f_fp to NULL. Who knows what fread() will do when NULL is passed.

The universal newline code squirrels the FILE * away in a local variable, which is worse. If it happens that another thread closes the file, at best the local points to a closed FILE *. But that memory could get recycled and then there's no way to know what it points to.

socket I/O has a similar problem with unsafe sharing of the file descriptor. However, this problem seems less severe in general, because we'd just be passing a bogus file descriptor to a system call. We don't have to worry about whether stdio will dump core when passed a bogus pointer. There is a chance that a socket will be closed and its file descriptor reused for a different socket. So a call to recv() with one socket ends up using a different socket. That will be a nightmare to debug, but it won't cause a segfault. (And, in general, files and sockets shouldn't be shared between application threads unless the application is going to make sure it's safe.)

The solution to this problem is to use a per-file-object lock to guard access to f->f_fp. No thread should read or write f->f_fp without holding the lock. To make sure that other threads get a chance to run when there is contention for the file, the file-object lock should never be held when the GIL is held.
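
The check-then-use race and the per-object lock proposed above can be illustrated at the Python level. This is only a sketch of the pattern, not the patch discussed in this report: the class name SharedFile and its methods are hypothetical, and the constraint about never holding the file lock together with the GIL applies to the C implementation and cannot be shown here.

```python
import io
import threading


class SharedFile:
    """Hypothetical wrapper: one lock guards every access to the handle."""

    def __init__(self, fp):
        self._fp = fp                  # plays the role of f->f_fp
        self._lock = threading.Lock()  # per-object lock, not a global one

    def read(self, size=-1):
        # The closed check and the read happen under the same lock, so a
        # concurrent close() cannot slip in between them.
        with self._lock:
            if self._fp is None:
                raise ValueError("I/O operation on closed file")
            return self._fp.read(size)

    def close(self):
        # Detach the handle under the lock; later readers see None.
        with self._lock:
            fp, self._fp = self._fp, None
        if fp is not None:
            fp.close()


shared = SharedFile(io.BytesIO(b"hello world"))
first = shared.read(5)
shared.close()
```

With this arrangement a read() after close() fails with a Python exception instead of touching a stale handle, which is the behaviour the report is asking for.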
--

Comment By: Chad Austin (aegis)
Date: 2006-05-07 06:38

Message:
Logged In: YES
user_id=7212

I'd like to add that this particular problem cost me about a week of trying to figure out what the heck was going on. A stack trace thrown from Python is MUCH better than intermittent last-chance exceptions thrown from our binaries in the field. :)

http://aegisknight.livejournal.com/128191.html

--

Comment By: Jeremy Hylton (jhylton)
Date: 2002-08-20 15:49

Message:
Logged In: YES
user_id=31392

Here's a checkpoint of current progress. The patch applies cleanly and even compiles. It works most of the time, but it causes a bunch of test failures.

I haven't had time to debug the failures, but there are two likely sources. One is incorrect propagation of errors across the release-lock boundary. (The error checking goes on inside so that clearerr() can be called while the file lock is held but PyErr_SetFromErrno() can be called while the GIL is held.) The other source of errors is
[ python-Bugs-1471427 ] tarfile.py chokes on long names
Bugs item #1471427, was opened at 2006-04-16 22:34
Message generated for change (Comment added) made by alexanderweb
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1471427&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: Python 2.5
Status: Open
>Resolution: Fixed
Priority: 5
Submitted By: Alexander Schremmer (alexanderweb)
Assigned to: Nobody/Anonymous (nobody)
Summary: tarfile.py chokes on long names

Initial Comment:
The following bug is reproducible on Py 2.4.3 and 2.5. It was tested on Windows.

You need a tarfile with a long file name that triggers the GNU LONGNAME extension. Extracting such a file gives me an IO error because it tries to create a file with a slash at the end.

This is because

    # Some old tar programs represent a directory as a regular
    # file with a trailing slash.
    if tarinfo.isreg() and tarinfo.name.endswith("/"):
        tarinfo.type = DIRTYPE

sets the type incorrectly when it is called from the callback proc, which has no way to set the name of the intermediary tarinfo object because that object is instantiated in the next method.

So this yields a directory where there should be a file, which is obviously wrong.

Might be related to commit 41340 "Patch #1338314, Bug #1336623". (At least the code changed there is causing this bug.)

--

>Comment By: Alexander Schremmer (alexanderweb)
Date: 2006-05-07 13:55

Message:
Logged In: YES
user_id=254738

Thanks, that seems to work. Try to get this into Py 2.5 :)

--

Comment By: Lars Gustäbel (gustaebel)
Date: 2006-04-25 22:59

Message:
Logged In: YES
user_id=642936

Fixing this issue is not quite as simple as I hoped it would be. It would be possible to implement a quick fix that solves the problem, but that would be too ugly for a stdlib module.

Instead, I have been busy writing a preliminary fix for my development version of tarfile.py, which is available at http://www.gustaebel.de/lars/tarfile/. It would be nice if you'd download the 0.8.0 version there and give it a try. Thank you.

--

Comment By: Alexander Schremmer (alexanderweb)
Date: 2006-04-16 22:34

Message:
Logged In: YES
user_id=254738

Hmm, I just want to clarify that it is not tarfile that gives the IO error (it passes silently) but my code, which expects a file instead of a directory ;-)

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1471427&group_id=5470
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
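
The reproduction described in this report (a member name long enough to trigger the GNU LONGNAME extension) can be approximated with a small round-trip check. This is a sketch for a current Python 3 tarfile module, not the 2.4/2.5 code under discussion; the helper name roundtrip_long_name and the sample file name are made up for illustration.

```python
import io
import tarfile


def roundtrip_long_name(name, payload=b"data"):
    """Write one regular file to an in-memory GNU tar and read it back."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getmembers()


# Names longer than 100 characters do not fit the classic header field,
# so GNU_FORMAT stores them through the LONGNAME extension.
long_name = "dir/" + "x" * 150 + ".txt"
members = roundtrip_long_name(long_name)
```

A check along these lines would flag the behaviour reported here, where the long-named member came back as a directory instead of a regular file.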
[ python-Bugs-1481770 ] hpux ia64 shared lib ext should be ".so"
Bugs item #1481770, was opened at 2006-05-04 05:43
Message generated for change (Comment added) made by deckrider
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1481770&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Interpreter Core
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: David Everly (deckrider)
Assigned to: Nobody/Anonymous (nobody)
Summary: hpux ia64 shared lib ext should be ".so"

Initial Comment:
On hpux ia64, the shared library extension should be ".so". This is currently problematic in that other add-on python modules (such as those for subversion) correctly detect the host_os/host_cpu and build _module.so, which is not seen by python built using ".sl".

According to
http://devresource.hp.com/drc/resources/portguideipf/index.jsp#dynlinkfac

"Shared library names

Since dynamic linking APIs operate on shared libraries, it is also important to note that the shared library naming scheme on Linux is lib*.so; whereas, on HP-UX 11i Version 1.5 the naming scheme is lib*.sl for PA and lib*.so on IPF. Also APIs may reside in different libraries files on Linux and HP-UX, so you may need to dynamically load a different shared library name on HP-UX and Linux."

To translate this quote, PA=hppa and IPF=ia64.

--

>Comment By: David Everly (deckrider)
Date: 2006-05-07 07:22

Message:
Logged In: YES
user_id=1113403

Here is a patch against
http://svn.python.org/projects/python/branches/release24-maint

I don't have many environments to test against, and only Linux right now (will test on HPUX ia64 tomorrow and report back).

--

Comment By: David Everly (deckrider)
Date: 2006-05-05 06:07

Message:
Logged In: YES
user_id=1113403

The patch I'm using now only works on hppa/ia64 and isn't anything that can coexist nicely in the source package on other hardware/os combinations.

I've looked at
http://svn.python.org/projects/python/branches/release24-maint/

I'm accustomed to a system using autoconf/libtool/automake (recent versions) and never committing the output of those tools, but only running them at source package generation time. I say this only to point out that I don't understand the principles behind what I see in subversion.

I see configure, and also configure.in. Which should be patched? And if I don't patch configure, what is the process for regenerating it (and with what versions of automake, autoconf, and libtool)?

Also, the most recent libtool already correctly determines the shared library extension.

So I could probably provide a patch, but would need to understand the environment better in order to do so.

--

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-05-05 01:02

Message:
Logged In: YES
user_id=33168

Do you think you could work on a patch to address this issue?
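
For readers who want to see which extension-module suffixes a given interpreter will load, the following sketch may help. It assumes a current Python 3, where importlib.machinery.EXTENSION_SUFFIXES is available; it says nothing about how the Python 2.4 build discussed here chose ".sl". The helper name importable_as_extension is hypothetical.

```python
import importlib.machinery


def importable_as_extension(filename):
    """Return True if filename ends with a suffix this interpreter loads."""
    return any(filename.endswith(suffix)
               for suffix in importlib.machinery.EXTENSION_SUFFIXES)


# The suffixes this particular build accepts for extension modules.
accepted = list(importlib.machinery.EXTENSION_SUFFIXES)
```

A module built as _module.so is only found if one of the accepted suffixes matches, which is the mismatch this report describes for an interpreter expecting ".sl".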
[ python-Feature Requests-1110010 ] 'attrmap' function, attrmap(x)['attname'] == x.attname
Feature Requests item #1110010, was opened at 2005-01-26 11:28
Message generated for change (Comment added) made by gregsmith
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1110010&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Gregory Smith (gregsmith)
Assigned to: Nobody/Anonymous (nobody)
Summary: 'attrmap' function, attrmap(x)['attname'] == x.attname

Initial Comment:
One of the side effects of the new-style classes is that objects don't necessarily have __dict__ attached to them. It used to be possible to write things like

    def __str__(self):
        return "Node %(name)s, %(nlinks)d links, active: %(active)s" % self.__dict__

... but this doesn't work if the class doesn't have a __dict__. Even if it does, I'm not sure it will always get members from base classes.

There is a 'vars' function; you could put 'vars(self)' in the above instead of self.__dict__, but it still doesn't work if the class doesn't have a __dict__.

I can see different solutions for this:

(1) change the 'string %' operator so that it allows %(.name)s, leading to a getattr() on the right-side argument rather than a getitem.

    return "Node %(.name)s, %(.nlinks)d links, active: %(.active)s" % self

(2) Make a builtin like vars, but which works when the object doesn't have a __dict__. I.e. attrmap(x) would return a mapping which is bound to x, and reading attrmap(x)['attname'] is the same as getattr(x,'attname'). Thus

    return "Node %(name)s, %(nlinks)d links, active: %(active)s" % attrmap(self)

This attrmap() function can be implemented in pure python, of course.

I originally thought (1) made a lot of sense, but (2) seems to work just as well and doesn't require changing much. Also, (1) allows cases like "%(name)s %(.name2)s", which are not very useful, but are very likely to be created by accident; whereas in (2) you are deciding on the right of the '%' whether you are naming attributes or providing mapping keys.

I'm not sure it's a good idea to change 'vars' to have this behaviour, since vars(x).keys() currently works in a predictable way when vars(x) works; whereas attrmap(x).keys() may not be complete, or possible, even when attrmap(x) is useful, i.e. when x has a __getattr__ defined. On the other hand, vars(x) doesn't currently do much at all, so maybe it's possible to enhance it like this without breaking anything.

The motivation for this came from the "%(name)s" issue, but the attrmap() function would be useful in other places, e.g.

    processdata( infile, outfile, **attrmap(options))

... where options might be obtained from optparse, for example. Or, an attrmap can be used with the new Templates:

    string.Template('Node $name').substitute( attrmap(node))

Both of these examples will work with vars(), but only when the object actually has __dict__. This is why I'm thinking it may make sense to enhance vars: some code may be broken by the change; but other code, broken by new-style classes, may be unbroken by this change.

The proxy could be writable, so that attrmap(x)['a'] = y is the same as x.a = y, which could have more uses.

A possibly useful (possibly weird) variation: attrmap accepts 1 or more parameters, and the resulting proxy is bound to all of them. When attrmap(x,y,z)['a'] is done, the proxy will try x.a, y.a, z.a until one of them doesn't raise AttributeError. So it's equivalent to merging dictionaries. This would be useful in the %(name)s or Template cases, where you want information from several objects.
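
The request notes that attrmap() can be implemented in pure Python. One possible sketch follows, written for current Python 3 and covering the read, write, and multi-object variations described above. The implementation details are assumptions made for illustration, not code from the submitter, and keys() is deliberately left out because, as noted, it cannot always be made complete.

```python
class attrmap:
    """Hypothetical mapping view over the attributes of one or more objects."""

    def __init__(self, *objs):
        if not objs:
            raise TypeError("attrmap() needs at least one object")
        self._objs = objs

    def __getitem__(self, name):
        # Try each object in order, like looking through merged dictionaries.
        for obj in self._objs:
            try:
                return getattr(obj, name)
            except AttributeError:
                continue
        raise KeyError(name)

    def __setitem__(self, name, value):
        # Writes go to the first bound object: attrmap(x)['a'] = y sets x.a.
        setattr(self._objs[0], name, value)


class Node:
    __slots__ = ("name", "nlinks", "active")  # instances have no __dict__

    def __init__(self, name, nlinks, active):
        self.name, self.nlinks, self.active = name, nlinks, active

    def __str__(self):
        return ("Node %(name)s, %(nlinks)d links, active: %(active)s"
                % attrmap(self))


text = str(Node("root", 3, True))
```

Because Node uses __slots__, self.__dict__ and vars(self) are unavailable here, which is exactly the case the request is concerned with.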
--

>Comment By: Gregory Smith (gregsmith)
Date: 2006-05-07 11:27

Message:
Logged In: YES
user_id=292741

I can't disagree with that -- one of the things I like about python is that simple funcs I use fairly often can usually be retyped out of my head in less time than it takes to find them and copy them from another software project, and more importantly, there's basically no risk that the fresh one will be buggy, if its expression is simple and clear. So, the overhead of maintaining a zillion 'standard' utility funcs outweighs the cost of having to recode them instead, when they are small and simple. This applies as much to the core library as it does to a site-specific library.

I do prefer if they have the same names each time I use them though, since it makes it easier to transplant higher-level chunks of code from one program to another. When I ran across this issue and its solution, I figured it would be something that, if available, could be used often enough to justify having a standard name. But I agree now it shouldn't be a builtin; having it as operator.attrmap still means you can copy code using it from one application to another
[ python-Bugs-1483384 ] Add set.member() method
Bugs item #1483384, was opened at 2006-05-07 11:41
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1483384&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: Feature Request
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Tsai (michaeltsai)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add set.member() method

Initial Comment:
Right now, when I check membership in a set, the __contains__ method just returns True/False if there is an object in the set that's == to the argument. I would like to have a member() method that returns the object in the set or raises KeyError if the argument is not in the set. This would be useful for interning and other cases where right now I'd use a degenerate dictionary where the keys and values are equal.
[ python-Feature Requests-1483384 ] Add set.member() method
Feature Requests item #1483384, was opened at 2006-05-07 15:41
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1483384&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

>Category: Extension Modules
>Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Tsai (michaeltsai)
>Assigned to: Raymond Hettinger (rhettinger)
Summary: Add set.member() method

Initial Comment:
Right now, when I check membership in a set, the __contains__ method just returns True/False if there is an object in the set that's == to the argument. I would like to have a member() method that returns the object in the set or raises KeyError if the argument is not in the set. This would be useful for interning and other cases where right now I'd use a degenerate dictionary where the keys and values are equal.

--

>Comment By: Georg Brandl (gbrandl)
Date: 2006-05-07 20:36

Message:
Logged In: YES
user_id=849994

Moving to Feature Requests. Note that I do not think Raymond will agree to this.
[ python-Feature Requests-1483384 ] Add set.member() method
Feature Requests item #1483384, was opened at 2006-05-07 10:41
Message generated for change (Comment added) made by rhettinger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1483384&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Extension Modules
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Tsai (michaeltsai)
Assigned to: Raymond Hettinger (rhettinger)
Summary: Add set.member() method

Initial Comment:
Right now, when I check membership in a set, the __contains__ method just returns True/False if there is an object in the set that's == to the argument. I would like to have a member() method that returns the object in the set or raises KeyError if the argument is not in the set. This would be useful for interning and other cases where right now I'd use a degenerate dictionary where the keys and values are equal.

--

>Comment By: Raymond Hettinger (rhettinger)
Date: 2006-05-08 00:37

Message:
Logged In: YES
user_id=80475

I'm curious to see some of your dictionary examples that do not seem to translate cleanly with the existing set API. In published code, I've not seen people writing anything like what is being requested, i.e. I haven't seen fragments like:

    if x in s:
        return x
    else:
        raise KeyError

--

Comment By: Georg Brandl (gbrandl)
Date: 2006-05-07 15:36

Message:
Logged In: YES
user_id=849994

Moving to Feature Requests. Note that I do not think Raymond will agree to this.
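
As an illustration of the "degenerate dictionary" idiom the request mentions, here is a minimal sketch of interning with a dict whose keys map to themselves. The class InternTable and its method names are hypothetical; member() only mimics the behaviour the request asks for and is not an existing set method.

```python
class InternTable:
    """Hypothetical helper built on the 'degenerate dictionary' idiom."""

    def __init__(self):
        self._items = {}  # each key maps to itself

    def add(self, obj):
        # Keep the first object seen for each group of equal objects.
        return self._items.setdefault(obj, obj)

    def member(self, obj):
        # Return the stored object equal to obj; KeyError if there is none.
        return self._items[obj]


table = InternTable()
first = tuple([1, 2])    # built at run time so the two tuples are
second = tuple([1, 2])   # equal but distinct objects
table.add(first)
canonical = table.member(second)
```

The lookup hands back the stored object, not the argument. A plain membership test followed by returning the argument would not do that, which seems to be the gap the request is pointing at.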