New submission from johansen <[EMAIL PROTECTED]>:

We're using Python to build the new packaging system for OpenSolaris. Yesterday, a user reported that when they ran the pkg command, piped the output to grep, and then typed ^C, they would sometimes get this error:

    $ pkg list | grep office
    ^Cclose failed: [Errno 11] Resource temporarily unavailable

We assumed that this might be a problem in the signal handling we've employed to catch SIGPIPE; however, it turns out that the problem is in the file_dealloc() code. For the perversely curious, additional details may be found in the original bug located here:

    http://defect.opensolaris.org/bz/show_bug.cgi?id=2083

Essentially we found the following:

The error message is emitted from fileobject.c: file_dealloc(). The relevant portion of the routine looks like this:

    static void
    file_dealloc(PyFileObject *f)
    {
            int sts = 0;
            if (f->weakreflist != NULL)
                    PyObject_ClearWeakRefs((PyObject *) f);
            if (f->f_fp != NULL && f->f_close != NULL) {
                    Py_BEGIN_ALLOW_THREADS
                    sts = (*f->f_close)(f->f_fp);
                    Py_END_ALLOW_THREADS
                    if (sts == EOF)
    #ifdef HAVE_STRERROR
                            PySys_WriteStderr("close failed: [Errno %d] %s\n",
                                              errno, strerror(errno));

In the cases we encountered, the function pointer f_close actually points to sysmodule.c: _check_and_flush(). That routine looks like this:

    static int
    _check_and_flush (FILE *stream)
    {
            int prev_fail = ferror (stream);
            return fflush (stream) || prev_fail ? EOF : 0;
    }

_check_and_flush() calls ferror(3C) and then fflush(3C) on the FILE stream associated with the file object. There's just one problem here. If ferror() reports an error that was previously recorded on the file stream, there's no guarantee that errno still describes that error. Should an error be encountered in fflush(3C), errno will get set; however, the contents of errno are undefined should fflush() return successfully.

Here's what happens in the code I observed:

I set a write watchpoint on errno and observed the different times it was accessed. After sifting through a bunch of red herrings, I found a call to PyThread_acquire_lock() that sets errno to 11 (EAGAIN). This occurs when PyThread_acquire_lock() calls sem_trywait(3C) and finds the semaphore already locked.
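
The two ingredients described so far, a stream whose error indicator is already set and an errno left over from an unrelated call, can be reproduced in isolation. What follows is only a minimal sketch, not code from the interpreter: demo_stale_errno() and struct stale_errno_result are hypothetical names, /dev/null is just a convenient sink, and the sketch assumes that reading from a write-only stream sets the error indicator (common libcs do this, but the C standard does not promise it). The assignment to errno stands in for the sem_trywait(3C) failure.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Same logic as the _check_and_flush() excerpt quoted above. */
static int
check_and_flush(FILE *stream)
{
        int prev_fail = ferror(stream);
        return fflush(stream) || prev_fail ? EOF : 0;
}

/* Hypothetical result record, for illustration only. */
struct stale_errno_result {
        int had_sticky_error;   /* ferror() was non-zero before the flush */
        int flush_result;       /* what a plain fflush() returned */
        int status;             /* what check_and_flush() returned */
        int errno_seen;         /* errno as the caller sees it afterwards */
};

static struct stale_errno_result
demo_stale_errno(void)
{
        struct stale_errno_result r = { 0, EOF, 0, 0 };
        FILE *fp = fopen("/dev/null", "w");

        if (fp == NULL)
                return r;

        /* Assumption: reading a write-only stream sets the error indicator. */
        (void) fgetc(fp);
        r.had_sticky_error = (ferror(fp) != 0);

        /* Nothing is buffered, so the flush itself has nothing to fail on. */
        r.flush_result = fflush(fp);

        /* Stand-in for the unrelated sem_trywait(3C) failure. */
        errno = EAGAIN;

        r.status = check_and_flush(fp);
        r.errno_seen = errno;

        (void) fclose(fp);
        return r;
}

#ifdef DEMO_MAIN
int
main(void)
{
        struct stale_errno_result r = demo_stale_errno();

        /* Mirrors what file_dealloc() does when it is handed EOF. */
        if (r.status == EOF)
                printf("close failed: [Errno %d] %s\n",
                       r.errno_seen, strerror(r.errno_seen));
        return 0;
}
#endif
```

Built with -DDEMO_MAIN, this should print a "close failed" message carrying the EAGAIN text on platforms where the assumption above holds, even though the flush that was just performed succeeded.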

Errno doesn't get accessed again until a call to libc.so.1`isseekable() that simply saves and restores the existing errno.

Since we've taken a ^C (SIGINT), the interpreter begins the finalization process and eventually calls file_dealloc(). This routine calls _check_and_flush(). In the case that I observed, ferror(3C) returns a non-zero value but fflush(3C) completes successfully. This causes the routine to return EOF to the caller.

file_dealloc() assumes that since it received an EOF, an error occurred and it should call strerror(errno). However, since _check_and_flush() is just returning the state of a previous error, errno is invalid. This is what causes the spurious EAGAIN message.

Just to be sure, I traced the return value and errno of failed syscalls that were invoked by the interpreter. I was unable to observe any syscalls returning EAGAIN. This is because (at least on OpenSolaris) sem_trywait(3C) calls sema_trywait(3C). sema_trywait() returns EBUSY if the semaphore is held, and sem_trywait() converts this to EAGAIN. Neither of these errors is passed out of the kernel.

It's not clear to me whether _check_and_flush(), file_dealloc(), or both need modification. At a minimum, it's not safe for file_dealloc() to assume that errno is set correctly if the function underneath it is using ferror(3C) to detect an error on the stream.

----------
components: Interpreter Core
messages: 67560
nosy: johansen
severity: normal
status: open
title: file_dealloc() assumes errno is set when EOF is returned
type: behavior
versions: Python 2.4

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3014>
_______________________________________