[ python-Bugs-595601 ] file (& socket) I/O are not thread safe

2006-05-07 Thread SourceForge.net
Bugs item #595601, was opened at 2002-08-15 11:34
Message generated for change (Comment added) made by aegis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=595601&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jeremy Hylton (jhylton)
Assigned to: Jeremy Hylton (jhylton)
Summary: file (& socket) I/O are not thread safe

Initial Comment:
We recently found an assertion failure in the universal
newline support when running a multithreaded program
where two threads used the same Python file object. 
The assert(stream != NULL) test in 
Py_UniversalNewlineFread() fails once in a blue moon,
where stream is the stdio FILE * that the fileobject
wraps.  Further analysis suggests that there is a race
condition between checking FILE * and using FILE * that
exists in at least Python 2.1 and up.

I'll actually describe the problem as it exists in
Python 2.2, because it is simpler to avoid the
universal newline code.  That code isn't the source of
the problem, although its assert() uncovers it in a
clear way.

In file_read() (rev 2.141.6.5), the first thing it does
is check if f_fp (the FILE *) is NULL.  If so it raises
an IOError -- operation on closed file object.  Later,
file_read() enters a for loop that calls fread() until
enough bytes have been read.

    for (;;) {
        Py_BEGIN_ALLOW_THREADS
        errno = 0;
        chunksize = fread(BUF(v) + bytesread, 1,
                          buffersize - bytesread, f->f_fp);
        Py_END_ALLOW_THREADS
        if (chunksize == 0) {
            if (!ferror(f->f_fp))
                break;
            PyErr_SetFromErrno(PyExc_IOError);
            clearerr(f->f_fp);
            Py_DECREF(v);
            return NULL;
        }

The problem is that fread() is called after the global
interpreter lock is released.  Since the lock is
released, another Python thread could run and modify
the file object, changing the value of f->f_fp.  Under
the current interpreter lock scheme, it isn't safe to
use f->f_fp without holding the interpreter lock.

The current file_read() code can fail in a variety of
ways.  It's possible for a second thread to close the
file, which will set f->f_fp to NULL.  Who knows what
fread() will do when NULL is passed.

The universal newline code squirrels the FILE * away in a
local variable, which is worse.  If it happens that
another thread closes the file, at best the local
points to a closed FILE *.  But that memory could get
recycled and then there's no way to know what it points to.

socket I/O has a similar problem with unsafe sharing of
the file descriptor.  However, this problem seems less
severe in general, because we'd just be passing a bogus
file descriptor to a system call.  We don't have to
worry about whether stdio will dump core when passed a
bogus pointer.  There is a chance that a socket will be
closed and its file descriptor used for a different
socket.  So a call to recv() with one socket ends up
using a different socket.  That will be a nightmare to
debug, but it won't cause a segfault.  (And, in
general, files and sockets shouldn't be shared between
application threads unless the application is going to
make sure it's safe.)

The solution to this problem is to use a
per-file-object lock to guard access to f->f_fp.  No
thread should read or write f->f_fp without holding the
lock.  To make sure that other threads get a chance to
run when there is contention for the file, the
file-object lock should never be held when the GIL is held.
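
The locking idea above can be sketched at the Python level.  This is
only an analogue of the proposal, not the C patch: GuardedFile and its
attribute names are hypothetical, and a plain threading.Lock stands in
for the per-file-object lock.  The point it shows is that the "is it
closed?" check and the use of the handle happen under the same lock, so
close() in another thread cannot slip in between them.

```python
import io
import threading


class GuardedFile:
    """Hypothetical wrapper: the handle is only touched while holding a
    per-object lock, so check-then-use cannot race with close()."""

    def __init__(self, raw):
        self._raw = raw
        self._lock = threading.Lock()

    def read(self, size=-1):
        with self._lock:
            # The check and the read happen under the same lock.
            if self._raw is None:
                raise ValueError("I/O operation on closed file")
            return self._raw.read(size)

    def close(self):
        with self._lock:
            # Detach the handle first so later readers see "closed".
            raw, self._raw = self._raw, None
        if raw is not None:
            raw.close()


guarded = GuardedFile(io.BytesIO(b"abcdef"))
first_chunk = guarded.read(3)
guarded.close()
```

A reader that loses the race gets a clean ValueError instead of
touching a handle that has already been released.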


--

Comment By: Chad Austin (aegis)
Date: 2006-05-07 06:38

Message:
Logged In: YES 
user_id=7212

I'd like to add that this particular problem cost me about a
week of trying to figure out what the heck was going on.  A
stack trace thrown from Python is MUCH better than
intermittent last-chance exceptions thrown from our binaries
in the field.  :)

http://aegisknight.livejournal.com/128191.html


--

Comment By: Jeremy Hylton (jhylton)
Date: 2002-08-20 15:49

Message:
Logged In: YES 
user_id=31392

Here's a checkpoint of current progress.  The patch applies
cleanly and even compiles.  It works most of the time, but
it causes a bunch of test failures.  I haven't had time to
debug them, but there are two likely sources.  One is incorrect
propagation of errors across the release-lock boundary.
 (The error checking goes on inside so that clearerr() can
be called while the file lock is held but
PyErr_SetFromErrno() can be called while the GIL is held.) 
The other source of errors is 

[ python-Bugs-1471427 ] tarfile.py chokes on long names

2006-05-07 Thread SourceForge.net
Bugs item #1471427, was opened at 2006-04-16 22:34
Message generated for change (Comment added) made by alexanderweb
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1471427&group_id=5470

Category: Python Library
Group: Python 2.5
Status: Open
>Resolution: Fixed
Priority: 5
Submitted By: Alexander Schremmer (alexanderweb)
Assigned to: Nobody/Anonymous (nobody)
Summary: tarfile.py chokes on long names

Initial Comment:
The following bug is reproducible on Py 2.4.3 and 2.5. 
It was tested on Windows. You need a tarfile with a 
long file name that triggers the GNU LONGNAME 
extension.

Extracting such a file gives me an IO error because it 
tries to create a file with a slash at the end. This is 
because 

        # Some old tar programs represent a directory as a regular
        # file with a trailing slash.
        if tarinfo.isreg() and tarinfo.name.endswith("/"):
            tarinfo.type = DIRTYPE

sets the type incorrectly after it was called from the 
callback proc which has no possibility to set the name 
of the intermediary tarinfo class because it is 
instantiated in the next-method.

So this yields a directory which should be a file which 
is obviously wrong. Might be related to commit 41340 
"Patch #1338314, Bug #1336623". (At least the code 
changed there is causing this bug).
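
A reproduction along the lines described above might look like the 
sketch below. It uses only the documented tarfile API and builds the 
archive in memory. The specific name is an assumption on my part: its 
100th character is a slash, on the guess that the truncated 100-byte 
header name is what the trailing-slash check sees. On an interpreter 
with the bug the member would be expected to come back as a directory; 
on a fixed one it should still be a regular file.

```python
import io
import tarfile

# Names longer than 100 bytes do not fit in the classic header field,
# so GNU_FORMAT stores the full name in a separate LONGNAME record.
# Assumption: a slash at position 100 mimics the truncated name that
# trips the "regular file with trailing slash" heuristic.
long_name = "d" * 99 + "/" + "f" * 40 + ".txt"
payload = b"hello"

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
    info = tarfile.TarInfo(long_name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    member = tar.getmembers()[0]
    is_regular_file = member.isreg()
    round_tripped_name = member.name
    content = tar.extractfile(member).read()
```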

--

>Comment By: Alexander Schremmer (alexanderweb)
Date: 2006-05-07 13:55

Message:
Logged In: YES 
user_id=254738

Thanks, that seems to work. Try to get this into Py 2.5 :)

--

Comment By: Lars Gustäbel (gustaebel)
Date: 2006-04-25 22:59

Message:
Logged In: YES 
user_id=642936

Fixing this issue is not quite as simple as I hoped it to
be. It would be possible to implement a quick fix that
solves the problem, but that would be too ugly for a stdlib
module. Instead, I have been busy writing a preliminary fix
for my development version of tarfile.py which is available
at http://www.gustaebel.de/lars/tarfile/.
It would be nice of you, if you'd download the 0.8.0 version
there and give it a try. Thank you.


--

Comment By: Alexander Schremmer (alexanderweb)
Date: 2006-04-16 22:34

Message:
Logged In: YES 
user_id=254738

Hmm, I just want to clarify that tarfile doesn't give the IO 
error (it passes silently) but my code that expects a file 
instead of a directory ;-)

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1471427&group_id=5470
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[ python-Bugs-1481770 ] hpux ia64 shared lib ext should be ".so"

2006-05-07 Thread SourceForge.net
Bugs item #1481770, was opened at 2006-05-04 05:43
Message generated for change (Comment added) made by deckrider
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1481770&group_id=5470

Category: Python Interpreter Core
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: David Everly (deckrider)
Assigned to: Nobody/Anonymous (nobody)
Summary: hpux ia64 shared lib ext should be ".so"

Initial Comment:
On hpux ia64, the shared library extension should be
".so".  This is currently problematic in that other
add-on python modules (such as those for subversion)
correctly detect the host_os/host_cpu and build
_module.so, which is not seen by python built using ".sl".

According to
http://devresource.hp.com/drc/resources/portguideipf/index.jsp#dynlinkfac


"Shared library names

Since dynamic linking APIs operate on shared libraries,
it is also important to note that the shared library
naming scheme on Linux is lib*.so; whereas, on HP-UX
11i Version 1.5 the naming scheme is lib*.sl for PA and
lib*.so on IPF. Also APIs may reside in different
libraries files on Linux and HP-UX, so you may need to
dynamically load a different shared library name on
HP-UX and Linux."

To translate this quote, PA=hppa and IPF=ia64.
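
To see which extension-module suffixes a given interpreter will
actually import, something like the sketch below can be run on the
platform in question.  Note the assumption: it relies on
importlib.machinery and sysconfig from current Python 3, which do not
exist in the 2.4 branch this report is about, and the helper name
is_importable_extension_name is made up for illustration.

```python
import importlib.machinery
import sysconfig

# Suffixes the import system recognises for extension modules.
accepted_suffixes = list(importlib.machinery.EXTENSION_SUFFIXES)

# Suffix the build machinery uses when compiling new extensions.
build_suffix = sysconfig.get_config_var("EXT_SUFFIX")


def is_importable_extension_name(filename):
    """Return True if filename ends with a suffix this interpreter accepts."""
    return any(filename.endswith(suffix) for suffix in accepted_suffixes)


print("accepted suffixes:", accepted_suffixes)
print("build suffix:", build_suffix)
print("_module.so accepted:", is_importable_extension_name("_module.so"))
print("_module.sl accepted:", is_importable_extension_name("_module.sl"))
```

If "_module.so" is rejected on hpux ia64 while add-on packages build
exactly that name, the mismatch described above is what you are seeing.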


--

>Comment By: David Everly (deckrider)
Date: 2006-05-07 07:22

Message:
Logged In: YES 
user_id=1113403

Here is a patch against
http://svn.python.org/projects/python/branches/release24-maint

I don't have many environments to test against, and only
Linux right now (will test on HPUX ia64 tomorrow and report
back).

--

Comment By: David Everly (deckrider)
Date: 2006-05-05 06:07

Message:
Logged In: YES 
user_id=1113403

The patch I'm using now only works on hppa/ia64 and isn't
anything that can coexist nicely in the source package on
other hardware/os combinations.

I've looked at
http://svn.python.org/projects/python/branches/release24-maint/

I'm accustomed to a system using autoconf/libtool/automake
(recent versions) and never committing the output of those
tools, but only running them at source package generation time.

I say this, only to point out that I'm not understanding the
principles behind what I see in subversion.  I see
configure, and also configure.in.  Which should be patched?
 And if I don't patch configure, what is the process for
regenerating it (and with what versions of automake,
autoconf, and libtool?).

Also, the most recent libtool already correctly determines
shared library extension.

So I could probably provide a patch, but would need to
understand the environment better in order to do so.

--

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-05-05 01:02

Message:
Logged In: YES 
user_id=33168

Do you think you could work on a patch to address this issue?

--




[ python-Feature Requests-1110010 ] 'attrmap' function, attrmap(x)['attname'] == x.attname

2006-05-07 Thread SourceForge.net
Feature Requests item #1110010, was opened at 2005-01-26 11:28
Message generated for change (Comment added) made by gregsmith
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1110010&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Gregory Smith (gregsmith)
Assigned to: Nobody/Anonymous (nobody)
Summary: 'attrmap' function, attrmap(x)['attname'] == x.attname

Initial Comment:
One of the side effects of the new-style classes is that
objects don't necessarily have __dict__ attached to them.
It used to be possible to write things like

   def __str__(self):
       return ("Node %(name)s, %(nlinks)d links, "
               "active: %(active)s" % self.__dict__)


... but this doesn't work if the class doesn't have a
__dict__. Even if it does, I'm not sure it will always get
members from base classes.

There is a 'vars' function; you could put 'vars(self)'
in the above instead of self.__dict__, but it still
doesn't work if the class doesn't have a __dict__.

I can see different solutions for this:

(1) Change the 'string %' operator so that it allows
%(.name)s, leading to a getattr() on the right-side
argument rather than a getitem.

       return ("Node %(.name)s, %(.nlinks)d links, "
               "active: %(.active)s" % self)

(2) Make a builtin like vars, but which works when the
object doesn't have a __dict__, i.e. attrmap(x) would
return a mapping which is bound to x, and reading
attrmap(x)['attname'] is the same as
getattr(x,'attname'). Thus

       return ("Node %(name)s, %(nlinks)d links, "
               "active: %(active)s" % attrmap(self))


This attrmap() function can be implemented in pure
python, of course.

I originally thought (1) made a lot of sense, but (2) seems
to work just as well and doesn't require changing much.
Also, (1) allows cases like "%(name)s %(.name2)s",
which are not very useful, but are very likely to be
created by accident; whereas in (2) you are deciding on
the right of the '%' whether you are naming attributes or
providing mapping keys.

I'm not sure it's a good idea to change 'vars' to have
this behaviour, since vars(x).keys() currently works in
a predictable way when vars(x) works; whereas
attrmap(x).keys() may not be complete, or possible, 
even when attrmap(x) is useful. I.e. when x has a
__getattr__ defined.
On the other hand, vars(x) doesn't currently do much at
all, so maybe it's possible to enhance it like this
without breaking anything.

The motivation for this came from the "%(name)s" issue,
but the attrmap() function would be useful in other
places e.g.

processdata( infile,  outfile, **attrmap(options))

... where options might be obtained from optparse, e.g.
  
Or, an attrmap can be used with the new Templates:
string.Template('Node $name').substitute( attrmap(node))

Both of these examples will work with vars(), but only
when the object actually has __dict__. This is why
I'm thinking it may make sense to enhance vars: some
code may be broken by the change; but other code,
broken by new-style classes, may be unbroken by this
change.

The proxy could be writable, so that
attrmap(x)['a'] = y
is the same as
 x.a = y
.. which could have more uses.

A possible useful (possibly weird) variation: attrmap
accepts 1 or more parameters, and the resulting
proxy is bound to all of them. When attrmap(x,y,z)['a']
is done, the proxy will try x.a, y.a, z.a until one of
them doesn't raise AttributeError. So it's equivalent
to merging dictionaries. This would be useful
in the %(name)s or Template cases, where you want
information from several objects.
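
For what it's worth, a pure-Python attrmap along these lines could be
sketched as below.  This is one possible reading of the proposal, not
an existing builtin or library function: the AttrMap class, the choice
to raise KeyError for a missing attribute, and the choice to write to
the first object only are all assumptions.  The Node class uses
__slots__ so that it really has no __dict__.

```python
class AttrMap:
    """Mapping-style proxy: proxy['a'] reads getattr(obj, 'a')."""

    def __init__(self, *objects):
        if not objects:
            raise TypeError("attrmap() needs at least one object")
        self._objects = objects

    def __getitem__(self, name):
        # Try each bound object in order, like merged dictionaries.
        for obj in self._objects:
            try:
                return getattr(obj, name)
            except AttributeError:
                continue
        raise KeyError(name)

    def __setitem__(self, name, value):
        # Assumption: writes always go to the first bound object.
        setattr(self._objects[0], name, value)


def attrmap(*objects):
    return AttrMap(*objects)


class Node:
    __slots__ = ("name", "nlinks", "active")  # no per-instance __dict__

    def __init__(self, name, nlinks, active):
        self.name = name
        self.nlinks = nlinks
        self.active = active

    def __str__(self):
        return ("Node %(name)s, %(nlinks)d links, "
                "active: %(active)s" % attrmap(self))


node = Node("root", 3, True)
description = str(node)
attrmap(node)["nlinks"] = 5
```

Because the proxy has no keys(), it works for %-formatting and item
lookup but not for the **attrmap(options) case, which is the
completeness problem mentioned above.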





--

>Comment By: Gregory Smith (gregsmith)
Date: 2006-05-07 11:27

Message:
Logged In: YES 
user_id=292741

I can't disagree with that -- one of the things I like about
python is that simple funcs I use fairly often can usually
be retyped out of my head in less time than it takes to find
them and copy them from another software project- and more
importantly, there's basically no risk that the fresh one
will be buggy, if its expression is simple and clear.
So, the overhead of maintaining a zillion 'standard' utility
funcs outweighs the cost of having to recode them instead,
when they are small and simple. This applies as much to the
core library as it does to a site-specific library.

I do prefer if they have the same names each time I use them
though, since it makes it easier to transplant higher-level
chunks of code from one program to another. When I ran
across this issue and its solution, I figured it would be
something that, if available, could be used often enough to
justify having a standard name. But I agree now it shouldn't
be a builtin; having it as operator.attrmap still means you
can copy code using it from one application to another

[ python-Bugs-1483384 ] Add set.member() method

2006-05-07 Thread SourceForge.net
Bugs item #1483384, was opened at 2006-05-07 11:41
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1483384&group_id=5470

Category: Python Library
Group: Feature Request
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Tsai (michaeltsai)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add set.member() method

Initial Comment:
Right now, when I check membership in a set, the __contains__ method just 
returns True/False if there is an object in the set that's == to the 
argument. I would like to have a member() method that returns the object 
in the set or raises KeyError if the argument is not in the set. This would 
be useful for interning and other cases where right now I'd use a 
degenerate dictionary where the keys and values are equal.
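
The degenerate-dictionary workaround mentioned above can be sketched 
roughly as follows. InternPool and its method names are hypothetical, 
chosen only to mirror the requested member() behaviour; nothing here 
is part of the set API.

```python
class InternPool:
    """Keeps one canonical object per equality class, using a dict
    whose keys and values are the same objects."""

    def __init__(self):
        self._canonical = {}

    def add(self, obj):
        # Returns the already-stored equal object if there is one.
        return self._canonical.setdefault(obj, obj)

    def member(self, obj):
        # Raises KeyError if no equal object has been stored.
        return self._canonical[obj]

    def __contains__(self, obj):
        return obj in self._canonical


pool = InternPool()
first = tuple([1, 2])    # built at runtime so the two tuples are
second = tuple([1, 2])   # equal but distinct objects
pool.add(first)
canonical = pool.member(second)
```

The lookup with an equal object hands back the stored one, which a 
plain membership test on a set cannot do.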

--




[ python-Feature Requests-1483384 ] Add set.member() method

2006-05-07 Thread SourceForge.net
Feature Requests item #1483384, was opened at 2006-05-07 15:41
Message generated for change (Comment added) made by gbrandl
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1483384&group_id=5470

>Category: Extension Modules
>Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Tsai (michaeltsai)
>Assigned to: Raymond Hettinger (rhettinger)
Summary: Add set.member() method

--

>Comment By: Georg Brandl (gbrandl)
Date: 2006-05-07 20:36

Message:
Logged In: YES 
user_id=849994

Moving to Feature Requests. Note that I do not think Raymond
will agree to this.

--




[ python-Feature Requests-1483384 ] Add set.member() method

2006-05-07 Thread SourceForge.net
Feature Requests item #1483384, was opened at 2006-05-07 10:41
Message generated for change (Comment added) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1483384&group_id=5470

Category: Extension Modules
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Michael Tsai (michaeltsai)
Assigned to: Raymond Hettinger (rhettinger)
Summary: Add set.member() method

--

>Comment By: Raymond Hettinger (rhettinger)
Date: 2006-05-08 00:37

Message:
Logged In: YES 
user_id=80475

I'm curious to see some of your dictionary examples that 
do not seem to translate cleanly with the existing set API.

In published code, I've not seen people writing anything 
like what is being requested, i.e. I haven't seen 
fragments like:
   if x in s:
       return x
   else:
       raise KeyError



--
