threading.Thread vs. signal.signal
I'd like to create a program that invokes a function once a second, and terminates when the user types ctrl-c. So I created a signal handler, created a threading.Thread which does the invocation every second, and started the thread. The signal handler seems to be ineffective. Any idea what I'm doing wrong? This is on Fedora FC4 and Python 2.4.1. The code appears below.

If I do the while ... sleep in the main thread, then the signal handler works as expected. (This isn't really a satisfactory implementation because the function called every second might take a significant fraction of a second to execute.)

Jack Orenstein

    import sys
    import signal
    import threading
    import datetime
    import time

    class metronome(threading.Thread):
        def __init__(self, interval, function):
            threading.Thread.__init__(self)
            self.interval = interval
            self.function = function
            self.done = False

        def cancel(self):
            print '>>> cancel'
            self.done = True

        def run(self):
            while not self.done:
                time.sleep(self.interval)
                if self.done:
                    print '>>> break!'
                    break
                else:
                    self.function()

    def ctrl_c_handler(signal, frame):
        print '>>> ctrl c'
        global t
        t.cancel()
        sys.stdout.close()
        sys.stderr.close()
        sys.exit(0)

    signal.signal(signal.SIGINT, ctrl_c_handler)

    def hello():
        print datetime.datetime.now()

    t = metronome(1, hello)
    t.start()
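For what it's worth, CPython delivers signals only to the main thread, so once the main thread has nothing left to do the handler may never get a chance to run. A minimal sketch of a workaround, assuming Python 2.4 semantics (the Metronome class below is illustrative, not the original code): keep the main thread alive, rely on the default KeyboardInterrupt behavior instead of a custom handler, and use a threading.Event both as the done flag and as an interruptible sleep.

    import datetime
    import sys
    import threading

    class Metronome(threading.Thread):
        # Illustrative variant of the metronome above.
        def __init__(self, interval, function):
            threading.Thread.__init__(self)
            self.interval = interval
            self.function = function
            self.finished = threading.Event()

        def cancel(self):
            self.finished.set()

        def run(self):
            while not self.finished.isSet():
                # wait() returns early if cancel() sets the event.
                self.finished.wait(self.interval)
                if not self.finished.isSet():
                    self.function()

    def hello():
        print datetime.datetime.now()

    t = Metronome(1, hello)
    t.start()
    try:
        # The main thread must stay alive: CPython runs signal
        # handlers (and raises KeyboardInterrupt) only here.
        while t.isAlive():
            t.join(1)
    except KeyboardInterrupt:
        t.cancel()
        t.join()
        sys.exit(0)

A side benefit of Event.wait over time.sleep is that cancel() wakes the worker immediately instead of waiting out the rest of the interval.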
Threading and consuming output from processes
I am developing a Python program that submits a command to each node of a cluster and consumes the stdout and stderr from each. I want all the processes to run in parallel, so I start a thread for each node. There could be a lot of output from a node, so I have a thread reading each stream, for a total of three threads per node. (I could probably reduce this to two threads per node by having the process thread handle stdout or stderr.)

I've developed some code and have run into problems using the threading module, and have questions at various levels of detail.

1) How should I solve this problem? I'm an experienced Java programmer but new to Python, so my solution looks very Java-like (hence the use of the threading module). Any advice on the right way to approach the problem in Python would be useful.

2) How many active Python threads is it reasonable to have at one time? Our clusters have up to 50 nodes -- is 100-150 threads known to work? (I'm using Python 2.2.2 on RedHat 9.)

3) I've run into a number of problems with the threading module. My program seems to work about 90% of the time. The remaining 10%, it looks like notify or notifyAll don't wake up waiting threads, or I find some other problem that makes me wonder about the stability of the threading module. I can post details on the problems I'm seeing, but I thought it would be good to get general feedback first. (Googling doesn't turn up any signs of trouble.)

Thanks.

Jack Orenstein
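For concreteness, a minimal sketch of the thread-per-stream layout described above (the ssh command line, node names, and helper names are made up; os.popen3 is assumed since the subprocess module did not exist in Python 2.2):

    import os
    import threading

    def drain(stream, lines):
        # Read until EOF so the child never blocks on a full pipe buffer.
        for line in stream:
            lines.append(line)
        stream.close()

    def run_on_node(node, command):
        # One process per node; one reader thread per output stream.
        stdin, stdout, stderr = os.popen3('ssh %s %s' % (node, command))
        stdin.close()
        out, err = [], []
        readers = [threading.Thread(target=drain, args=(stdout, out)),
                   threading.Thread(target=drain, args=(stderr, err))]
        for r in readers:
            r.start()
        for r in readers:
            r.join()
        return out, err

    nodes = ['node%02d' % i for i in range(50)]
    workers = [threading.Thread(target=run_on_node, args=(n, 'uptime'))
               for n in nodes]
    for w in workers:
        w.start()
    for w in workers:
        w.join()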
Re: Threading and consuming output from processes
I asked:

> I am developing a Python program that submits a command to each node
> of a cluster and consumes the stdout and stderr from each. I want all
> the processes to run in parallel, so I start a thread for each node.
> There could be a lot of output from a node, so I have a thread reading
> each stream, for a total of three threads per node. (I could probably
> reduce to two threads per node by having the process thread handle
> stdout or stderr.)

Simon Wittber said:

> In the past, I have used the select module to manage asynchronous
> IO operations.
>
> I pass the select.select function a list of file-like objects, and it
> returns a list of file-like objects which are ready for reading and
> writing.

Donn Cave said:

> As I see another followup has already mentioned, the classic "pre
> threads" solution to multiple I/O sources is the select(2) function, ...

Thanks for your replies. The streams that I need to read contain pickled data. The select call returns files that have available input, and I can use read(file_descriptor, max) to read some of the input data. But then how can I convert the bytes just read into a stream for unpickling? I somehow need to take the bytes arriving for a given file descriptor and buffer them until the unpickler has enough data to return a complete unpickled object. (It would be nice to do this without copying the bytes from one place to another, but I don't even see how to solve the problem with copying.)

Jack
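One way to bridge select-style reads and the unpickler -- a sketch, not something from the thread: keep a per-descriptor string buffer, and after each read try to peel complete pickled objects off the front via cStringIO. The copying the post hoped to avoid is still there, but it is bounded by the buffer size. The consume function and its assumptions (plain integer file descriptors, each stream being nothing but back-to-back pickles) are hypothetical.

    import cStringIO
    import os
    import pickle
    import select

    def consume(fds):
        # fds: plain file descriptors (ints) for the streams to drain.
        buffers = {}                      # fd -> bytes not yet unpickled
        for fd in fds:
            buffers[fd] = ''
        objects = []
        while buffers:
            ready, _, _ = select.select(buffers.keys(), [], [])
            for fd in ready:
                data = os.read(fd, 8192)
                if not data:              # EOF on this descriptor
                    del buffers[fd]
                    continue
                buffers[fd] += data
                # Peel complete pickles off the front of the buffer.
                while buffers[fd]:
                    stream = cStringIO.StringIO(buffers[fd])
                    try:
                        obj = pickle.load(stream)
                    except Exception:
                        # A truncated pickle can raise several exception
                        # types; wait for more bytes to arrive.
                        break
                    objects.append(obj)
                    buffers[fd] = buffers[fd][stream.tell():]
        return objects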
Thread scheduling
I am using Python 2.2.2 on RH9, and just starting to work with Python threads. I started using the threading module and found that 10-20% of the runs of my test program would hang. I developed smaller and smaller test cases, finally arriving at the program at the end of this message, which uses the thread module, not threading. This program seems to point to problems in Python thread scheduling.

The program is invoked like this:

    python threadtest.py THREADS COUNT

THREADS is the number of threads created. Each thread contains a loop that runs COUNT times, and all threads increment a counter. (The counter is incremented without locking -- I expect to see a final count of less than THREADS * COUNT.)

Running with THREADS = 2 and COUNT = 10, most of the time the program runs to completion. About 20% of the time, however, I see one thread finish, but the other thread never resumes. Here is output from a run that completes normally:

    [EMAIL PROTECTED] python threadtest.py 2 10
    nThreads: 2
    nCycles: 10
    thread 1: started
    thread 1: i = 0, counter = 1
    thread 2: started
    thread 2: i = 0, counter = 2691
    thread 1: i = 1, counter = 13496
    thread 2: i = 1, counter = 22526
    thread 1: i = 2, counter = 27120
    thread 2: i = 2, counter = 40365
    thread 1: i = 3, counter = 41264
    thread 1: i = 4, counter = 55922
    thread 2: i = 3, counter = 58416
    thread 2: i = 4, counter = 72647
    thread 1: i = 5, counter = 74602
    thread 1: i = 6, counter = 88468
    thread 2: i = 5, counter = 99319
    thread 1: i = 7, counter = 110144
    thread 2: i = 6, counter = 110564
    thread 2: i = 7, counter = 125306
    thread 1: i = 8, counter = 129252
    Still waiting, done = 0
    thread 2: i = 8, counter = 141375
    thread 1: i = 9, counter = 147459
    thread 2: i = 9, counter = 155268
    thread 1: leaving
    thread 2: leaving
    Still waiting, done = 2
    All threads have finished, counter = 168322

Here is output from a run that hangs. I killed the process using ctrl-c.

    [EMAIL PROTECTED] python threadtest.py 2 10
    nThreads: 2
    nCycles: 10
    thread 1: started
    thread 1: i = 0, counter = 1
    thread 2: started
    thread 2: i = 0, counter = 990
    thread 1: i = 1, counter = 11812
    thread 2: i = 1, counter = 13580
    thread 1: i = 2, counter = 19127
    thread 2: i = 2, counter = 25395
    thread 1: i = 3, counter = 31457
    thread 1: i = 4, counter = 44033
    thread 2: i = 3, counter = 48563
    thread 1: i = 5, counter = 55131
    thread 1: i = 6, counter = 65291
    thread 1: i = 7, counter = 78145
    thread 2: i = 4, counter = 82715
    thread 1: i = 8, counter = 92073
    thread 2: i = 5, counter = 101784
    thread 1: i = 9, counter = 104294
    thread 2: i = 6, counter = 112866
    Still waiting, done = 0
    thread 1: leaving
    Still waiting, done = 1
    Still waiting, done = 1
    Still waiting, done = 1
    Still waiting, done = 1
    Still waiting, done = 1
    Still waiting, done = 1
    Still waiting, done = 1
    Still waiting, done = 1
    Still waiting, done = 1
    Traceback (most recent call last):
      File "threadtest.py", line 26, in ?
        time.sleep(1)
    KeyboardInterrupt
    [EMAIL PROTECTED] osh]$

In this case, thread 1 finishes but thread 2 never runs again. Is this a known problem? Any ideas for workarounds? Are threads widely used in Python?
Jack Orenstein

    # threadtest.py

    import sys
    import thread
    import time

    nThreads = int(sys.argv[1])
    nCycles = int(sys.argv[2])
    print 'nThreads: %d' % nThreads
    print 'nCycles: %d' % nCycles

    counter = 0
    done = 0

    def run(id):
        global done
        print 'thread %d: started' % id
        global counter
        for i in range(nCycles):
            counter += 1
            if i % 1 == 0:  # always true, so every iteration prints
                print 'thread %d: i = %d, counter = %d' % (id, i, counter)
        print 'thread %d: leaving' % id
        done += 1

    for i in range(nThreads):
        thread.start_new_thread(run, (i + 1,))

    while done < nThreads:
        time.sleep(1)
        print 'Still waiting, done = %d' % done
    print 'All threads have finished, counter = %d' % counter
Re: Thread scheduling
Peter Hansen wrote:

> Jack Orenstein wrote:
>
>> I am using Python 2.2.2 on RH9, and just starting to work with Python
>> threads.
>
> Is this also the first time you've worked with threads in general,
> or do you have much experience with them in other situations?

Yes, I've used threading in Java.

> You've got two shared global variables, "done" and "counter".
> Each of these is modified in a manner that is not thread-safe.
> I don't know if "counter" is causing trouble, but it seems
> likely that "done" is.

I understand that. As I said in my posting, "The counter is incremented without locking -- I expect to see a final count of less than THREADS * COUNT." This is a test case, and I threw out more and more code, including synchronization around counter and done, until it got as simple as possible and still showed the problem.

> Basically, the statement "done += 1" is equivalent to the
> statement "done = done + 1" which, in Python or most other
> languages is not thread-safe. The "done + 1" part is
> evaluated separately from the assignment, so it's possible
> that two threads will be executing the "done + 1" part
> at the same time and that the following assignment of
> one thread will be overwritten immediately by the assignment
> in the next thread, but with a value that is now one less
> than what you really wanted.

Understood. I was counting on this being unlikely for my test case. I realize this isn't something to rely on in real software.

> If you really want to increment globals from the thread, you
> should look into locks. Using the "threading" module (as is
> generally recommended, instead of using "thread"), you would
> use threading.Lock().

As my note said, I did start with the threading module. And variables updated by different threads were protected by threading.Condition variables. As I analyzed my test cases, and threading.py, I started suspecting thread scheduling. I then wrote the test case in my email, which does not rely on the threading module at all.

The point of the test is not to maintain counter -- it's to show that sometimes, even after one thread completes, the other thread is never scheduled again. This seems wrong. Try running the code, and let me know if you see this behavior.

If you'd like, replace this:

    counter += 1

with this:

    time.sleep(0.01 * id)

You should see the same problem. So that removes counter from the picture. And the two increments of done (one by each thread) are still almost certainly not going to coincide and cause a problem. Also, if you look at the output from the code on a hang, you will see that 'thread X: leaving' only prints once. This has nothing to do with what happens with the done variable.

Jack
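For completeness, the locked version of the counter update that the reply suggests would look something like the sketch below (the test case above omits it deliberately; try/finally is used because Python 2.2 has no with statement):

    import threading

    counter = 0
    counter_lock = threading.Lock()

    def increment():
        # Serialize the read-modify-write so no increment is lost.
        global counter
        counter_lock.acquire()
        try:
            counter += 1
        finally:
            counter_lock.release()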
Re: Thread scheduling
> On my machines (one Py2.4 on WinXP, one Py2.3.4 on RH9.0) I don't see
> this behaviour. Across about fifty runs each.

Thanks for trying this.

> One thing you might try is experimenting with sys.setcheckinterval(),
> just to see what effect it might have, if any.

That does seem to have an impact. At 0, the problem was completely reproducible. At 100, I couldn't get it to occur.

> It's also possible there were some threading bugs in Py2.2 under
> Linux. Maybe you could repeat the test with a more recent version and
> see if you get different behaviour. (Not that that proves anything
> conclusively, but at least it might be a good solution for your
> immediate problem.)

2.3 (on the same machine) does seem better, even with setcheckinterval(0). Thanks for your suggestions.

Can anyone with knowledge of Python internals comment on these results? (Look earlier in the thread for details. But basically, a very simple program with the thread module, running two threads, shows that on occasion one thread finishes and the other never runs again. python2.3 seems better, as does python2.2 with sys.setcheckinterval(100).)

Jack
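The experiment itself is one line, shown here as a sketch. (sys.setcheckinterval applies to Python 2.x; Python 3 later replaced it with sys.setswitchinterval.)

    import sys

    # Let each thread run for 100 bytecode instructions between the
    # interpreter's checks for a possible thread switch; 0 forces a
    # check as often as possible.
    sys.setcheckinterval(100)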
distutils setup ignoring scripts
I'm using Python 2.2 on RH9. I have a set of Python modules organized into a root package and one other package named foobar. setup.py looks like this:

    from distutils.core import setup

    setup(
        name = 'foobar',
        version = '0.3',
        description = 'Foo Bar',
        author = 'Jack Orenstein',
        author_email = '[EMAIL PROTECTED]',
        packages = ['', 'xyz'],
        scripts = ['bin/foobar']
    )

The resulting package has everything in the specified directories, but does not include the script. I've tried making the path bin/foobar absolute, but that doesn't help. I've googled for known bugs of this sort but have come up empty. (The first line of bin/foobar is #!/usr/bin/python.) I've also tried using DISTUTILS_DEBUG, which has been uninformative (e.g. no mention of bin/foobar at all). Can anyone see what I'm doing wrong?

Jack Orenstein
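If the script is being dropped by the sdist command specifically (rather than by install), one thing that sometimes helps -- a guess, not a confirmed diagnosis of the problem above -- is forcing the file into the manifest with a MANIFEST.in placed next to setup.py:

    # MANIFEST.in -- distutils' sdist command reads this template and
    # adds the named files to the source distribution.
    include bin/foobar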
Re: how to remove 50000 elements from a 100000 list?
On May 5, 2006, at 9:36 AM, Ju Hui wrote:

> >>> a=range(10)
> >>> b=range(5)
> >>> for x in b:
> ...     a.remove(x)
> ...
> it will very slowly. Shall I change to another data structure and
> choose a better arithmetic? any suggestion is welcome.

If removal is an O(n) operation, then removing half the list will take O(n**2) in total, which you don't want. You'd be better off with the contents of "a" being in a hash table (O(1) removal in practice) or a balanced tree (O(log n) removal).

Another possibility: if the a and b lists are ordered in the same way, then you could walk through the lists in order using a merge procedure, generating a new list as you go.

After ruling out slow data structures and algorithms, you'll almost certainly be better off using something built in to Python rather than coding your own data structure in Python.

Jack Orenstein
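A minimal sketch of the built-in route, assuming Python 2.4 where set is a builtin (on 2.3, use sets.Set): build a set from b for O(1) membership tests, then rebuild a in a single pass, which is O(n) overall instead of O(n**2).

    # Remove every element of b from a in one O(n) pass.
    a = range(100000)
    b = range(50000)

    to_remove = set(b)
    a = [x for x in a if x not in to_remove]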