please critique my thread code
I wrote a Python program (103 lines, below) to download developer data from SourceForge for research about social networks. Please critique the code and let me know how to improve it.

An example use of the program:

prompt> python download.py 1 24

The above command downloads data for the projects with IDs between 1 and 24, inclusive. As it runs, it prints status messages, with a plus sign meaning that the project ID exists; otherwise, it prints a minus sign.

Questions:

--- Are my setup and use of threads, the queue, and the "while True" loop correct or conventional?

--- Should the program sleep sometimes, to be nice to the SourceForge servers, so they don't think this is a denial-of-service attack?

--- Someone told me that popen is not thread-safe, and to use mechanize. I installed it and followed an example on the web site. There wasn't a good description of it on the web site, or I didn't find it. Could someone explain what mechanize does?

--- How do I choose the number of threads? I am using a MacBook Pro 2.4 GHz Intel Core 2 Duo with 4 GB 667 MHz DDR2 SDRAM, running OS 10.5.3.

Thank you.

Winston

#!/usr/bin/env python
# Winston C. Yang
# Created 2008-06-14

from __future__ import with_statement
import mechanize
import os
import Queue
import re
import sys
import threading
import time

lock = threading.RLock()

# Make the dot match even a newline.
error_pattern = re.compile(".*\n\n.*", re.DOTALL)


def now():
    return time.strftime("%Y-%m-%d %H:%M:%S")


def worker():
    while True:
        try:
            id = queue.get()
        except Queue.Empty:
            continue

        request = mechanize.Request("http://sourceforge.net/project/"
                                    "memberlist.php?group_id=%d" % id)
        response = mechanize.urlopen(request)
        text = response.read()

        valid_id = not error_pattern.match(text)
        if valid_id:
            f = open("%d.csv" % id, "w+")
            f.write(text)
            f.close()

        with lock:
            print "\t".join((str(id), now(), "+" if valid_id else "-"))


def fatal_error():
    print "usage: python application start_id end_id"
    print
    print "Get the usernames associated with each SourceForge project with"
    print "ID between start_id and end_id, inclusive."
    print
    print "start_id and end_id must be positive integers and satisfy"
    print "start_id <= end_id."
    sys.exit(1)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        try:
            start_id = int(sys.argv[1])
            if start_id <= 0:
                raise Exception
            end_id = int(sys.argv[2])
            if end_id < start_id:
                raise Exception
        except:
            fatal_error()
    else:
        fatal_error()

    # Print the start time.
    start_time = now()
    print start_time

    # Create a directory whose name contains the start time.
    dir = start_time.replace(" ", "_").replace(":", "_")
    os.mkdir(dir)
    os.chdir(dir)

    queue = Queue.Queue(0)

    for i in xrange(32):
        t = threading.Thread(target=worker, name="worker %d" % (i + 1))
        t.setDaemon(True)
        t.start()

    for id in xrange(start_id, end_id + 1):
        queue.put(id)

    # When the queue has size zero, exit in three seconds.
    while True:
        if queue.qsize() == 0:
            time.sleep(3)
            break

    print now()

--
http://mail.python.org/mailman/listinfo/python-list
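On the queue and the busy-wait at the end: in modern Python 3 terms, the conventional shape for this kind of worker pool is queue.Queue.task_done()/join() plus one sentinel per worker, instead of daemon threads and polling qsize(). A sleep between requests also addresses the politeness question. A minimal sketch (the fetch is stubbed out; the worker count and the 0.1 s delay are illustrative, not recommendations):

```python
import queue
import threading
import time

def worker(q, results, delay=0.1):
    """Pull project IDs until a None sentinel arrives."""
    while True:
        project_id = q.get()
        if project_id is None:      # sentinel: no more work
            q.task_done()
            break
        # ... fetch and save the member list here ...
        results.append(project_id)  # list.append is atomic under the GIL
        time.sleep(delay)           # be polite to the server
        q.task_done()

q = queue.Queue()
results = []
threads = [threading.Thread(target=worker, args=(q, results))
           for _ in range(4)]
for t in threads:
    t.start()

for project_id in range(1, 9):
    q.put(project_id)
for _ in threads:
    q.put(None)                     # one sentinel per worker

q.join()                            # returns once every task_done() ran
for t in threads:
    t.join()
print(sorted(results))              # → [1, 2, 3, 4, 5, 6, 7, 8]
```

With this shape there is no need for the final "while True" / qsize() loop or the three-second sleep: join() blocks exactly until the queue has been drained.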
Revisiting Generators and Subgenerators
I have been reading PEP 380 because I am writing a video game/simulation in Jython and I need cooperative multitasking. PEP 380 hits on my problem, but does not quite solve it for me. I have the following proposal as an alternative to PEP 380. I don't know if this is the right way for me to introduce my idea, but below is my writeup. Any thoughts?

Proposal for a new Generator Syntax in Python 3K -- A Baton object for generators to allow subfunctions to yield, and to make them symmetric.

Abstract

Generators can be used to make coroutines, but they require the programmer to take special care in how he writes his generator. In particular, only the generator function itself may yield a value. We propose a modification to generators in Python 3 where a "Baton" object is given to both sides of a generator. Both sides use the baton object to pass execution to the other side, and also to pass values to the other side. The advantages of a baton object over the current scheme are: (1) the generator function can pass the baton to a subfunction, solving the needs of PEP 380; (2) after creation, both sides of the generator are symmetric -- they can both call yield(), send(), and next(), and they do the same thing. This means programming with generators is the same as programming with normal functions; no special contortions are needed to pass values back up to a yield command at the top.

Motivation

Generators make certain programming tasks easier, such as (a) an iterator of infinite length; (b) emulating coroutines and cooperative multitasking using a "trampoline function"; (c) making both sides of a producer-consumer pattern easy to write -- both sides can appear to be the caller. On the down side, generators as currently implemented in Python 3.1 require the programmer to take special care in how he writes his generator. In particular, only the generator function may yield a value; subfunctions called by the generator function may not. Here are two use cases in which generators are commonly used, but where the current limitation causes less readable code:

1) A long generator function which the programmer wants to split into several functions. The subfunctions should be able to yield a result. Currently, the subfunctions have to pass values up to the main generator and have it yield the results back. Similarly, subfunctions cannot receive values that the caller sends with generator.send().

2) Generators are great for cooperative multitasking. A common use case is agent simulators, where many small "tasklets" need to run and then pass execution over to other tasklets. Video games are a common scenario, as is SimPy. Without cooperative multitasking, each tasklet must be contorted to run in a small piece and then return. Generators help with this, but a more complicated algorithm which is best decomposed into several functions must still be contorted, because the subfunctions cannot yield or receive data from generator.send().

Here is also a nice description of how coroutines make programs easier to read and write:
http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

Proposal

If there is a way to make a subfunction of a generator yield and receive data from generator.send(), then the two problems above are solved. For example, this declares a generator. The first parameter of the generator is the "context", which represents the other side of the execution frame. A Baton object represents a passing of execution from one line of code to another. A program creates a Baton like so:

generator f(baton):
    # compute something
    baton.yield(result)
    # compute something
    baton.yield(result)

baton = f()
while True:
    print(baton.yield())

A generator function, denoted with the keyword "generator" instead of "def", will return a "baton". Generators have the following methods:

__call__(args...) -- This creates a Baton object which is passed back to the caller, i.e. the code that executed the Baton() command. Once the baton starts working, the two sides are symmetric, so we will call the first frame "frame A" and the code inside 'function' "frame B". Frame A is returned a baton object. As soon as frame A calls baton.yield(), frame B begins, i.e. 'function' starts to run. The function is passed the baton as its first argument, and any additional arguments are also passed in. When frame B yields, any value that it yields will be returned to frame A as the result of its yield().

Baton
Revisiting Generators and Subgenerators
Here's my proposal again, but hopefully with better formatting so you can read it more easily. -Winston

Proposal for a new Generator Syntax in Python 3K -- A Baton object for generators to allow subfunctions to yield, and to make them symmetric.

Abstract

Generators can be used to make coroutines, but they require the programmer to take special care in how he writes his generator. In particular, only the generator function itself may yield a value. We propose a modification to generators in Python 3 where a "Baton" object is given to both sides of a generator. Both sides use the baton object to pass execution to the other side, and also to pass values to the other side. The advantages of a baton object over the current scheme are: (1) the generator function can pass the baton to a subfunction, solving the needs of PEP 380; (2) after creation, both sides of the generator are symmetric -- they can both call yield(), send(), and next(), and they do the same thing. This means programming with generators is the same as programming with normal functions; no special contortions are needed to pass values back up to a yield command at the top.

Motivation

Generators make certain programming tasks easier, such as (a) an iterator of infinite length; (b) emulating coroutines and cooperative multitasking using a "trampoline function"; (c) making both sides of a producer-consumer pattern easy to write -- both sides can appear to be the caller. On the down side, generators as currently implemented in Python 3.1 require the programmer to take special care in how he writes his generator. In particular, only the generator function may yield a value; subfunctions called by the generator function may not. Here are two use cases in which generators are commonly used, but where the current limitation causes less readable code:

1) A long generator function which the programmer wants to split into several functions. The subfunctions should be able to yield a result. Currently, the subfunctions have to pass values up to the main generator and have it yield the results back. Similarly, subfunctions cannot receive values that the caller sends with generator.send().

2) Generators are great for cooperative multitasking. A common use case is agent simulators, where many small "tasklets" need to run and then pass execution over to other tasklets. Video games are a common scenario, as is SimPy. Without cooperative multitasking, each tasklet must be contorted to run in a small piece and then return. Generators help with this, but a more complicated algorithm which is best decomposed into several functions must still be contorted, because the subfunctions cannot yield or receive data from generator.send().

Here is also a nice description of how coroutines make programs easier to read and write:
http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html

Proposal

If there is a way to make a subfunction of a generator yield and receive data from generator.send(), then the two problems above are solved. For example, this declares a generator. The first parameter of the generator is the "context", which represents the other side of the execution frame. A Baton object represents a passing of execution from one line of code to another. A program creates a Baton like so:

generator f(baton):
    # compute something
    baton.yield(result)
    # compute something
    baton.yield(result)

baton = f()
while True:
    print(baton.yield())

A generator function, denoted with the keyword "generator" instead of "def", will return a "baton". Generators have the following methods:

__call__(args...) -- This creates a Baton object which is passed back to the caller, i.e. the code that executed the Baton() command. Once the baton starts working, the two sides are symmetric, so we will call the first frame "frame A" and the code inside 'function' "frame B". Frame A is returned a baton object. As soon as frame A calls baton.yield(), frame B begins, i.e. 'function' starts to run. The function is passed the baton as its first argument, and any additional arguments are also passed in. When frame B yields, any value that it yields will be returned to frame A as the result of its yield().

Batons have the following methods:

yield(arg=None) -- This method will save the current execution stat
Re: Revisiting Generators and Subgenerators
On Mar 26, 7:29 am, Sebastien Binet wrote:
> > Proposal for a new Generator Syntax in Python 3K --
> > A Baton object for generators to allow subfunction to yield, and to
> > make them symmetric.
>
> isn't a Baton what CSP calls a channel?
>
> there is this interesting PyCSP library (which implements channels
> over greenlets, os-processes (via multiprocessing) or python threads):
> http://code.google.com/p/pycsp
>
> cheers,
> sebastien.

Thanks for the link. After reading about Greenlets, it seems my Baton is a Greenlet. It is not passed in to the new greenlet as I wrote above, but both sides use it to pass execution to the other, and to send a value on switching. I'm glad my thinking is matching other people's thinking. Now I have to search for a greenlet written for Jython.

And thanks to others for their thoughts on this subject.

-Winston
Re: interrupted system call w/ Queue.get
On Feb 18, 10:23 am, Jean-Paul Calderone wrote:
> The exception is caused by a syscall returning EINTR. A syscall will
> return EINTR when a signal arrives and interrupts whatever that syscall
> was trying to do. Typically a signal won't interrupt the syscall
> unless you've installed a signal handler for that signal. However,
> you can avoid the interruption by using `signal.siginterrupt` to
> disable interruption on that signal after you've installed the handler.
>
> As for the other questions - I don't know, it depends how and why it
> happens, and whether it prevents your application from working properly.

We did not try "signal.siginterrupt" because we were not installing any signal handlers; perhaps some library code is doing it without us knowing about it. Plus, I still don't know which signal was causing the problem. Instead, based on Dan Stromberg's reply (http://code.activestate.com/lists/python-list/595310/), I wrote a drop-in replacement for Queue called RetryQueue which fixes the problem for us:

from multiprocessing.queues import Queue
import errno

def retry_on_eintr(function, *args, **kw):
    while True:
        try:
            return function(*args, **kw)
        except IOError, e:
            if e.errno == errno.EINTR:
                continue
            else:
                raise

class RetryQueue(Queue):
    """Queue which will retry if interrupted with EINTR."""

    def get(self, block=True, timeout=None):
        return retry_on_eintr(Queue.get, self, block, timeout)

As to whether this is a bug or just our own malignant signal-related settings, I'm not sure. Certainly it's not desirable to have your blocking waits interrupted. I did see several EINTR issues in Python, but none obviously about Queue exactly:

http://bugs.python.org/issue1068268
http://bugs.python.org/issue1628205
http://bugs.python.org/issue10956

-Philip
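The same retry idea in Python 3 spelling (IOError is now an alias of OSError, and the errno lives on the exception), exercised here against a hypothetical flaky call rather than a real Queue. Note that since Python 3.5, PEP 475 makes the interpreter retry interrupted system calls automatically, so a wrapper like this is mainly needed on older versions:

```python
import errno

def retry_on_eintr(function, *args, **kwargs):
    """Call function(), retrying as long as it fails with EINTR."""
    while True:
        try:
            return function(*args, **kwargs)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise

# Hypothetical flaky call: fails once with EINTR, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise OSError(errno.EINTR, "Interrupted system call")
    return "ok"

result = retry_on_eintr(flaky)
print(result, calls["n"])   # → ok 2
```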
reading argv argument of unittest.main()
I've read that unittest.main() can take an optional argv argument, and that if it is None, it will be assigned sys.argv. Is there a way to pass command line arguments through unittest.main() to the setUp method of a class derived from unittest.TestCase? Thank you in advance.

Winston
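unittest.main() parses argv itself (None meaning sys.argv) but never forwards anything to setUp, and setUp cannot take extra arguments. A common workaround is to peel your own arguments off argv first, stash them where the tests can read them, and hand the remainder to unittest. A sketch with a hypothetical CONFIG dict (the runner is invoked directly so the example is self-contained; in a real script you would call unittest.main(argv=remaining) instead):

```python
import unittest

CONFIG = {"db_host": "localhost"}   # filled in before the tests run

class ExampleTest(unittest.TestCase):
    def setUp(self):
        # setUp cannot receive arguments, so it reads the shared state.
        self.host = CONFIG["db_host"]

    def test_host(self):
        self.assertEqual(self.host, "example.org")

def run(argv):
    """Strip our own positional argument, then run the tests."""
    argv = list(argv)
    if len(argv) > 1 and not argv[1].startswith("-"):
        CONFIG["db_host"] = argv.pop(1)
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)

result = run(["prog", "example.org"])
print(result.wasSuccessful())   # → True
```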
using PyUnit to test with multiple threads
Is it possible to use PyUnit to test with multiple threads? I want to send many commands to a database at the same time. The order of execution of the commands is indeterminate, and therefore, so is the status message returned. For example, say that I send the commands "get" and "delete" for a given record to the database at the same time. If the get executes before the delete, I expect a success message (assuming that the record exists in the database). If the delete executes before the get, I expect a failure message. Is there a way to write tests in PyUnit for this type of situation? Thank you in advance.

Winston
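PyUnit has no special thread support, but a test method can start its own threads and then assert on the set of acceptable outcomes rather than on a single fixed value. A sketch with a toy in-memory stand-in for the database (ToyDB and the success/failure strings are illustrative, not a real client API):

```python
import threading
import unittest

class ToyDB:
    """Stand-in for the real database: get/delete on one record."""
    def __init__(self):
        self.records = {"key": "value"}
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            return "success" if key in self.records else "failure"

    def delete(self, key):
        with self.lock:
            self.records.pop(key, None)

class RaceTest(unittest.TestCase):
    def test_concurrent_get_and_delete(self):
        db = ToyDB()
        outcome = []
        t1 = threading.Thread(target=lambda: outcome.append(db.get("key")))
        t2 = threading.Thread(target=lambda: db.delete("key"))
        t1.start(); t2.start()
        t1.join(); t2.join()
        # The interleaving is indeterminate, so accept either result.
        self.assertIn(outcome[0], ("success", "failure"))

suite = unittest.defaultTestLoader.loadTestsFromTestCase(RaceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())   # → True
```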
MakeBot - IDE for learning Python
I have just released a Windows and Macintosh OS X version of MakeBot, an IDE intended for students learning Python. It includes a very nice graphics/video game package based on PyGame. You can read all about it here:

http://stratolab.com/misc/makebot/

-Winston
Re: Revisiting Generators and Subgenerators
Coroutines achieve very similar things to threads, but avoid problems resulting from the pre-emptive nature of threads. Specifically, a coroutine indicates where it will yield to the other coroutine, which avoids lots of problems related to synchronization. Also, the lightweight aspect is apparently important for some simulations: when they have many thousands of agents to simulate, that number of threads becomes a problem.

-Winston

Winston Wolff
Stratolab - Games for Learning
tel: (646) 827-2242
web: www.stratolab.com

On Mar 25, 2010, at 5:23 PM, Cameron Simpson wrote:
> Having quickly read the Abstract and Motivation, why is this any better
> than a pair of threads and a pair of Queue objects? (Aside from
> co-routines being more lightweight in terms of machine resources?)
>
> On the flipside, given that generators were recently augmented to
> support coroutines, I can see your motivation within that framework.
>
> Cheers,
> --
> Cameron Simpson DoD#743
> http://www.cskk.ezoshosting.com/cs/
>
> C makes it easy for you to shoot yourself in the foot. C++ makes that
> harder, but when you do, it blows away your whole leg.
> - Bjarne Stroustrup
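The explicit-switch-point property can be sketched with plain generators and a tiny round-robin scheduler, a stripped-down version of the "trampoline function" mentioned in the proposal (names are illustrative):

```python
from collections import deque

def agent(name, steps, log):
    """Each yield is an explicit switch point, unlike a pre-emptive
    thread, which can be interrupted anywhere."""
    for i in range(steps):
        log.append((name, i))
        yield              # hand control back to the scheduler

def run(tasks):
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)   # reschedule after its yield
        except StopIteration:
            pass                 # task finished; drop it

log = []
run([agent("a", 2, log), agent("b", 2, log)])
print(log)   # → [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

Because the switch points are explicit, the shared `log` list needs no lock here; no other task can run between an append and the following yield.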
interrupted system call w/ Queue.get
We have a multiprocess Python program that uses Queue to communicate between processes. Recently we've seen some errors while blocked waiting on Queue.get:

IOError: [Errno 4] Interrupted system call

What causes the exception? Is it necessary to catch this exception and manually retry the Queue operation? Thanks.

We have some Python 2.5 and 2.6 machines that have run this program for many thousands of hours with no errors. But we have one 2.5 machine and one 2.7 machine that seem to get the error very often.
Regarding the error: TypeError: can't pickle _thread.lock objects
Hi,

It would be of immense help if someone could provide a suitable solution, or related information that helps to sort out the issue stated below.

- I installed Python version 3.6.4.
- Then I installed the package: TensorFlow.
- I installed g2p.exe by downloading it from GitHub.
- Then I tried running the command below:

g2p-seq2seq --interactive --model (model_folder_path: is the path to an English model, a 2-layer LSTM with 512 hidden units, CMU Sphinx dictionary downloaded from the CMU Sphinx website)

Following the above procedure, I encountered the following error: TypeError: can't pickle _thread.lock objects. Please find the attached screenshot for your reference.

Thanks,
A. Winston Manuel Vijay
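For what it's worth, this error usually means some object in the graph being pickled holds a live threading.Lock (TensorFlow session and model wrappers often do). A minimal reproduction plus the usual workaround, dropping the lock in __getstate__ and rebuilding it in __setstate__ (the Broken/Fixed classes are toy stand-ins, not g2p-seq2seq's actual objects):

```python
import pickle
import threading

class Broken:
    """Holds a lock directly, like the model object in the traceback."""
    def __init__(self):
        self.name = "demo"
        self.lock = threading.Lock()

try:
    pickle.dumps(Broken())
    failed = False
except TypeError as e:
    failed = True
    print(e)   # the error the poster saw (wording varies by version)

class Fixed(Broken):
    def __getstate__(self):
        state = self.__dict__.copy()
        del state["lock"]              # drop the unpicklable member
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.Lock()   # rebuild it on unpickling

restored = pickle.loads(pickle.dumps(Fixed()))
print(failed, restored.name, hasattr(restored, "lock"))   # → True demo True
```

One subtlety: __getstate__ must return a truthy state (here the dict still contains "name"), because pickle skips __setstate__ when the state is falsy.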