multiprocessing pipes with custom pickler
Hi, I need inter-process communication in Python, and was looking at the documentation here: http://docs.python.org/2/library/multiprocessing.html

I am using a custom pickler, though, in order to deal with some objects that are not serializable through the built-in pickler. Is there any way to tell the pipe's send() method to use my pickler? I could also pickle the data myself and send the resulting binary string through the existing send() method, but then everything gets pickled twice, which seems like a hack. Maybe the send_bytes() method would be the best option, if it doesn't pickle the data?

thanks for the help,
imran
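For the record, a minimal sketch of the send_bytes() route - it transmits a raw byte string without pickling it, so the data is serialized exactly once. my_dumps/my_loads below are stand-ins for whatever dump/load pair the custom pickler actually provides:

    import cPickle
    from multiprocessing import Pipe

    def my_dumps(obj):
        # stand-in for the custom pickler's dump-to-string
        return cPickle.dumps(obj, 2)

    def my_loads(data):
        # stand-in for the matching load
        return cPickle.loads(data)

    parent, child = Pipe()
    parent.send_bytes(my_dumps({'answer': 42}))  # raw bytes, pickled exactly once
    print my_loads(child.recv_bytes())           # recv_bytes returns the same bytes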
collecting variable assignments through settrace
Hi, I'm writing a custom profiler that uses sys.settrace. I was wondering if there is any way of tracing the assignments of variables inside a function as it's executed, without looking at locals() at every single line and comparing them to see if anything has changed. Sort of like xdebug's collect_assignments parameter in PHP.

thanks,
imran
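For reference, a sketch of the locals-diffing approach the post is hoping to avoid - in CPython the trace hook only reports call/line/return/exception events, so diffing frame.f_locals is essentially the only option. The reported line numbers are approximate, since a 'line' event fires before that line executes:

    import sys

    def make_tracer():
        snapshots = {}  # frame -> last-seen copy of its locals

        def tracer(frame, event, arg):
            if event == 'call':
                snapshots[frame] = dict(frame.f_locals)
            elif event in ('line', 'return'):
                old = snapshots.get(frame, {})
                for name, value in frame.f_locals.items():
                    if name not in old or old[name] is not value:
                        print 'assign %s = %r near line %d' % (
                            name, value, frame.f_lineno)
                snapshots[frame] = dict(frame.f_locals)
            return tracer
        return tracer

    def demo():
        x = 1
        y = x + 1

    sys.settrace(make_tracer())
    demo()
    sys.settrace(None)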
settrace doesn't trace builtin functions
Hi, I've been using the settrace function to write a tracer for my program, which is working great except that it doesn't seem to work for built-in functions, like open('filename.txt'). This doesn't seem to be documented, so I'm not sure if I'm doing something wrong or if that's the expected behavior.

If settrace is never going to trace built-ins, is there any way to trace calls to open()? I don't want to use Linux's strace, as it runs for the whole program (not just the part I want) and won't show my Python line numbers/file names, etc. The other option I considered was monkey-patching the open function through a wrapper, like:

    def wrapped_open(*arg, **kw):
        print 'open called'
        traceback.print_stack()
        f = __builtin__.open(*arg, **kw)
        return f
    open = wrapped_open

but that seemed very brittle to me. Could someone suggest a better way of doing this?

thank you,
imran
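A sketch of a slightly sturdier variant of that monkey-patch, assuming patching is acceptable at all: rebinding __builtin__.open itself (rather than one module's open name) makes the wrapper visible to every module, and keeping a reference to the original allows clean restoration:

    import __builtin__
    import traceback

    _real_open = __builtin__.open

    def traced_open(*args, **kwargs):
        print 'open called with %r %r' % (args, kwargs)
        traceback.print_stack()
        return _real_open(*args, **kwargs)

    __builtin__.open = traced_open        # every module now sees the wrapper
    try:
        f = open('filename.txt', 'w')     # traced
        f.close()
    finally:
        __builtin__.open = _real_open     # restore the real built-in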
UnpicklingError: NEWOBJ class argument isn't a type object
Hi, I'm using a custom pickler that replaces any un-pickleable objects (such as sockets or files) with a string representation of them, based on the code from Shane Hathaway here: http://stackoverflow.com/questions/4080688/python-pickling-a-dict-with-some-unpicklable-items

It works most of the time, but when I try to unpickle a Django HttpResponse, I get the following error:

    UnpicklingError: NEWOBJ class argument isn't a type object

I have no clue what the error actually means. If it pickles okay, why should it not be able to unpickle? Any ideas?

thanks for the help,
imran

Here is my code:

    from cPickle import Pickler, Unpickler, UnpicklingError

    class FilteredObject:
        def __init__(self, about):
            self.about = about
        def __repr__(self):
            return 'FilteredObject(%s)' % repr(self.about)

    class MyPickler(object):
        def __init__(self, file, protocol=2):
            pickler = Pickler(file, protocol)
            pickler.persistent_id = self.persistent_id
            self.dump = pickler.dump
            self.clear_memo = pickler.clear_memo

        def persistent_id(self, obj):
            if not hasattr(obj, '__getstate__') and not isinstance(obj,
                    (basestring, bool, int, long, float, complex,
                     tuple, list, set, dict)):
                return ["filtered:%s" % str(obj)]
            else:
                return None

    class MyUnpickler(object):
        def __init__(self, file):
            unpickler = Unpickler(file)
            unpickler.persistent_load = self.persistent_load
            self.load = unpickler.load
            self.noload = unpickler.noload

        def persistent_load(self, obj_id):
            if obj_id[0].startswith('filtered:'):
                return FilteredObject(obj_id[0][9:])
            else:
                raise UnpicklingError('Invalid persistent id')

    ## serialize to file
    f = open('test.txt', 'wb')
    p = MyPickler(f)
    p.dump(data)
    f.close()

    ## unserialize from file
    f = open('test.txt', 'rb')
    pickled_data = f.read()
    f.seek(0)
    u = MyUnpickler(f)
    data = u.load()
Re: UnpicklingError: NEWOBJ class argument isn't a type object
On Monday, July 8, 2013 12:45:55 AM UTC-7, Peter Otten wrote:
> skunkwerk wrote:
>
> > Hi, I'm using a custom pickler that replaces any un-pickleable objects
> > (such as sockets or files) with a string representation of them, based
> > on the code from Shane Hathaway here:
> > http://stackoverflow.com/questions/4080688/python-pickling-a-dict-with-some-unpicklable-items
> >
> > It works most of the time, but when I try to unpickle a Django
> > HttpResponse, I get the following error: UnpicklingError: NEWOBJ class
> > argument isn't a type object
> >
> > I have no clue what the error actually means. If it pickles okay, why
> > should it not be able to unpickle? Any ideas?
>
> A simple way to provoke the error is to rebind the name referring to the
> class of the pickled object:
>
> >>> import cPickle
> >>> class A(object): pass
> ...
> >>> p = cPickle.dumps(A(), -1)
> >>> cPickle.loads(p)
> <__main__.A object at 0x7fce7bb58c50>
> >>> A = 42
> >>> cPickle.loads(p)
> Traceback (most recent call last):
>   File "", line 1, in
> cPickle.UnpicklingError: NEWOBJ class argument isn't a type object
>
> You may be doing something to that effect.

Hey Peter,
I tried unpickling even from another file with no other code in it, but came up with the same error - so I don't think it's a rebinding issue. But I got the error to disappear when I removed the hasattr(obj, '__getstate__') check from this line of code in the persistent_id function:

    if not hasattr(obj, '__getstate__') and not isinstance(obj,
            (basestring, bool, int, long, float, complex,
             tuple, list, set, dict)):
        return ["filtered:%s" % type(obj)]

When I do that, I get a few more FilteredObjects in the result, for things like:

I figured these classes must have __getstate__ methods, which leads to them being pickled without a persistent_id (it turns out they actually have __repr__ methods). So these classes get pickled fine, but run into problems when trying to unpickle them. I understand why ImportErrors would happen if the necessary modules haven't been loaded, but this NEWOBJ error is still kind of mystifying. I guess I just won't pickle any classes for now, if unpickling them is going to be dicey.

thanks for the help guys,
imran
automated unit test generation
Hi, I've been working on an open source project to auto-generate unit tests for web apps, based on traces collected from the web server and static code analysis. I've got an alpha version online at www.splintera.com, and the source is at https://github.com/splintera/python-django-client. I'd love to get some feedback from the community and extend it to work with other languages as well.

I wrote it originally because I was sick of coming into companies where I had to inherit tens of thousands of lines of code without any tests, and never had time to write them manually - being careful to mock out dependencies, specify the correct inputs and outputs, and figure out which path the code was taking.

I'd like to get some sense of:
- how difficult/tedious is writing unit tests, and why?
- do you wish you had better code coverage?
- how important is testing to you?

thanks,
imran
subprocess.popen function with quotes
Hi, I'm trying to call subprocess.Popen on the 'rename' command in Linux. When I run the command from the shell, like so:

    rename -vn 's/\.htm$/\.html/' *.htm

it works fine... however when I try to do it in Python like so:

    p = subprocess.Popen(["rename", "-vn", "'s/\.htm$/\.html/'", "*.htm"],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print p.communicate()[0]

nothing gets printed out (even for p.communicate()[1]).

I think the problem is the quoted string the rename command wants - when I put it in triple quotes like """s/\.htm$/\.html/""" I get some output, but not the correct output. I've also tried escaping the single quotes with \' and putting the expression in regular double quotes, but that didn't work either.

i'd appreciate any help
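For what it's worth, a sketch of the rule at work, anticipating the fix this thread converges on below: with no shell involved, each list element reaches the program byte-for-byte, so the single quotes (which a shell would strip) become part of the regex argument, and *.htm is never expanded - the quotes have to go, and the globbing has to be done by hand:

    import glob
    import subprocess

    # no shell: pass the perl expression unquoted, expand the glob ourselves
    args = ["rename", "-vn", r"s/\.htm$/\.html/"] + glob.glob("*.htm")
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    print out, err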
Re: subprocess.popen function with quotes
also, i've tried the shell=True parameter for Popen, but that didn't seem to make a difference

On Mar 25, 8:31 pm, skunkwerk <[EMAIL PROTECTED]> wrote:
> Hi, I'm trying to call subprocess.Popen on the 'rename' command in Linux.
> When I run the command from the shell, like so:
>
>     rename -vn 's/\.htm$/\.html/' *.htm
>
> it works fine... however when I try to do it in Python like so:
>
>     p = subprocess.Popen(["rename", "-vn", "'s/\.htm$/\.html/'", "*.htm"],
>                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>     print p.communicate()[0]
>
> nothing gets printed out (even for p.communicate()[1])
>
> I think the problem is the quoted string the rename command wants -
> when I put it in triple quotes like """s/\.htm$/\.html/""" I get some
> output, but not the correct output. I've also tried escaping the
> single quotes with \' and putting it in regular double quotes, but
> that didn't work either.
>
> i'd appreciate any help
Re: subprocess.popen function with quotes
On Mar 25, 9:25 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Wed, 26 Mar 2008 00:39:05 -0300, skunkwerk <[EMAIL PROTECTED]> escribió:
>
> >> i'm trying to call subprocess.popen on the 'rename' function in
> >> linux. When I run the command from the shell, like so:
> >>
> >>     rename -vn 's/\.htm$/\.html/' *.htm
> >>
> >> it works fine... however when I try to do it in python like so:
> >>
> >>     p = subprocess.Popen(["rename", "-vn", "'s/\.htm$/\.html/'", "*.htm"],
> >>                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> >>     print p.communicate()[0]
> >>
> >> nothing gets printed out (even for p.communicate()[1])
>
> I'd try with:
>
>     p = subprocess.Popen(["rename", "-vn", r"'s/\.htm$/\.html/'", "*.htm"],
>                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
>                          shell=True)
>
> (note that I added shell=True and I'm using a raw string to specify the
> reg.expr.)
>
> --
> Gabriel Genellina

Thanks Gabriel,
I tried the new command and one with the raw string and single quotes, but it is still giving me the same results (no output). any other suggestions?
cheers
Re: subprocess.popen function with quotes
On Mar 25, 11:04 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Wed, 26 Mar 2008 02:15:28 -0300, skunkwerk <[EMAIL PROTECTED]> escribió:
>
> > On Mar 25, 9:25 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> >> I'd try with:
> >>
> >>     p = subprocess.Popen(["rename", "-vn", r"'s/\.htm$/\.html/'", "*.htm"],
> >>                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
> >>                          shell=True)
> >>
> >> (note that I added shell=True and I'm using a raw string to specify the
> >> reg.expr.)
> >
> > Thanks Gabriel,
> > I tried the new command and one with the raw string and single
> > quotes, but it is still giving me the same results (no output). any
> > other suggestions?
>
> My next try would be without the single quotes...
>
> --
> Gabriel Genellina

thanks for the input guys,
I've tried the suggestions but can't get it to work. I have a file named test.htm in my directory, and when I run the following command:

    rename -vn 's/(.*)\.htm$/model.html/' *.htm

from the shell in that directory, I get the following output:

    test.htm renamed as model.html

now my python script is called test.py, is located in the same directory, and is called from the shell with 'python test.py'. the contents of test.py:

    import subprocess

    p = subprocess.Popen(['rename', '-vn', 's/(.*)\.htm$/model.html/', '*.htm'],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print p.communicate()[0]

i change to print p.communicate()[1] in case the output is blank the first time

this is the output:

    *.htm renamed as model.html

when I add shell=True to the subprocess command, I get the following output:

    Usage: rename [-v] [-n] [-f] perlexpr [filenames]

am i doing something wrong?
Re: subprocess.popen function with quotes
On Mar 26, 6:44 am, skunkwerk <[EMAIL PROTECTED]> wrote:
> thanks for the input guys,
> I've tried the suggestions but can't get it to work. I have a file
> named test.htm in my directory, and when I run the following command:
>
>     rename -vn 's/(.*)\.htm$/model.html/' *.htm
>
> from the shell in that directory, I get the following output:
>     test.htm renamed as model.html
>
> the contents of test.py:
>
>     import subprocess
>
>     p = subprocess.Popen(['rename', '-vn', 's/(.*)\.htm$/model.html/', '*.htm'],
>                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>     print p.communicate()[0]
>
> this is the output:
>     *.htm renamed as model.html
>
> when I add shell=True to the subprocess command, I get the following output:
>     Usage: rename [-v] [-n] [-f] perlexpr [filenames]
>
> am i doing something wrong?

in addition, when I use Popen without any quotes, or without quotes for the regular expression, I get an exception. I'm running Ubuntu Linux 7.10 with Python 2.5.1
thanks
Re: subprocess.popen function with quotes
On Mar 26, 8:05 am, Jeffrey Froman <[EMAIL PROTECTED]> wrote:
> skunkwerk wrote:
>
> > p = subprocess.Popen(['rename', '-vn', 's/(.*)\.htm$/model.html/', '*.htm'],
> >                      stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> > print p.communicate()[0]
> >
> > i change to print p.communicate()[1] in case the output is blank the
> > first time
> >
> > this is the output:
> > *.htm renamed as model.html
>
> Without shell=True, your glob characters will not be expanded. Hence, the
> command looks for a file actually named "*.htm".
>
> > when I add shell=True to the subprocess command, I get the following
> > output:
> > Usage: rename [-v] [-n] [-f] perlexpr [filenames]
>
> Here the use of the shell may be confounding the arguments passed. Your
> command will probably work better if you avoid using shell=True. However,
> you will need to perform your own globbing:
>
>     # Untested (no perl-rename here):
>     command = ['rename', '-vn', 's/(.*)\.htm$/model.html/']
>     files = glob.glob('*.htm')
>     command.extend(files)
>     p = subprocess.Popen(
>         command,
>         stdout=subprocess.PIPE,
>         stderr=subprocess.PIPE,
>     )
>
> Jeffrey

thanks Jeffrey, that worked like a charm!
Re: threading - race condition?
On May 11, 1:55 pm, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote:
> On Sun, 11 May 2008 09:16:25 -0700 (PDT), skunkwerk
> <[EMAIL PROTECTED]> declaimed the following in comp.lang.python:
>
> > the only issue i have now is that it takes a long time for 100 threads
> > to initialize that connection (>5 minutes) - and as i'm doing this on
> > a webserver any time i update the code i have to restart all those
> > threads, which i'm doing right now in a for loop. is there any way I
> > can keep the thread stuff separate from the rest of the code for this
> > file, yet allow access? It wouldn't help having a .pyc or using
> > psyco, correct, as the time is being spent in the runtime? something
> > along the lines of 'start a new thread every minute until you get to
> > 100' without blocking the execution of the rest of the code in that
> > file? or maybe any time i need to do a search, start a new thread if
> > the #threads is <100?
>
> Is this running as part of the server process, or as a client
> accessing the server?
>
> Alternative question: Have you tried measuring the performance using
> /fewer/ threads... 25 or less? I believe I'd mentioned prior that you
> seem to have a lot of overhead code for what may be a short query.
>
> If the .get_item() code is doing a full sequence of: connect to
> database; format & submit query; fetch results; disconnect from
> database... I'd recommend putting the connect/disconnect outside of the
> thread while loop (though you may then need to put sentinel values into
> the feed queue -- one per thread -- so they can cleanly exit and
> disconnect rather than relying on daemonization for exit).
>
>     thread:
>         dbcon = ...
>         while True:
>             query = Q.get()
>             if query == SENTINEL: break
>             result = get_item(dbcon, query)
>             ...
>         dbcon.close()
>
> Third alternative: Find some way to combine the database queries.
> Rather than 100 threads each doing a single lookup (from your code, it
> appears that only 1 result is expected per search term), run 10 threads
> each looking up 10 items at once...
>
>     thread:
>         dbcon = ...
>         terms = []
>         terminate = False
>         while not terminate:
>             while len(terms) < 10:
>                 query = Q.get_nowait()
>                 if not query: break
>                 if query == SENTINEL:
>                     terminate = True
>                     break
>                 terms.append(query)
>             results = get_item(dbcon, terms)
>             terms = []
>             # however you are returning items; match the query term to the
>             # key item in the list of returned data?
>         dbcon.close()
>
> where the final select statement looks something like:
>
>     SQL = """select key, title, scraped from ***
>              where key in ( %s )""" % ", ".join("?" for x in terms)
>     # assumes database adapter uses ? for placeholder
>     dbcur.execute(SQL, terms)

thanks again Dennis,
i chose 100 threads so i could do 10 simultaneous searches (where each search contains 10 terms - using 10 threads). the .get_item() code is not doing the database connection - rather the initialization is done in the initialization of each thread. so basically once a thread starts, the database connection is persistent and .get_item queries are very fast. this is running as a server process (using django).
cheers
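A minimal runnable sketch of the sentinel pattern Dennis describes, with a dict standing in for the per-thread database connection (faked, as he suggests, so the example runs anywhere):

    import threading
    import Queue

    SENTINEL = object()
    FAKE_DB = {'key1': 'title1', 'key2': 'title2'}   # stand-in for a real database
    q = Queue.Queue()

    def worker():
        dbcon = FAKE_DB                  # imagine: connect once, per thread
        while True:
            query = q.get()
            if query is SENTINEL:
                break                    # clean exit instead of a daemonized kill
            print '%s -> %r' % (query, dbcon.get(query))
        # imagine: dbcon.close()

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for key in ['key1', 'key2', 'missing']:
        q.put(key)
    for _ in threads:
        q.put(SENTINEL)                  # one sentinel per thread
    for t in threads:
        t.join()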
Re: threading - race condition?
On May 12, 1:40 am, Rhamphoryncus <[EMAIL PROTECTED]> wrote:
> On May 11, 10:16 am, skunkwerk <[EMAIL PROTECTED]> wrote:
> > On May 10, 1:31 pm, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote:
> > > On Fri, 9 May 2008 08:40:38 -0700 (PDT), skunkwerk <[EMAIL PROTECTED]>
> > > declaimed the following in comp.lang.python:
> > >
> > > Coming in late...
> > >
> > > > On May 9, 12:12 am, John Nagle <[EMAIL PROTECTED]> wrote:
> > > > > skunkwerk wrote:
> > > > > > i've declared a bunch of worker threads (100) and a queue into
> > > > > > which new requests are inserted, like so:
> > > > > > [queue/thread setup and SimpleDBThread code as in the original post]
> > >
> > > Note: double-leading __ means "name mangling" -- typically only
> > > needed when doing multiple layers of inheritance where different
> > > parents have similar named items that need to be kept independent; a
> > > single _ is the convention for "don't touch me unless you know what
> > > you are doing"
> > >
> > > > thanks John, Gabriel,
> > > > here's the 'put' side of the requests:
> > > > [prepSDBSearch() as in the earlier post]
> > >
> > > My suggestion, if you really want diagnostic help -- follow the
> > > common recommendation of posting the minimal /runnable (if erroneous)/
> > > code... If "domain.get_item()" is some sort of RDBM access, you might
> > > fake it using a pre-loaded dictionary -- anything that allows it to
> > > return something when given the key value.
> > >
> > > > responses to your follow ups:
> > > > 1) 'item' in the threads is a list that corresponds to the 'data'
> > > > list in the above function. it's not global, and the initial values
> > > > seem ok, but i'm not sure if every time i pass in data to the queue
> > > > it passes in the same memory address or declares a new 'data' list
> > > > (which I guess is what I want)
> > >
> > > Rather confusing usage... In your "put" you have a list whose
> > > first element is "result.item", but then in the work thread, you refer
> > > to the entire list as "item"
> > >
> > > > 3) the first item in the modelList is a counter that keeps track of
> > > > the number of threads for this call that have completed - is there
> > > > any better way of doing this?
> > >
> > > Where? None of your posted code shows either "counter" or
> > > modelList being used by the threads.
> > >
> > > And yes, if you have threads trying to update a shared mutable, you
> > > have a race condition.
Re: threading - race condition?
On May 11, 9:10 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Sun, 11 May 2008 13:16:25 -0300, skunkwerk <[EMAIL PROTECTED]> escribió:
>
> > the only issue i have now is that it takes a long time for 100 threads
> > to initialize that connection (>5 minutes) - and as i'm doing this on
> > a webserver any time i update the code i have to restart all those
> > threads, which i'm doing right now in a for loop. is there any way I
> > can keep the thread stuff separate from the rest of the code for this
> > file, yet allow access?
>
> Like using a separate thread to create the other 100?
>
> --
> Gabriel Genellina

thanks Gabriel,
i think that could do it - let me try it out. don't know why i didn't think of it earlier.
lots of futex_wait calls
I've got a Python program written for the Django web framework that starts about 100 threads. When I start the server, it sometimes eats up 100% of the CPU for a good minute or so... though none of the threads are CPU-intensive.

Doing an strace on the program, I found lots of calls like this:

    select(5, [4], [], [], {1, 0})          = 0 (Timeout)
    futex(0x86a3ce0, FUTEX_WAIT, 0, NULL)   = 0

I've read the man page for futex... but is this normal?

thanks
Re: lots of futex_wait calls
On Jun 6, 10:03 am, André Malo <[EMAIL PROTECTED]> wrote:
> skunkwerk wrote:
>
> > I've got a python program written for the django web framework that
> > starts about 100 threads. When I start the server, it sometimes eats
> > up 100% of the CPU for a good minute or so... though none of the
> > threads are CPU-intensive
> >
> > doing a strace on the program, i found lots of calls like this:
> >
> >     select(5, [4], [], [], {1, 0})        = 0 (Timeout)
> >     futex(0x86a3ce0, FUTEX_WAIT, 0, NULL) = 0
> >
> > i've read the man page for futex... but is this normal?
>
> More or less. Most of the futex calls (if not all) are grabbing or releasing
> the global interpreter lock (GIL).
>
> It's usually helpful to increase the thread-schedule-checkinterval in order
> to lessen the system load (especially the number of context switches). See
> sys.setcheckinterval.
>
> nd

I've set the checkinterval to 200, and it seems to be ok... but after one or two days, the python processes will start hogging 100% of the CPU and bring the system to a crawl. I ran strace again, and all of the calls are:

    select(5, [4], [], [], {1, 0})          = 0 (Timeout)
    futex(0x877d0c8, FUTEX_WAIT, 0, NULL)   = 0
    futex(0x877d0c8, FUTEX_WAKE, 1)         = 0

is there any way to find out what's causing this? would you need to look at my threading code?

thanks,
imran
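No resolution appears in the thread, but one way to see what the threads are doing at the moment of the spin is a stack dump via sys._current_frames() (available since Python 2.5). A sketch, with SIGUSR1 as an arbitrary choice of trigger:

    import signal
    import sys
    import traceback

    def dump_stacks(signum, frame):
        # print a stack trace for every live thread
        for thread_id, stack in sys._current_frames().items():
            print '--- thread %d ---' % thread_id
            traceback.print_stack(stack)

    signal.signal(signal.SIGUSR1, dump_stacks)   # then: kill -USR1 <pid>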
popen pipe limit
I'm getting errors when reading from/writing to pipes that are fairly large in size. To bypass this, I wanted to redirect output to a file in the subprocess.Popen function, but couldn't get it to work (even after setting shell=True). I tried adding ">", "temp.sql" after the password field, but mysqldump gave me an error.

the code:

    p1 = subprocess.Popen(["mysqldump", "--all-databases", "--user=user",
                           "--password=password"], shell=True)
    p2 = subprocess.Popen(["gzip", "-9"], stdin=p1.stdout)
    output = p2.communicate()[0]
    file = open('test.sql.gz', 'w')
    file.write(str(output))
    file.close()

the output:

    gzip: compressed data not written to a terminal. Use -f to force compression.
    For help, type: gzip -h
    mysqldump: Got errno 32 on write

I'm using python rather than a shell script for this because I need to upload the resulting file to a server as soon as it's done.

thanks
Re: popen pipe limit
On Apr 7, 6:17 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Mon, 07 Apr 2008 20:52:54 -0300, skunkwerk <[EMAIL PROTECTED]> escribió:
>
> > I'm getting errors when reading from/writing to pipes that are fairly
> > large in size. To bypass this, I wanted to redirect output to a file
> > in the subprocess.Popen function, but couldn't get it to work (even
> > after setting shell=True). I tried adding ">", "temp.sql" after the
> > password field but mysqldump gave me an error.
>
> You need a pipe to chain subprocesses:
>
>     import subprocess
>     p1 = subprocess.Popen(["mysqldump", "--all-databases", "--user=user",
>                            "--password=password"],
>                           stdout=subprocess.PIPE)
>     ofile = open("test.sql.gz", "wb")
>     p2 = subprocess.Popen(["gzip", "-9"], stdin=p1.stdout, stdout=ofile)
>     p1.wait()
>     p2.wait()
>     ofile.close()
>
> If you don't want the final file on disk:
>
>     p1 = subprocess.Popen(["mysqldump", "--all-databases", "--user=user",
>                            "--password=password"],
>                           stdout=subprocess.PIPE)
>     p2 = subprocess.Popen(["gzip", "-9"], stdin=p1.stdout,
>                           stdout=subprocess.PIPE)
>     while True:
>         chunk = p2.stdout.read(4192)
>         if not chunk: break
>         # do something with the chunk just read
>
>     p1.wait()
>     p2.wait()
>
> --
> Gabriel Genellina

thanks Gabriel - tried the first one and it worked great!
Re: subprocess.popen function with quotes
On Mar 26, 10:33 pm, skunkwerk <[EMAIL PROTECTED]> wrote:
> On Mar 26, 8:05 am, Jeffrey Froman <[EMAIL PROTECTED]> wrote:
> > Without shell=True, your glob characters will not be expanded. Hence, the
> > command looks for a file actually named "*.htm".
> > [...] However, you will need to perform your own globbing:
> >
> >     command = ['rename', '-vn', 's/(.*)\.htm$/model.html/']
> >     files = glob.glob('*.htm')
> >     command.extend(files)
> >     p = subprocess.Popen(
> >         command,
> >         stdout=subprocess.PIPE,
> >         stderr=subprocess.PIPE,
> >     )
> >
> > Jeffrey
>
> thanks Jeffrey, that worked like a charm!

I'm trying to detect when the subprocess has terminated, using the wait() function - but when there is an error with the call to rename (i.e. the file doesn't exist), rename run from the command line just terminates and displays the error. In the code above, though, my call to p.wait() just hangs when rename should throw an error... I've tried adding shell=True, but that stops the rename from working.

any ideas?
thanks
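No answer survives in the archive, but one plausible culprit, offered as a guess: the subprocess documentation warns that wait() can deadlock when stdout=PIPE or stderr=PIPE and the child writes enough output to fill the OS pipe buffer. communicate() avoids that by draining both streams while it waits, after which the exit status can be checked:

    out, err = p.communicate()      # drains stdout/stderr fully, then reaps the child
    if p.returncode != 0:
        print 'rename failed (%d): %s' % (p.returncode, err)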
logger output
i'm redirecting the stdout & stderr of my python program to a log. Tests i've done on a simple program with print statements, etc. work fine. however, in my actual program i get weird output like this:

    2008-05-04 20:20:44,790 DEBUG Grabbing message from queue, if any
    2008-05-04 20:20:44,790 DEBUG DEBUG:doit:Grabbing message from queue, if any
    2008-05-04 20:20:44,790 DEBUG DEBUG:doit:DEBUG:doit:Grabbing message from queue, if any
    2008-05-04 20:20:44,790 DEBUG DEBUG:doit:DEBUG:doit:DEBUG:doit:Grabbing message from queue, if any

followed by:

    2008-05-04 20:20:44,815 DEBUG DEBUG:doit:Traceback (most recent call last):
    2008-05-04 20:20:44,815 DEBUG DEBUG:doit:DEBUG:doit:Traceback (most recent call last):
    2008-05-04 20:20:44,815 DEBUG DEBUG:doit:DEBUG:doit:DEBUG:doit:Traceback (most recent call last):

the code I'm using for the log stuff:

    import logging

    logger = logging.getLogger('doit')
    hdlr = logging.FileHandler('/home/imran/doit.log')
    formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
    hdlr.setFormatter(formatter)
    logger.addHandler(hdlr)
    logger.setLevel(logging.DEBUG)

    class write2Log:
        def write(self, x):
            if x != '\n':
                logger.debug(str(x))

    sys.stdout = write2Log()
    sys.stderr = write2Log()

any ideas what might be causing the problems? some of the messages being output are quite long - might this be a problem?

thanks
Re: logger output
On May 4, 10:40 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Mon, 05 May 2008 00:33:12 -0300, skunkwerk <[EMAIL PROTECTED]> escribió:
>
> > i'm redirecting the stdout & stderr of my python program to a log.
> > Tests i've done on a simple program with print statements, etc. work
> > fine. however, in my actual program i get weird output like this:
> >
> >     2008-05-04 20:20:44,790 DEBUG Grabbing message from queue, if any
> >     2008-05-04 20:20:44,790 DEBUG DEBUG:doit:Grabbing message from queue, if any
> >     2008-05-04 20:20:44,790 DEBUG DEBUG:doit:DEBUG:doit:Grabbing message from queue, if any
> >
> > class write2Log:
> >     def write(self, x):
> >         if x != '\n':
> >             logger.debug(str(x))
> >
> > any ideas what might be causing the problems? some of the messages
> > being output are quite long - might this be a problem?
>
> Try this simplified example and see by yourself:
>
>     import sys
>
>     class Write2Log:
>         def write(self, x):
>             sys.__stdout__.write('[%s]' % x)
>
>     sys.stdout = Write2Log()
>
>     print "Hello world!"
>     age = 27
>     name = "John"
>     print "My name is", name, "and I am", age, "years old."
>
> --
> Gabriel Genellina

thanks Gabriel,
i tried the code you sent and got output like the following:

    [My name is][][john][][and I am][][27][][years old.]

it doesn't really help me though. does this have any advantages over the syntax i was using? are there any limits on what kind of objects the logger can write? i.e. ascii strings of any length?

thanks
Re: logger output
On May 5, 3:44 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Mon, 05 May 2008 13:02:12 -0300, skunkwerk <[EMAIL PROTECTED]> escribió:
>
> > i tried the code you sent and got output like the following:
> >     [My name is][][john][][and I am][][27][][years old.]
> >
> > it doesn't really help me though. does this have any advantages over
> > the syntax i was using?
>
> The example doesn't use any logger, so loggers aren't the problem here, ok?
>
> The write function above puts square brackets [] around anything it receives.
> This way you can see exactly how write() is called: once per *item* in the
> print statement, plus once per comma used (with a space character that you
> didn't copy correctly).
>
> Back to your original code, you have to call logger.debug with a *line* of
> text, but you are calling it with many small pieces - that's the problem.
> Accumulate output until you see a '\n' - then join all the pieces into a
> single, complete line and finally call logger.debug
>
> --
> Gabriel Genellina

thanks Gabriel,
i wrote the function below, but am now getting an "Error in sys.exitfunc:" error (which disappears when i comment out the last two lines below):

    class write2Log:
        def write(self, x):
            if x != ',':  # ignore if a comma
                if str(x).count('\n') == 0:
                    buffer += str(x)
                else:
                    list = str(x).split('\n')
                    logger.debug(buffer)
                    buffer = ""
                    for text in list:
                        logger.debug(text)

    sys.stdout = write2Log()
    sys.stderr = write2Log()

any ideas what might be wrong?
thanks again
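A sketch of the accumulate-then-log class Gabriel describes, with the buffer kept as an instance attribute. (In the version posted above, buffer is a local name inside write(), so buffer += str(x) raises UnboundLocalError the first time it runs - a plausible source of the exit-time error.)

    import logging
    import sys

    logger = logging.getLogger('doit')   # configured as in the original post

    class Write2Log(object):
        def __init__(self):
            self.buf = ''

        def write(self, x):
            self.buf += str(x)
            # emit only complete lines; keep the unfinished tail buffered
            while '\n' in self.buf:
                line, self.buf = self.buf.split('\n', 1)
                if line:
                    logger.debug(line)

    sys.stdout = Write2Log()
    sys.stderr = Write2Log()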
threading - race condition?
i'm getting the wrong output for the 'title' attributes for this data. the queue holds a data structure (item name, position, and list to store results in). each thread takes in an item name and queries a database for various attributes. from the debug statements the item names are being retrieved correctly, but the attributes returned are those of other items in the queue - not its own item. however, the model variable is not a global variable... so i'm not sure what's wrong.

i've declared a bunch of worker threads (100) and a queue into which new requests are inserted, like so:

    queue = Queue.Queue(0)
    WORKERS = 100
    for i in range(WORKERS):
        thread = SDBThread(queue)
        thread.setDaemon(True)
        thread.start()

the thread:

    class SimpleDBThread(threading.Thread):
        def __init__(self, queue):
            self.__queue = queue
            threading.Thread.__init__(self)

        def run(self):
            while 1:
                item = self.__queue.get()
                if item != None:
                    model = domain.get_item(item[0])
                    logger.debug('sdbthread item:' + item[0])
                    title = model['title']
                    scraped = model['scraped']
                    logger.debug("sdbthread title:" + title)

any suggestions?
thanks
Re: threading - race condition?
On May 8, 4:54 pm, [EMAIL PROTECTED] wrote:
> On May 8, 5:45 pm, skunkwerk <[EMAIL PROTECTED]> wrote:
> > i'm getting the wrong output for the 'title' attributes for this
> > data. [...]
> > any suggestions?
> > thanks
>
> I'll base this on terminology: if a model is in a brain (throughout
> the known universe), and a dollar's a dollar, it may not pay to build
> a computer out of brains.
>
> If man arises as a tool-carrier, we will carry tools, not people.
> Don't use Python to make people; make money, and not too much. Pick a
> wage and you might get somewhere.

excuse me?
Re: threading - race condition?
On May 9, 12:12 am, John Nagle <[EMAIL PROTECTED]> wrote:
> skunkwerk wrote:
> > i'm getting the wrong output for the 'title' attributes for this
> > data. the queue holds a data structure (item name, position, and list
> > to store results in). each thread takes in an item name and queries a
> > database for various attributes. from the debug statements the item
> > names are being retrieved correctly, but the attributes returned are
> > those of other items in the queue - not its own item. however, the
> > model variable is not a global variable... so i'm not sure what's
> > wrong.
> > [worker-pool and SimpleDBThread code as in the original post]
>
> Hm. We don't have enough code here to see what's wrong.
> For one thing, we're not seeing how items get put on the queue. The
> trouble might be at the "put" end.
>
> Make sure that "model", "item", "title", and "scraped" are not globals.
> Remember, any assignment to them in a global context makes them a global.
>
> You should never get "None" from the queue unless you put a "None"
> on the queue. "get()" blocks until there's work to do.
>
> John Nagle

thanks John, Gabriel,
here's the 'put' side of the requests:

    def prepSDBSearch(results):
        modelList = [0]
        counter = 1
        for result in results:
            data = [result.item, counter, modelList]
            queue.put(data)
            counter += 1
        while modelList[0] < len(results):
            print 'waiting...'  # wait for them to come home
        modelList.pop(0)  # now remove '0'
        return modelList

responses to your follow-ups:
1) 'item' in the threads is a list that corresponds to the 'data' list in the above function. it's not global, and the initial values seem ok, but i'm not sure if every time i pass in data to the queue it passes in the same memory address or declares a new 'data' list (which I guess is what I want)
2) john, i don't think any of the variables you mentioned are global. the 'none' check was just for extra safety.
3) the first item in the modelList is a counter that keeps track of the number of threads for this call that have completed - is there any better way of doing this?

thanks again
Re: threading - race condition?
On May 10, 1:31 pm, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote:
> On Fri, 9 May 2008 08:40:38 -0700 (PDT), skunkwerk <[EMAIL PROTECTED]>
> declaimed the following in comp.lang.python:
>
> Coming in late...
>
> > On May 9, 12:12 am, John Nagle <[EMAIL PROTECTED]> wrote:
> > > skunkwerk wrote:
> > > > i've declared a bunch of worker threads (100) and a queue into which
> > > > new requests are inserted, like so:
> > > >
> > > > queue = Queue.Queue(0)
> > > > WORKERS = 100
> > > > for i in range(WORKERS):
> > > >     thread = SDBThread(queue)
> > > >     thread.setDaemon(True)
> > > >     thread.start()
> > > >
> > > > the thread:
> > > >
> > > > class SimpleDBThread ( threading.Thread ):
> > > >     def __init__ ( self, queue ):
> > > >         self.__queue = queue
>
> Note: double-leading __ means "name mangling" -- typically only
> needed when doing multiple layers of inheritance where different parents
> have similar named items that need to be kept independent; a single _ is
> the convention for "don't touch me unless you know what you are doing"
>
> > > >         threading.Thread.__init__ ( self )
> > > >     def run ( self ):
> > > >         while 1:
> > > >             item = self.__queue.get()
> > > >             if item != None:
> > > >                 model = domain.get_item(item[0])
> > > >                 logger.debug('sdbthread item:' + item[0])
> > > >                 title = model['title']
> > > >                 scraped = model['scraped']
> > > >                 logger.debug("sdbthread title:" + title)
> > > >
> > > > any suggestions?
> > > > thanks
>
> > thanks John, Gabriel,
> > here's the 'put' side of the requests:
> >
> > def prepSDBSearch(results):
> >     modelList = [0]
> >     counter = 1
> >     for result in results:
> >         data = [result.item, counter, modelList]
> >         queue.put(data)
> >         counter += 1
> >     while modelList[0] < len(results):
> >         print 'waiting...'  # wait for them to come home
> >     modelList.pop(0)  # now remove '0'
> >     return modelList
>
> My suggestion, if you really want diagnostic help -- follow the
> common recommendation of posting the minimal /runnable (if erroneous)/
> code... If "domain.get_item()" is some sort of RDBM access, you might
> fake it using a pre-loaded dictionary -- anything that allows it to
> return something when given the key value.
>
> > responses to your follow ups:
> > 1) 'item' in the threads is a list that corresponds to the 'data'
> > list in the above function. it's not global, and the initial values
> > seem ok, but i'm not sure if every time i pass in data to the queue it
> > passes in the same memory address or declares a new 'data' list (which
> > I guess is what I want)
>
> Rather confusing usage... In your "put" you have a list whose first
> element is "result.item", but then in the work thread, you refer to the
> entire list as "item"
>
> > 3) the first item in the modelList is a counter that keeps track of
> > the number of threads for this call that have completed - is there any
> > better way of doing this?
>
> Where? None of your posted code shows either "counter" or modelList
> being used by the threads.
>
> And yes, if you have threads trying to update a shared mutable, you
> have a race condition.
>
> You also have a problem if you are using "counter" to define where
> in modelList a thread is supposed to store its results -- as you can not
> access an element that doesn't already exist...
>
>     a = [0]
>     a[3] = 1    # failure, need to create elements 1, 2, 3 first
>
> Now, if position is irrelevant, and a thread just appends its
> results to modelList, then you don't need some counter, all you need is
> to check the length of modelList against the count expected.
>
> Overall -- even though you are passing things via the queue, the
> contents being passed via the queue are being treated as if they were
> global entities (you could make modelList a global, remove it from the
> queue entries, and have the same net access).
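To make Dennis's suggestion concrete, a minimal runnable sketch - get_item faked with a dict, as he proposes - in which each batch of work carries its own results list, and Queue's task_done()/join() replace the hand-rolled modelList counter:

    import threading
    import Queue

    FAKE_DB = {'a': {'title': 'Title A', 'scraped': True},
               'b': {'title': 'Title B', 'scraped': False}}

    def get_item(key):
        return FAKE_DB.get(key)         # stand-in for domain.get_item

    queue = Queue.Queue()

    def worker():
        while True:
            key, results = queue.get()
            results.append((key, get_item(key)))   # append to this job's own list
            queue.task_done()

    for _ in range(4):
        t = threading.Thread(target=worker)
        t.setDaemon(True)
        t.start()

    results = []
    for key in ['a', 'b', 'missing']:
        queue.put((key, results))
    queue.join()        # block until every put() item has been task_done()
    print results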