multiprocessing pipes with custom pickler
Hi, I need inter-process communication in Python, and was looking at the documentation here: http://docs.python.org/2/library/multiprocessing.html

I am using a custom pickler, though, in order to deal with some objects that are not serializable through the built-in pickler. Is there any way to tell the pipe's send() method to use my pickler? I could also pickle the data myself and send the resulting binary string through the existing send() method, but then everything gets pickled twice, which seems like a hack. Maybe the send_bytes() method would be the best option, if it doesn't pickle the data?

thanks for the help,
imran
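For the record, a minimal sketch of the send_bytes() route - it transmits a raw byte string without pickling it, so the data is serialized exactly once. my_dumps/my_loads below are stand-ins for whatever dump/load pair the custom pickler actually provides:

    import cPickle
    from multiprocessing import Pipe

    def my_dumps(obj):
        # stand-in for the custom pickler's dump-to-string
        return cPickle.dumps(obj, 2)

    def my_loads(data):
        # stand-in for the matching load
        return cPickle.loads(data)

    parent, child = Pipe()
    parent.send_bytes(my_dumps({'answer': 42}))  # raw bytes, pickled exactly once
    print my_loads(child.recv_bytes())           # recv_bytes returns the same bytes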
collecting variable assignments through settrace
Hi, I'm writing a custom profiler that uses sys.settrace. I was wondering if there is any way of tracing the assignments of variables inside a function as it's executed, without looking at locals() at every single line and comparing them to see if anything has changed. Sort of like xdebug's collect_assignments parameter in PHP.

thanks,
imran
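For reference, a sketch of the locals-diffing approach the post is hoping to avoid - in CPython the trace hook only reports call/line/return/exception events, so diffing frame.f_locals is essentially the only option. The reported line numbers are approximate, since a 'line' event fires before that line executes:

    import sys

    def make_tracer():
        snapshots = {}  # frame -> last-seen copy of its locals

        def tracer(frame, event, arg):
            if event == 'call':
                snapshots[frame] = dict(frame.f_locals)
            elif event in ('line', 'return'):
                old = snapshots.get(frame, {})
                for name, value in frame.f_locals.items():
                    if name not in old or old[name] is not value:
                        print 'assign %s = %r near line %d' % (
                            name, value, frame.f_lineno)
                snapshots[frame] = dict(frame.f_locals)
            return tracer
        return tracer

    def demo():
        x = 1
        y = x + 1

    sys.settrace(make_tracer())
    demo()
    sys.settrace(None)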
settrace doesn't trace builtin functions
Hi, I've been using the settrace function to write a tracer for my program, which is working great except that it doesn't seem to work for built-in functions, like open('filename.txt'). This doesn't seem to be documented, so I'm not sure if I'm doing something wrong or if that's the expected behavior.

If settrace is never going to trace built-ins, is there any way to trace calls to open()? I don't want to use Linux's strace, as it runs for the whole program (not just the part I want) and won't show my Python line numbers/file names, etc. The other option I considered was monkey-patching the open function through a wrapper, like:

    def wrapped_open(*arg, **kw):
        print 'open called'
        traceback.print_stack()
        f = __builtin__.open(*arg, **kw)
        return f
    open = wrapped_open

but that seemed very brittle to me. Could someone suggest a better way of doing this?

thank you,
imran
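A sketch of a slightly sturdier variant of that monkey-patch, assuming patching is acceptable at all: rebinding __builtin__.open itself (rather than one module's open name) makes the wrapper visible to every module, and keeping a reference to the original allows clean restoration:

    import __builtin__
    import traceback

    _real_open = __builtin__.open

    def traced_open(*args, **kwargs):
        print 'open called with %r %r' % (args, kwargs)
        traceback.print_stack()
        return _real_open(*args, **kwargs)

    __builtin__.open = traced_open        # every module now sees the wrapper
    try:
        f = open('filename.txt', 'w')     # traced
        f.close()
    finally:
        __builtin__.open = _real_open     # restore the real built-in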
UnpicklingError: NEWOBJ class argument isn't a type object
Hi, I'm using a custom pickler that replaces any un-pickleable objects (such as sockets or files) with a string representation of them, based on the code from Shane Hathaway here: http://stackoverflow.com/questions/4080688/python-pickling-a-dict-with-some-unpicklable-items

It works most of the time, but when I try to unpickle a Django HttpResponse, I get the following error:

    UnpicklingError: NEWOBJ class argument isn't a type object

I have no clue what the error actually means. If it pickles okay, why should it not be able to unpickle? Any ideas?

thanks for the help,
imran

Here is my code:

    from cPickle import Pickler, Unpickler, UnpicklingError

    class FilteredObject:
        def __init__(self, about):
            self.about = about
        def __repr__(self):
            return 'FilteredObject(%s)' % repr(self.about)

    class MyPickler(object):
        def __init__(self, file, protocol=2):
            pickler = Pickler(file, protocol)
            pickler.persistent_id = self.persistent_id
            self.dump = pickler.dump
            self.clear_memo = pickler.clear_memo

        def persistent_id(self, obj):
            if not hasattr(obj, '__getstate__') and not isinstance(obj,
                    (basestring, bool, int, long, float, complex,
                     tuple, list, set, dict)):
                return ["filtered:%s" % str(obj)]
            else:
                return None

    class MyUnpickler(object):
        def __init__(self, file):
            unpickler = Unpickler(file)
            unpickler.persistent_load = self.persistent_load
            self.load = unpickler.load
            self.noload = unpickler.noload

        def persistent_load(self, obj_id):
            if obj_id[0].startswith('filtered:'):
                return FilteredObject(obj_id[0][9:])
            else:
                raise UnpicklingError('Invalid persistent id')

    ## serialize to file
    f = open('test.txt', 'wb')
    p = MyPickler(f)
    p.dump(data)
    f.close()

    ## unserialize from file
    f = open('test.txt', 'rb')
    pickled_data = f.read()
    f.seek(0)
    u = MyUnpickler(f)
    data = u.load()
Re: UnpicklingError: NEWOBJ class argument isn't a type object
On Monday, July 8, 2013 12:45:55 AM UTC-7, Peter Otten wrote:
> skunkwerk wrote:
>
> > Hi, I'm using a custom pickler that replaces any un-pickleable objects
> > (such as sockets or files) with a string representation of them, based
> > on the code from Shane Hathaway here:
> > http://stackoverflow.com/questions/4080688/python-pickling-a-dict-with-some-unpicklable-items
> >
> > It works most of the time, but when I try to unpickle a Django
> > HttpResponse, I get the following error: UnpicklingError: NEWOBJ class
> > argument isn't a type object
> >
> > I have no clue what the error actually means. If it pickles okay, why
> > should it not be able to unpickle? Any ideas?
>
> A simple way to provoke the error is to rebind the name referring to the
> class of the pickled object:
>
> >>> import cPickle
> >>> class A(object): pass
> ...
> >>> p = cPickle.dumps(A(), -1)
> >>> cPickle.loads(p)
> <__main__.A object at 0x7fce7bb58c50>
> >>> A = 42
> >>> cPickle.loads(p)
> Traceback (most recent call last):
>   File "", line 1, in
> cPickle.UnpicklingError: NEWOBJ class argument isn't a type object
>
> You may be doing something to that effect.

Hey Peter,
I tried unpickling even from another file with no other code in it, but came up with the same error - so I don't think it's a rebinding issue. But I got the error to disappear when I removed the hasattr(obj, '__getstate__') check from this line of code in the persistent_id function:

    if not hasattr(obj, '__getstate__') and not isinstance(obj,
            (basestring, bool, int, long, float, complex,
             tuple, list, set, dict)):
        return ["filtered:%s" % type(obj)]

When I do that, I get a few more FilteredObjects in the result, for things like:

I figured these classes must have __getstate__ methods, which leads to them being pickled without a persistent_id (it turns out they actually have __repr__ methods). So these classes get pickled fine, but run into problems when trying to unpickle them. I understand why ImportErrors would happen if the necessary modules haven't been loaded, but this NEWOBJ error is still kind of mystifying. I guess I just won't pickle any classes for now, if unpickling them is going to be dicey.

thanks for the help guys,
imran
automated unit test generation
Hi, I've been working on an open source project to auto-generate unit tests for web apps, based on traces collected from the web server and static code analysis. I've got an alpha version online at www.splintera.com, and the source is at https://github.com/splintera/python-django-client. I'd love to get some feedback from the community and extend it to work with other languages as well.

I wrote it originally because I was sick of coming into companies where I had to inherit tens of thousands of lines of code without any tests, and never had time to write them manually - being careful to mock out dependencies, specify the correct inputs and outputs, and figure out which path the code was taking.

I'd like to get some sense of:
- how difficult/tedious is writing unit tests, and why?
- do you wish you had better code coverage?
- how important is testing to you?

thanks,
imran
subprocess.popen function with quotes
Hi, I'm trying to call subprocess.Popen on the 'rename' command in Linux. When I run the command from the shell, like so:

    rename -vn 's/\.htm$/\.html/' *.htm

it works fine... however when I try to do it in Python like so:

    p = subprocess.Popen(["rename", "-vn", "'s/\.htm$/\.html/'", "*.htm"],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print p.communicate()[0]

nothing gets printed out (even for p.communicate()[1]).

I think the problem is the quoted string the rename command wants - when I put it in triple quotes like """s/\.htm$/\.html/""" I get some output, but not the correct output. I've also tried escaping the single quotes with \' and putting the expression in regular double quotes, but that didn't work either.

i'd appreciate any help
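For what it's worth, a sketch of the rule at work, anticipating the fix this thread converges on below: with no shell involved, each list element reaches the program byte-for-byte, so the single quotes (which a shell would strip) become part of the regex argument, and *.htm is never expanded - the quotes have to go, and the globbing has to be done by hand:

    import glob
    import subprocess

    # no shell: pass the perl expression unquoted, expand the glob ourselves
    args = ["rename", "-vn", r"s/\.htm$/\.html/"] + glob.glob("*.htm")
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    print out, err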
Re: subprocess.popen function with quotes
also, i've tried the shell=True parameter for Popen, but that didn't seem to make a difference

On Mar 25, 8:31 pm, skunkwerk <[EMAIL PROTECTED]> wrote:
> Hi, I'm trying to call subprocess.Popen on the 'rename' command in Linux.
> When I run the command from the shell, like so:
>
>     rename -vn 's/\.htm$/\.html/' *.htm
>
> it works fine... however when I try to do it in Python like so:
>
>     p = subprocess.Popen(["rename", "-vn", "'s/\.htm$/\.html/'", "*.htm"],
>                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>     print p.communicate()[0]
>
> nothing gets printed out (even for p.communicate()[1])
>
> I think the problem is the quoted string the rename command wants -
> when I put it in triple quotes like """s/\.htm$/\.html/""" I get some
> output, but not the correct output. I've also tried escaping the
> single quotes with \' and putting it in regular double quotes, but
> that didn't work either.
>
> i'd appreciate any help
Re: subprocess.popen function with quotes
On Mar 25, 9:25 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Wed, 26 Mar 2008 00:39:05 -0300, skunkwerk <[EMAIL PROTECTED]> escribió:
>
> >> i'm trying to call subprocess.popen on the 'rename' function in
> >> linux. When I run the command from the shell, like so:
> >>
> >>     rename -vn 's/\.htm$/\.html/' *.htm
> >>
> >> it works fine... however when I try to do it in python like so:
> >>
> >>     p = subprocess.Popen(["rename", "-vn", "'s/\.htm$/\.html/'", "*.htm"],
> >>                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> >>     print p.communicate()[0]
> >>
> >> nothing gets printed out (even for p.communicate()[1])
>
> I'd try with:
>
>     p = subprocess.Popen(["rename", "-vn", r"'s/\.htm$/\.html/'", "*.htm"],
>                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
>                          shell=True)
>
> (note that I added shell=True and I'm using a raw string to specify the
> reg.expr.)
>
> --
> Gabriel Genellina

Thanks Gabriel,
I tried the new command and one with the raw string and single quotes, but it is still giving me the same results (no output). any other suggestions?
cheers
Re: subprocess.popen function with quotes
On Mar 25, 11:04 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Wed, 26 Mar 2008 02:15:28 -0300, skunkwerk <[EMAIL PROTECTED]> escribió:
>
> > On Mar 25, 9:25 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> >> I'd try with:
> >>
> >>     p = subprocess.Popen(["rename", "-vn", r"'s/\.htm$/\.html/'", "*.htm"],
> >>                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
> >>                          shell=True)
> >>
> >> (note that I added shell=True and I'm using a raw string to specify the
> >> reg.expr.)
> >
> > Thanks Gabriel,
> > I tried the new command and one with the raw string and single
> > quotes, but it is still giving me the same results (no output). any
> > other suggestions?
>
> My next try would be without the single quotes...
>
> --
> Gabriel Genellina

thanks for the input guys,
I've tried the suggestions but can't get it to work. I have a file named test.htm in my directory, and when I run the following command:

    rename -vn 's/(.*)\.htm$/model.html/' *.htm

from the shell in that directory, I get the following output:

    test.htm renamed as model.html

now my python script is called test.py, is located in the same directory, and is called from the shell with 'python test.py'. the contents of test.py:

    import subprocess

    p = subprocess.Popen(['rename', '-vn', 's/(.*)\.htm$/model.html/', '*.htm'],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print p.communicate()[0]

i change to print p.communicate()[1] in case the output is blank the first time

this is the output:

    *.htm renamed as model.html

when I add shell=True to the subprocess command, I get the following output:

    Usage: rename [-v] [-n] [-f] perlexpr [filenames]

am i doing something wrong?
Re: subprocess.popen function with quotes
On Mar 26, 6:44 am, skunkwerk <[EMAIL PROTECTED]> wrote:
> thanks for the input guys,
> I've tried the suggestions but can't get it to work. I have a file
> named test.htm in my directory, and when I run the following command:
>
>     rename -vn 's/(.*)\.htm$/model.html/' *.htm
>
> from the shell in that directory, I get the following output:
>     test.htm renamed as model.html
>
> the contents of test.py:
>
>     import subprocess
>
>     p = subprocess.Popen(['rename', '-vn', 's/(.*)\.htm$/model.html/', '*.htm'],
>                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>     print p.communicate()[0]
>
> this is the output:
>     *.htm renamed as model.html
>
> when I add shell=True to the subprocess command, I get the following output:
>     Usage: rename [-v] [-n] [-f] perlexpr [filenames]
>
> am i doing something wrong?

in addition, when I use Popen without any quotes, or without quotes for the regular expression, I get an exception. I'm running Ubuntu Linux 7.10 with Python 2.5.1
thanks
Re: subprocess.popen function with quotes
On Mar 26, 8:05 am, Jeffrey Froman <[EMAIL PROTECTED]> wrote:
> skunkwerk wrote:
>
> > p = subprocess.Popen(['rename', '-vn', 's/(.*)\.htm$/model.html/', '*.htm'],
> >                      stdout=subprocess.PIPE, stderr=subprocess.PIPE)
> > print p.communicate()[0]
> >
> > i change to print p.communicate()[1] in case the output is blank the
> > first time
> >
> > this is the output:
> > *.htm renamed as model.html
>
> Without shell=True, your glob characters will not be expanded. Hence, the
> command looks for a file actually named "*.htm".
>
> > when I add shell=True to the subprocess command, I get the following
> > output:
> > Usage: rename [-v] [-n] [-f] perlexpr [filenames]
>
> Here the use of the shell may be confounding the arguments passed. Your
> command will probably work better if you avoid using shell=True. However,
> you will need to perform your own globbing:
>
>     # Untested (no perl-rename here):
>     command = ['rename', '-vn', 's/(.*)\.htm$/model.html/']
>     files = glob.glob('*.htm')
>     command.extend(files)
>     p = subprocess.Popen(
>         command,
>         stdout=subprocess.PIPE,
>         stderr=subprocess.PIPE,
>     )
>
> Jeffrey

thanks Jeffrey, that worked like a charm!
Re: threading - race condition?
On May 11, 1:55 pm, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote:
> On Sun, 11 May 2008 09:16:25 -0700 (PDT), skunkwerk
> <[EMAIL PROTECTED]> declaimed the following in comp.lang.python:
>
> > the only issue i have now is that it takes a long time for 100 threads
> > to initialize that connection (>5 minutes) - and as i'm doing this on
> > a webserver any time i update the code i have to restart all those
> > threads, which i'm doing right now in a for loop. is there any way I
> > can keep the thread stuff separate from the rest of the code for this
> > file, yet allow access? It wouldn't help having a .pyc or using
> > psyco, correct, as the time is being spent in the runtime? something
> > along the lines of 'start a new thread every minute until you get to
> > 100' without blocking the execution of the rest of the code in that
> > file? or maybe any time i need to do a search, start a new thread if
> > the #threads is <100?
>
> Is this running as part of the server process, or as a client
> accessing the server?
>
> Alternative question: Have you tried measuring the performance using
> /fewer/ threads... 25 or less? I believe I'd mentioned prior that you
> seem to have a lot of overhead code for what may be a short query.
>
> If the .get_item() code is doing a full sequence of: connect to
> database; format & submit query; fetch results; disconnect from
> database... I'd recommend putting the connect/disconnect outside of the
> thread while loop (though you may then need to put sentinel values into
> the feed queue -- one per thread -- so they can cleanly exit and
> disconnect rather than relying on daemonization for exit).
>
>     thread:
>         dbcon = ...
>         while True:
>             query = Q.get()
>             if query == SENTINEL: break
>             result = get_item(dbcon, query)
>             ...
>         dbcon.close()
>
> Third alternative: Find some way to combine the database queries.
> Rather than 100 threads each doing a single lookup (from your code, it
> appears that only 1 result is expected per search term), run 10 threads
> each looking up 10 items at once...
>
>     thread:
>         dbcon = ...
>         terms = []
>         terminate = False
>         while not terminate:
>             while len(terms) < 10:
>                 query = Q.get_nowait()
>                 if not query: break
>                 if query == SENTINEL:
>                     terminate = True
>                     break
>                 terms.append(query)
>             results = get_item(dbcon, terms)
>             terms = []
>             # however you are returning items; match the query term to the
>             # key item in the list of returned data?
>         dbcon.close()
>
> where the final select statement looks something like:
>
>     SQL = """select key, title, scraped from ***
>              where key in ( %s )""" % ", ".join("?" for x in terms)
>     # assumes database adapter uses ? for placeholder
>     dbcur.execute(SQL, terms)

thanks again Dennis,
i chose 100 threads so i could do 10 simultaneous searches (where each search contains 10 terms - using 10 threads). the .get_item() code is not doing the database connection - rather the initialization is done in the initialization of each thread. so basically once a thread starts, the database connection is persistent and .get_item queries are very fast. this is running as a server process (using django).
cheers
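A minimal runnable sketch of the sentinel pattern Dennis describes, with a dict standing in for the per-thread database connection (faked, as he suggests, so the example runs anywhere):

    import threading
    import Queue

    SENTINEL = object()
    FAKE_DB = {'key1': 'title1', 'key2': 'title2'}   # stand-in for a real database
    q = Queue.Queue()

    def worker():
        dbcon = FAKE_DB                  # imagine: connect once, per thread
        while True:
            query = q.get()
            if query is SENTINEL:
                break                    # clean exit instead of a daemonized kill
            print '%s -> %r' % (query, dbcon.get(query))
        # imagine: dbcon.close()

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for key in ['key1', 'key2', 'missing']:
        q.put(key)
    for _ in threads:
        q.put(SENTINEL)                  # one sentinel per thread
    for t in threads:
        t.join()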
Re: threading - race condition?
On May 12, 1:40 am, Rhamphoryncus <[EMAIL PROTECTED]> wrote:
> On May 11, 10:16 am, skunkwerk <[EMAIL PROTECTED]> wrote:
> > On May 10, 1:31 pm, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote:
> > > On Fri, 9 May 2008 08:40:38 -0700 (PDT), skunkwerk <[EMAIL PROTECTED]>
> > > declaimed the following in comp.lang.python:
> > >
> > > Coming in late...
> > >
> > > > On May 9, 12:12 am, John Nagle <[EMAIL PROTECTED]> wrote:
> > > > > skunkwerk wrote:
> > > > > > i've declared a bunch of worker threads (100) and a queue into
> > > > > > which new requests are inserted, like so:
> > > > > > [queue/thread setup and SimpleDBThread code as in the original post]
> > >
> > > Note: double-leading __ means "name mangling" -- typically only
> > > needed when doing multiple layers of inheritance where different
> > > parents have similar named items that need to be kept independent; a
> > > single _ is the convention for "don't touch me unless you know what
> > > you are doing"
> > >
> > > > thanks John, Gabriel,
> > > > here's the 'put' side of the requests:
> > > > [prepSDBSearch() as in the earlier post]
> > >
> > > My suggestion, if you really want diagnostic help -- follow the
> > > common recommendation of posting the minimal /runnable (if erroneous)/
> > > code... If "domain.get_item()" is some sort of RDBM access, you might
> > > fake it using a pre-loaded dictionary -- anything that allows it to
> > > return something when given the key value.
> > >
> > > > responses to your follow ups:
> > > > 1) 'item' in the threads is a list that corresponds to the 'data'
> > > > list in the above function. it's not global, and the initial values
> > > > seem ok, but i'm not sure if every time i pass in data to the queue
> > > > it passes in the same memory address or declares a new 'data' list
> > > > (which I guess is what I want)
> > >
> > > Rather confusing usage... In your "put" you have a list whose
> > > first element is "result.item", but then in the work thread, you refer
> > > to the entire list as "item"
> > >
> > > > 3) the first item in the modelList is a counter that keeps track of
> > > > the number of threads for this call that have completed - is there
> > > > any better way of doing this?
> > >
> > > Where? None of your posted code shows either "counter" or
> > > modelList being used by the threads.
> > >
> > > And yes, if you have threads trying to update a shared mutable, you
> > > have a race condition.
Re: threading - race condition?
On May 11, 9:10 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Sun, 11 May 2008 13:16:25 -0300, skunkwerk <[EMAIL PROTECTED]> escribió:
>
> > the only issue i have now is that it takes a long time for 100 threads
> > to initialize that connection (>5 minutes) - and as i'm doing this on
> > a webserver any time i update the code i have to restart all those
> > threads, which i'm doing right now in a for loop. is there any way I
> > can keep the thread stuff separate from the rest of the code for this
> > file, yet allow access?
>
> Like using a separate thread to create the other 100?
>
> --
> Gabriel Genellina

thanks Gabriel,
i think that could do it - let me try it out. don't know why i didn't think of it earlier.
lots of futex_wait calls
I've got a Python program written for the Django web framework that starts about 100 threads. When I start the server, it sometimes eats up 100% of the CPU for a good minute or so... though none of the threads are CPU-intensive.

Doing an strace on the program, I found lots of calls like this:

    select(5, [4], [], [], {1, 0})          = 0 (Timeout)
    futex(0x86a3ce0, FUTEX_WAIT, 0, NULL)   = 0

I've read the man page for futex... but is this normal?

thanks
Re: lots of futex_wait calls
On Jun 6, 10:03 am, André Malo <[EMAIL PROTECTED]> wrote:
> skunkwerk wrote:
>
> > I've got a python program written for the django web framework that
> > starts about 100 threads. When I start the server, it sometimes eats
> > up 100% of the CPU for a good minute or so... though none of the
> > threads are CPU-intensive
> >
> > doing a strace on the program, i found lots of calls like this:
> >
> >     select(5, [4], [], [], {1, 0})        = 0 (Timeout)
> >     futex(0x86a3ce0, FUTEX_WAIT, 0, NULL) = 0
> >
> > i've read the man page for futex... but is this normal?
>
> More or less. Most of the futex calls (if not all) are grabbing or releasing
> the global interpreter lock (GIL).
>
> It's usually helpful to increase the thread-schedule-checkinterval in order
> to lessen the system load (especially the number of context switches). See
> sys.setcheckinterval.
>
> nd

I've set the checkinterval to 200, and it seems to be ok... but after one or two days, the python processes will start hogging 100% of the CPU and bring the system to a crawl. I ran strace again, and all of the calls are:

    select(5, [4], [], [], {1, 0})          = 0 (Timeout)
    futex(0x877d0c8, FUTEX_WAIT, 0, NULL)   = 0
    futex(0x877d0c8, FUTEX_WAKE, 1)         = 0

is there any way to find out what's causing this? would you need to look at my threading code?

thanks,
imran
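No resolution appears in the thread, but one way to see what the threads are doing at the moment of the spin is a stack dump via sys._current_frames() (available since Python 2.5). A sketch, with SIGUSR1 as an arbitrary choice of trigger:

    import signal
    import sys
    import traceback

    def dump_stacks(signum, frame):
        # print a stack trace for every live thread
        for thread_id, stack in sys._current_frames().items():
            print '--- thread %d ---' % thread_id
            traceback.print_stack(stack)

    signal.signal(signal.SIGUSR1, dump_stacks)   # then: kill -USR1 <pid>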
popen pipe limit
I'm getting errors when reading from/writing to pipes that are fairly large in size. To bypass this, I wanted to redirect output to a file in the subprocess.Popen function, but couldn't get it to work (even after setting shell=True). I tried adding ">", "temp.sql" after the password field, but mysqldump gave me an error.

the code:

    p1 = subprocess.Popen(["mysqldump", "--all-databases", "--user=user",
                           "--password=password"], shell=True)
    p2 = subprocess.Popen(["gzip", "-9"], stdin=p1.stdout)
    output = p2.communicate()[0]
    file = open('test.sql.gz', 'w')
    file.write(str(output))
    file.close()

the output:

    gzip: compressed data not written to a terminal. Use -f to force compression.
    For help, type: gzip -h
    mysqldump: Got errno 32 on write

I'm using python rather than a shell script for this because I need to upload the resulting file to a server as soon as it's done.

thanks
Re: popen pipe limit
On Apr 7, 6:17 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Mon, 07 Apr 2008 20:52:54 -0300, skunkwerk <[EMAIL PROTECTED]> escribió:
>
> > I'm getting errors when reading from/writing to pipes that are fairly
> > large in size. To bypass this, I wanted to redirect output to a file
> > in the subprocess.Popen function, but couldn't get it to work (even
> > after setting shell=True). I tried adding ">", "temp.sql" after the
> > password field but mysqldump gave me an error.
>
> You need a pipe to chain subprocesses:
>
>     import subprocess
>     p1 = subprocess.Popen(["mysqldump", "--all-databases", "--user=user",
>                            "--password=password"],
>                           stdout=subprocess.PIPE)
>     ofile = open("test.sql.gz", "wb")
>     p2 = subprocess.Popen(["gzip", "-9"], stdin=p1.stdout, stdout=ofile)
>     p1.wait()
>     p2.wait()
>     ofile.close()
>
> If you don't want the final file on disk:
>
>     p1 = subprocess.Popen(["mysqldump", "--all-databases", "--user=user",
>                            "--password=password"],
>                           stdout=subprocess.PIPE)
>     p2 = subprocess.Popen(["gzip", "-9"], stdin=p1.stdout,
>                           stdout=subprocess.PIPE)
>     while True:
>         chunk = p2.stdout.read(4192)
>         if not chunk: break
>         # do something with the chunk just read
>
>     p1.wait()
>     p2.wait()
>
> --
> Gabriel Genellina

thanks Gabriel - tried the first one and it worked great!
Re: subprocess.popen function with quotes
On Mar 26, 10:33 pm, skunkwerk <[EMAIL PROTECTED]> wrote:
> On Mar 26, 8:05 am, Jeffrey Froman <[EMAIL PROTECTED]> wrote:
> > Without shell=True, your glob characters will not be expanded. Hence, the
> > command looks for a file actually named "*.htm".
> > [...] However, you will need to perform your own globbing:
> >
> >     command = ['rename', '-vn', 's/(.*)\.htm$/model.html/']
> >     files = glob.glob('*.htm')
> >     command.extend(files)
> >     p = subprocess.Popen(
> >         command,
> >         stdout=subprocess.PIPE,
> >         stderr=subprocess.PIPE,
> >     )
> >
> > Jeffrey
>
> thanks Jeffrey, that worked like a charm!

I'm trying to detect when the subprocess has terminated, using the wait() function - but when there is an error with the call to rename (i.e. the file doesn't exist), rename run from the command line just terminates and displays the error. In the code above, though, my call to p.wait() just hangs when rename should throw an error... I've tried adding shell=True, but that stops the rename from working.

any ideas?
thanks
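No answer survives in the archive, but one plausible culprit, offered as a guess: the subprocess documentation warns that wait() can deadlock when stdout=PIPE or stderr=PIPE and the child writes enough output to fill the OS pipe buffer. communicate() avoids that by draining both streams while it waits, after which the exit status can be checked:

    out, err = p.communicate()      # drains stdout/stderr fully, then reaps the child
    if p.returncode != 0:
        print 'rename failed (%d): %s' % (p.returncode, err)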
logger output
i'm redirecting the stdout & stderr of my python program to a log. Tests i've done on a simple program with print statements, etc. work fine. however, in my actual program i get weird output like this:

    2008-05-04 20:20:44,790 DEBUG Grabbing message from queue, if any
    2008-05-04 20:20:44,790 DEBUG DEBUG:doit:Grabbing message from queue, if any
    2008-05-04 20:20:44,790 DEBUG DEBUG:doit:DEBUG:doit:Grabbing message from queue, if any
    2008-05-04 20:20:44,790 DEBUG DEBUG:doit:DEBUG:doit:DEBUG:doit:Grabbing message from queue, if any

followed by:

    2008-05-04 20:20:44,815 DEBUG DEBUG:doit:Traceback (most recent call last):
    2008-05-04 20:20:44,815 DEBUG DEBUG:doit:DEBUG:doit:Traceback (most recent call last):
    2008-05-04 20:20:44,815 DEBUG DEBUG:doit:DEBUG:doit:DEBUG:doit:Traceback (most recent call last):

the code I'm using for the log stuff:

    import logging

    logger = logging.getLogger('doit')
    hdlr = logging.FileHandler('/home/imran/doit.log')
    formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
    hdlr.setFormatter(formatter)
    logger.addHandler(hdlr)
    logger.setLevel(logging.DEBUG)

    class write2Log:
        def write(self, x):
            if x != '\n':
                logger.debug(str(x))

    sys.stdout = write2Log()
    sys.stderr = write2Log()

any ideas what might be causing the problems? some of the messages being output are quite long - might this be a problem?

thanks
Re: logger output
On May 4, 10:40 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Mon, 05 May 2008 00:33:12 -0300, skunkwerk <[EMAIL PROTECTED]> escribió:
>
> > i'm redirecting the stdout & stderr of my python program to a log.
> > Tests i've done on a simple program with print statements, etc. work
> > fine. however, in my actual program i get weird output like this:
> >
> >     2008-05-04 20:20:44,790 DEBUG Grabbing message from queue, if any
> >     2008-05-04 20:20:44,790 DEBUG DEBUG:doit:Grabbing message from queue, if any
> >     2008-05-04 20:20:44,790 DEBUG DEBUG:doit:DEBUG:doit:Grabbing message from queue, if any
> >
> > class write2Log:
> >     def write(self, x):
> >         if x != '\n':
> >             logger.debug(str(x))
> >
> > any ideas what might be causing the problems? some of the messages
> > being output are quite long - might this be a problem?
>
> Try this simplified example and see by yourself:
>
>     import sys
>
>     class Write2Log:
>         def write(self, x):
>             sys.__stdout__.write('[%s]' % x)
>
>     sys.stdout = Write2Log()
>
>     print "Hello world!"
>     age = 27
>     name = "John"
>     print "My name is", name, "and I am", age, "years old."
>
> --
> Gabriel Genellina

thanks Gabriel,
i tried the code you sent and got output like the following:

    [My name is][][john][][and I am][][27][][years old.]

it doesn't really help me though. does this have any advantages over the syntax i was using? are there any limits on what kind of objects the logger can write? i.e. ascii strings of any length?

thanks
Re: logger output
On May 5, 3:44 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> En Mon, 05 May 2008 13:02:12 -0300, skunkwerk <[EMAIL PROTECTED]> escribió:
>
> > i tried the code you sent and got output like the following:
> >     [My name is][][john][][and I am][][27][][years old.]
> >
> > it doesn't really help me though. does this have any advantages over
> > the syntax i was using?
>
> The example doesn't use any logger, so loggers aren't the problem here, ok?
>
> The write function above puts square brackets [] around anything it receives.
> This way you can see exactly how write() is called: once per *item* in the
> print statement, plus once per comma used (with a space character that you
> didn't copy correctly).
>
> Back to your original code, you have to call logger.debug with a *line* of
> text, but you are calling it with many small pieces - that's the problem.
> Accumulate output until you see a '\n' - then join all the pieces into a
> single, complete line and finally call logger.debug
>
> --
> Gabriel Genellina

thanks Gabriel,
i wrote the function below, but am now getting an "Error in sys.exitfunc:" error (which disappears when i comment out the last two lines below):

    class write2Log:
        def write(self, x):
            if x != ',':  # ignore if a comma
                if str(x).count('\n') == 0:
                    buffer += str(x)
                else:
                    list = str(x).split('\n')
                    logger.debug(buffer)
                    buffer = ""
                    for text in list:
                        logger.debug(text)

    sys.stdout = write2Log()
    sys.stderr = write2Log()

any ideas what might be wrong?
thanks again
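A sketch of the accumulate-then-log class Gabriel describes, with the buffer kept as an instance attribute. (In the version posted above, buffer is a local name inside write(), so buffer += str(x) raises UnboundLocalError the first time it runs - a plausible source of the exit-time error.)

    import logging
    import sys

    logger = logging.getLogger('doit')   # configured as in the original post

    class Write2Log(object):
        def __init__(self):
            self.buf = ''

        def write(self, x):
            self.buf += str(x)
            # emit only complete lines; keep the unfinished tail buffered
            while '\n' in self.buf:
                line, self.buf = self.buf.split('\n', 1)
                if line:
                    logger.debug(line)

    sys.stdout = Write2Log()
    sys.stderr = Write2Log()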
threading - race condition?
i'm getting the wrong output for the 'title' attributes for this data. the queue holds a data structure (item name, position, and list to store results in). each thread takes in an item name and queries a database for various attributes. from the debug statements the item names are being retrieved correctly, but the attributes returned are those of other items in the queue - not its own item. however, the model variable is not a global variable... so i'm not sure what's wrong.

i've declared a bunch of worker threads (100) and a queue into which new requests are inserted, like so:

    queue = Queue.Queue(0)
    WORKERS = 100
    for i in range(WORKERS):
        thread = SDBThread(queue)
        thread.setDaemon(True)
        thread.start()

the thread:

    class SimpleDBThread(threading.Thread):
        def __init__(self, queue):
            self.__queue = queue
            threading.Thread.__init__(self)

        def run(self):
            while 1:
                item = self.__queue.get()
                if item != None:
                    model = domain.get_item(item[0])
                    logger.debug('sdbthread item:' + item[0])
                    title = model['title']
                    scraped = model['scraped']
                    logger.debug("sdbthread title:" + title)

any suggestions?
thanks
Re: threading - race condition?
On May 8, 4:54 pm, [EMAIL PROTECTED] wrote:
> On May 8, 5:45 pm, skunkwerk <[EMAIL PROTECTED]> wrote:
> > i'm getting the wrong output for the 'title' attributes for this
> > data. [...]
> > any suggestions?
> > thanks
>
> I'll base this on terminology: if a model is in a brain (throughout
> the known universe), and a dollar's a dollar, it may not pay to build
> a computer out of brains.
>
> If man arises as a tool-carrier, we will carry tools, not people.
> Don't use Python to make people; make money, and not too much. Pick a
> wage and you might get somewhere.

excuse me?
Re: threading - race condition?
On May 9, 12:12 am, John Nagle <[EMAIL PROTECTED]> wrote:
> skunkwerk wrote:
> > i'm getting the wrong output for the 'title' attributes for this
> > data. the queue holds a data structure (item name, position, and list
> > to store results in). each thread takes in an item name and queries a
> > database for various attributes. from the debug statements the item
> > names are being retrieved correctly, but the attributes returned are
> > those of other items in the queue - not its own item. however, the
> > model variable is not a global variable... so i'm not sure what's
> > wrong.
> > [worker-pool and SimpleDBThread code as in the original post]
>
> Hm. We don't have enough code here to see what's wrong.
> For one thing, we're not seeing how items get put on the queue. The
> trouble might be at the "put" end.
>
> Make sure that "model", "item", "title", and "scraped" are not globals.
> Remember, any assignment to them in a global context makes them a global.
>
> You should never get "None" from the queue unless you put a "None"
> on the queue. "get()" blocks until there's work to do.
>
> John Nagle

thanks John, Gabriel,
here's the 'put' side of the requests:

    def prepSDBSearch(results):
        modelList = [0]
        counter = 1
        for result in results:
            data = [result.item, counter, modelList]
            queue.put(data)
            counter += 1
        while modelList[0] < len(results):
            print 'waiting...'  # wait for them to come home
        modelList.pop(0)  # now remove '0'
        return modelList

responses to your follow-ups:
1) 'item' in the threads is a list that corresponds to the 'data' list in the above function. it's not global, and the initial values seem ok, but i'm not sure if every time i pass in data to the queue it passes in the same memory address or declares a new 'data' list (which I guess is what I want)
2) john, i don't think any of the variables you mentioned are global. the 'none' check was just for extra safety.
3) the first item in the modelList is a counter that keeps track of the number of threads for this call that have completed - is there any better way of doing this?

thanks again
Re: threading - race condition?
On May 10, 1:31 pm, Dennis Lee Bieber <[EMAIL PROTECTED]> wrote:
> On Fri, 9 May 2008 08:40:38 -0700 (PDT), skunkwerk <[EMAIL PROTECTED]>
> declaimed the following in comp.lang.python:
>
> Coming in late...
>
> > On May 9, 12:12 am, John Nagle <[EMAIL PROTECTED]> wrote:
> > > skunkwerk wrote:
> > > > i've declared a bunch of worker threads (100) and a queue into which
> > > > new requests are inserted, like so:
> > > >
> > > > queue = Queue.Queue(0)
> > > > WORKERS = 100
> > > > for i in range(WORKERS):
> > > >     thread = SDBThread(queue)
> > > >     thread.setDaemon(True)
> > > >     thread.start()
> > > >
> > > > the thread:
> > > >
> > > > class SimpleDBThread ( threading.Thread ):
> > > >     def __init__ ( self, queue ):
> > > >         self.__queue = queue
>
> Note: double-leading __ means "name mangling" -- typically only
> needed when doing multiple layers of inheritance where different parents
> have similar named items that need to be kept independent; a single _ is
> the convention for "don't touch me unless you know what you are doing"
>
> > > >         threading.Thread.__init__ ( self )
> > > >     def run ( self ):
> > > >         while 1:
> > > >             item = self.__queue.get()
> > > >             if item != None:
> > > >                 model = domain.get_item(item[0])
> > > >                 logger.debug('sdbthread item:' + item[0])
> > > >                 title = model['title']
> > > >                 scraped = model['scraped']
> > > >                 logger.debug("sdbthread title:" + title)
> > > >
> > > > any suggestions?
> > > > thanks
>
> > thanks John, Gabriel,
> > here's the 'put' side of the requests:
> >
> > def prepSDBSearch(results):
> >     modelList = [0]
> >     counter = 1
> >     for result in results:
> >         data = [result.item, counter, modelList]
> >         queue.put(data)
> >         counter += 1
> >     while modelList[0] < len(results):
> >         print 'waiting...'  # wait for them to come home
> >     modelList.pop(0)  # now remove '0'
> >     return modelList
>
> My suggestion, if you really want diagnostic help -- follow the
> common recommendation of posting the minimal /runnable (if erroneous)/
> code... If "domain.get_item()" is some sort of RDBM access, you might
> fake it using a pre-loaded dictionary -- anything that allows it to
> return something when given the key value.
>
> > responses to your follow ups:
> > 1) 'item' in the threads is a list that corresponds to the 'data'
> > list in the above function. it's not global, and the initial values
> > seem ok, but i'm not sure if every time i pass in data to the queue it
> > passes in the same memory address or declares a new 'data' list (which
> > I guess is what I want)
>
> Rather confusing usage... In your "put" you have a list whose first
> element is "result.item", but then in the work thread, you refer to the
> entire list as "item"
>
> > 3) the first item in the modelList is a counter that keeps track of
> > the number of threads for this call that have completed - is there any
> > better way of doing this?
>
> Where? None of your posted code shows either "counter" or modelList
> being used by the threads.
>
> And yes, if you have threads trying to update a shared mutable, you
> have a race condition.
>
> You also have a problem if you are using "counter" to define where
> in modelList a thread is supposed to store its results -- as you can not
> access an element that doesn't already exist...
>
>     a = [0]
>     a[3] = 1    # failure, need to create elements 1, 2, 3 first
>
> Now, if position is irrelevant, and a thread just appends its
> results to modelList, then you don't need some counter, all you need is
> to check the length of modelList against the count expected.
>
> Overall -- even though you are passing things via the queue, the
> contents being passed via the queue are being treated as if they were
> global entities (you could make modelList a global, remove it from the
> queue entries, and have the same net access).
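To make Dennis's suggestion concrete, a minimal runnable sketch - get_item faked with a dict, as he proposes - in which each batch of work carries its own results list, and Queue's task_done()/join() replace the hand-rolled modelList counter:

    import threading
    import Queue

    FAKE_DB = {'a': {'title': 'Title A', 'scraped': True},
               'b': {'title': 'Title B', 'scraped': False}}

    def get_item(key):
        return FAKE_DB.get(key)         # stand-in for domain.get_item

    queue = Queue.Queue()

    def worker():
        while True:
            key, results = queue.get()
            results.append((key, get_item(key)))   # append to this job's own list
            queue.task_done()

    for _ in range(4):
        t = threading.Thread(target=worker)
        t.setDaemon(True)
        t.start()

    results = []
    for key in ['a', 'b', 'missing']:
        queue.put((key, results))
    queue.join()        # block until every put() item has been task_done()
    print results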