Avoiding deadlocks in concurrent programming
This is not really Python specific, but I know Python programmers are among the best in the world. I have a fair understanding of the concepts involved, enough to realize that I would benefit from the experience of others :)

I have a shared series of objects in memory that may total more than 100MB. Often, to perform a task for a client, several of these objects must be used. Since many clients can be making requests at once (>100 per second during peak load), these objects must be protected with some method of thread synchronization (threading.Lock would do fine).

This is somewhat complicated by a backup thread which takes the data every 10 minutes and exports it all to a format that can be saved to disk. Clearly the backup thread must have exclusive access to all of these objects at once, and there must be no half-completed transactions underway.

The easiest way from a design standpoint is to have a single lock and let clients have exclusive access to everything, even though they only ever need access to a fraction of the data. This makes it easy for the backup thread, and it ensures that there's no trouble with deadlocks or with partially completed transactions. However, imagine what would happen if there were 100 clients in 100 threads waiting for access to that lock. One could be almost finished with it, and then 100 threads could get switched in and out, all doing nothing since they're all waiting for the lock held by the one thread.

So I think you would need multiple locks so clients only acquire what they need. This would let multiple threads access the data at once. But now I have to deal with deadlocks, since clients will usually acquire a resource and then block acquiring another. It is very likely that one client locks A, another locks B, then the one holding B waits for A and the one holding A waits for B. Worse yet, the backup thread will go around trying to lock everything and will no doubt deadlock everybody.

How do you avoid this? I thought that instead of blocking forever you could try non-blocking acquires in a loop with a timeout of a few seconds; if the timeout is reached, release any locks held and try again later, with the backup thread being the only one to use blocking acquires since it would never complete its task otherwise. A rough sketch of what I mean is below.

No doubt this is a common problem. How would you people deal with it?

-Dan
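To make the try/back-off idea concrete, here is a minimal sketch of what I have in mind, assuming plain threading.Lock objects; the lock names and timing values are made up for illustration, not production code:

import threading, time, random

lock_a = threading.Lock()
lock_b = threading.Lock()

def acquire_all(locks, timeout=3.0):
    """Try to take every lock without blocking; on any failure, release
    whatever was taken so nobody holds A while waiting on B."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        taken = []
        for lock in locks:
            if lock.acquire(False):          # non-blocking acquire
                taken.append(lock)
            else:
                for held in taken:           # give back what we grabbed
                    held.release()
                taken = []
                break
        else:
            return taken                     # got every lock
        time.sleep(random.uniform(0.01, 0.05))   # back off, then retry
    return None                              # timed out; caller retries later

def client_task():
    locks = acquire_all([lock_a, lock_b])
    if locks is None:
        return                               # come back later
    try:
        pass                                 # work with the shared objects here
    finally:
        for lock in reversed(locks):
            lock.release()

The backup thread, as described above, would be the one exception and would just use plain blocking acquires.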
Re: Avoiding deadlocks in concurrent programming
Hi Paul,

> Do you mean a few records of 20+ MB each, or millions of records of a
> few dozen bytes, or what?

Well, they're objects with lists and dictionaries and data members and other objects inside of them. Some are very large, maybe bigger than 20MB, while others are very numerous and small (a hundred thousand tiny objects in a large list). Some of the large objects could have many locks inside of them to allow simultaneous access to different parts, while others would need to be accessed as a unit.

> If the 100 threads are blocked waiting for the lock, they shouldn't
> get awakened until the lock is released. So this approach is
> reasonable if you can minimize the lock time for each transaction.

Now that is interesting, because if 100 clients have to go through the system in a second, the server clearly is capable of sending 100 clients through in a second, and it doesn't matter whether they all go through "at once" or one at a time, so long as nobody gets stuck waiting for much longer than a few seconds. It would be very simple and painless for me to send them all through one at a time. It is also possible that certain objects are never accessed in the same action, and those could have separate locks as an optimization (this would require careful analysis of the different actions).

> You're doing what every serious database implementation needs to do ...
> Are you sure you don't want to just use an RDBMS?

It was considered, but we decided that abstracting the data into tables to be manipulated with SQL queries is substantially more complex for the programmer and way too expensive for the system, since the average action would require 20-100 queries.

Thanks,
-Dan
Re: MySQLdb reconnect
I don't believe that it does. You can, however, call ping() on the connection, which should attempt an automatic reconnection. See the docs for mysql_ping: http://dev.mysql.com/doc/mysql/en/mysql-ping.html

I've never tested that, but I have a need for it also, so let me know if it works or not.

-Dan
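Untested, but roughly what I have in mind; the connection parameters are made up, and whether ping() reconnects on its own depends on the client library's auto-reconnect setting, so there is a manual fallback just in case:

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="test")

# ... much later, the server may have dropped the idle connection ...
try:
    conn.ping()        # asks the server if it is alive; may reconnect for us
except MySQLdb.OperationalError:
    # ping couldn't revive it, so open a fresh connection ourselves
    conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="test")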
Re: Avoiding deadlocks in concurrent programming
Hi Steve,

The backup thread only holds the lock long enough to create an in-memory representation of the data. It writes to disk on its own time after it has released the lock, so this is not an issue.

If you're saying what I think you are, then a single lock is actually better for performance than multiple locks, so long as one avoids waiting for other resources like I/O.

-Dan
Re: Avoiding deadlocks in concurrent programming
Thanks for all of the replies. I'm glad I posted here; you guys have been very helpful.

> Obviously, I only know what you've told us about your data, but 20-100
> queries? That doesn't sound right ... RDBMSes are well-
> studied and well-understood; they are also extremely powerful when used
> to their potential.

Well, I doubt there'd be even as many as 20 unique queries in any one action, but 20-100 queries executed is about average, and it can be over ten thousand for one action, which is clearly painful to any RDBMS or programmer. And as Paul McGuire points out, an RDBMS doesn't help avoid deadlocks; in fact, in my opinion it makes them easier to create.

Paul Rubin, I think you're right: a single lock is the approach I'm going to take. I don't want to seem stupid, but I don't understand what you're talking about with the Queues. Isn't a Queue just a synchronized sequence? If you've got the time and you don't mind, I'd love a more detailed explanation.

Niel, thanks for the link, I read through the article.

> I try to pick up crumbs of knowledge from my co-workers, and one of the
> smarter ones gave me this rubric for testing for deadlocks.

That's true, I'm going to remember that one.

Thanks a lot guys. I'm leaving on a trip tomorrow, so I won't be able to reply to this thread again, but I will read any other posts on it when I come back.

-Dan
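A minimal sketch of the kind of Queue-based design that is probably meant here, for anyone reading the archive later: one worker thread owns the shared data and clients just enqueue work, so access is serialized without any explicit locks. All names below are made up for illustration:

import threading, Queue          # Queue is the Python 2 module name

requests = Queue.Queue()

def worker(shared_data):
    # The only thread that ever touches shared_data, so no locks are needed.
    while True:
        func, args, result_q = requests.get()
        result_q.put(func(shared_data, *args))

def client_call(func, *args):
    # Called from any client thread; blocks until the worker has done the job.
    result_q = Queue.Queue(maxsize=1)
    requests.put((func, args, result_q))
    return result_q.get()

shared = {}                      # stand-in for the real object graph
t = threading.Thread(target=worker, args=(shared,))
t.setDaemon(True)
t.start()

print client_call(lambda data, k, v: data.setdefault(k, v), "key", 42)   # prints 42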
Re: "also" to balance "else" ?
My first reaction was that this is terrible; else clauses on loops are confusing enough. But if I think about it more, I'm warming up to the idea. Also/else for loops is clear, symmetrical, and would be useful. Reversing the meaning of else will break code, but it's not used that frequently, and it was a confusing thing to begin with; nothing wrong in breaking something (slowly!) if it was 'broken' to begin with.

Alifs look evil; I couldn't deduce the control flow very easily, but then they are a new idea. I'll keep an open mind on that one.

I think the best thing would be to compare the also/else syntax to what identical functionality looks like in Python now [hint to someone with more time than me right now ;)]. I'd vote for whichever is the more concise and readable of the two.

-Dan
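For reference, a small reminder of what the existing for/else does today (the else suite runs only when the loop finishes without hitting break), since that is the meaning the proposal would change:

needle, haystack = 5, [1, 2, 3]
for item in haystack:
    if item == needle:
        print "found it"
        break
else:
    print "not found"      # runs, because the loop never hit break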
What do _ast Load | Store | Del | AugLoad | AugStore | Param mean?
In the _ast module, Attribute, Subscript, Name, List, and Tuple all have an expr_context associated with them, which is defined as:

expr_context = Load | Store | Del | AugLoad | AugStore | Param

I have no idea what these mean or what the difference between them is. I'm porting _ast to IronPython, and this is the only part I've had difficulty understanding. The only clue I've found is that all these expressions are unique in that they can occur in assignment context.

Any insights at all?

Thanks,
-Dan
Re: What do _ast Load | Store | Del | AugLoad | AugStore | Param mean?
On Oct 11, 9:22 pm, Benjamin <[EMAIL PROTECTED]> wrote:
> Store:
> a = 2
> Module(body=[Assign(targets=[Name(id='a', ctx=Store())],
> value=Num(n=2))])
>
> Load:
> print a
> Module(body=[Print(dest=None, values=[Name(id='a', ctx=Load())],
> nl=True)])
>
> Del:
> del a
> Module(body=[Delete(targets=[Name(id='a', ctx=Del())])])
>
> Param:
> def f(a): pass
> Module(body=[FunctionDef(name='f', args=arguments(args=[Name(id='a',
> ctx=Param())], vararg=None, kwarg=None, defaults=[]), body=[Pass()],
> decorator_list=[])])
>
> I hope that helps!

Very much! Thank you for such a clear explanation!

Cheers,
-Dan
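In case it helps anyone else, dumps like these are easy to reproduce yourself; a quick sketch, assuming a CPython with the ast module (2.6 or later) alongside the raw _ast module:

import ast, _ast

# Using the high-level ast module:
print ast.dump(ast.parse("a = 2"))       # shows Name(id='a', ctx=Store())

# Or going through compile() with the _ast flag:
tree = compile("del a", "<string>", "exec", _ast.PyCF_ONLY_AST)
print tree.body[0].targets[0].ctx        # an _ast.Del instance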
How about adding slice notation to iterators/generators?
I was just working with a generator for a tree, and I wanted to skip the first result (the root node). And it occurs to me: why do we need to do

import sys
from itertools import islice

my_iter = islice(my_iter, 1, sys.maxint)

when we could simply add slice operations to generators?

for x in my_iter[1:]:
    pass

The way I figure it, only if there is no __getitem__ defined and the object has an __iter__ (i.e. not a list, but an iter(list) would be ok) should there be a default __getitem__ that is really just islice, with similar limitations (no negative indices).

The idea might need a bit of polish, but fundamentally it seems like it could make it easier to deal with slicing generators. At the least, stop=sys.maxint and step=1 should be defaults, and islice needs to accept kwargs, so you could do islice(my_iter, start=1); that's an oversight imho.
Re: How about adding slice notation to iterators/generators?
On Oct 16, 3:54 am, Terry Reedy wrote:
> There is already an obvious standard way to do this.
>
> it = <your iterator>
> next(it)  # toss first item
> for item in it:
>     <process item>

That fails if there is no first item. You're taking one corner case and saying there's an easy way to do it, which is more or less true, but you miss my point that I'm suggesting we make the general case easier.

By giving iterators a default, overridable __getitem__ that is just syntactic sugar for islice, they would share a slightly larger interface subset with the builtin container types; something like the wrapper sketched below. In a duck-typed language like Python, that's almost always a good thing. You could use iterators in more situations which expect something more like a list. As long as it breaks no rationally existing code, I can think of no good reason not to do this in a future Python.

-Dan
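A minimal sketch of what such a default __getitem__ could look like, written as a wrapper class since the iterator protocol itself obviously can't be patched from here; the class name is made up and only slices are handled:

from itertools import islice

class SliceableIter(object):
    """Wrap an iterator so that slice syntax delegates to itertools.islice."""
    def __init__(self, it):
        self._it = iter(it)
    def __iter__(self):
        return self._it
    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("this sketch only handles slices")
        # Shares the proposal's limitation: no negative indices.
        return islice(self._it, index.start or 0, index.stop, index.step or 1)

squares = (n * n for n in xrange(10))
for x in SliceableIter(squares)[1:4]:
    print x        # prints 1, 4, 9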
Re: How about adding slice notation to iterators/generators?
On Oct 16, 7:38 pm, Carl Banks wrote:
> You would burden everyone who writes a custom iterator to provide a
> __getitem__ method just because you're too lazy to type out the word
> islice?

No, of course not. That would be stupid. Custom iterators are iterators, so they would also get the default __getitem__, which would work just perfectly for them. If they needed to override it, they could. Remember, my proposal was to add a default __getitem__ for iterators that is really just islice.

Ryles makes a good point though:

> I think Python programmers have learned to expect certain things from
> objects that support __getitem__. For example, indexing and slicing is
> repeatable on the same object:

That indexing/slicing iterators is not repeatable is likely to negate much of the advantage of supporting a larger interface (because it would be supported inconsistently, passing an iterator to something that expects __getitem__ could be surprising). That does not mean it has no value in working with iterators; it offers the same value islice does, just more conveniently.

Steven:

> If you want to support slicing, go right ahead, but please, I beg you,
> don't force me to support it in my iterators!

As I said to Carl Banks, forcing you to do something differently is not my aim here. Your other points apply equally to islice, which already exists, and do not invalidate its purpose.

The main use case I'm thinking about here is skipping elements or limiting results from iterators, usually in for loops:

for x in my_iter[5:]:
    ...

for x in my_iter[:5]:
    ...

Clearly that's easier and more readable than using islice. Nobody has yet provided a concrete reason why that would be a _bad thing_. That doesn't make it a _good thing_, either.

-Dan
Find first matching substring
Almost every time I've had to do parsing of text over the last 5 years, I've needed this function:

def find_first(s, subs, start=None, end=None):
    results = [s.find(sub, start, end) for sub in subs]
    results = [r for r in results if r != -1]
    if results:
        return min(results)
    return -1

It finds the first matching substring in the target string; if there is more than one match, it returns the position of the match that occurs at the lowest index in the string.

Has anyone else had problems where they could have applied this function? It seems to me that Python's find (and rfind, index, rindex) could be modified (the same way that startswith and endswith have been) to behave this way if a tuple were passed. Do others agree that this would be desirable?

Thanks,
Dan
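A quick illustration of the behaviour, with a made-up string, in case the description above isn't clear:

s = "GET /index.html?lang=en#top"
print find_first(s, ("?", "#"))    # 15: the '?' appears before the '#'
print find_first(s, ("@", "!"))    # -1: neither substring is present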
Dynamically pass a function arguments from a dict
You can take a dictionary of key/value pairs and pass it to a function as keyword arguments:

def func(foo, bar):
    print foo, bar

args = {'foo': 1, 'bar': 2}
func(**args)

will print "1 2". But what if you try passing those arguments to a function like

def func2(bar, zoo=''):
    print bar, zoo

How can you determine that func2 will only accept bar and zoo, but not foo, and then call the function with just bar as an argument? I know CherryPy does this, but it's not obvious from the source how they pulled it off.

-Dan
Re: Dynamically pass a function arguments from a dict
Awesome. Wrapping that into a function:

def getargs(func):
    numArgs = func.func_code.co_argcount
    names = func.func_code.co_varnames
    return names[:numArgs]

Short, concise, and it works :) Variables declared inside the function body are also in co_varnames, but it seems they only come after the arguments.

A good module (huge, very comprehensive) for this kind of thing can be found at: http://lfw.org/python/inspect.py

Thanks very much!

-Dan
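Putting it together with the original question looks roughly like this (the standard library's inspect.getargspec gives the same argument list, if you'd rather not poke at func_code directly):

def func2(bar, zoo=''):
    print bar, zoo

args = {'foo': 1, 'bar': 2}
accepted = getargs(func2)                                   # ('bar', 'zoo')
filtered = dict((k, v) for k, v in args.items() if k in accepted)
func2(**filtered)                                           # prints "2 "

# Equivalent lookup with the standard library:
import inspect
print inspect.getargspec(func2)[0]                          # ['bar', 'zoo']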
Re: How Do I get Know What Attributes/Functions In A Class?
Use the dir() function:

dir(myClass)

returns a list of strings.

-Dan
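A tiny example with a throwaway class, filtering out the inherited double-underscore names since those are usually not what you're after:

class MyClass(object):
    answer = 42
    def method(self):
        pass

print dir(MyClass)                                                   # every attribute name, as strings
print [name for name in dir(MyClass) if not name.startswith('_')]   # ['answer', 'method']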
Re: Python v PHP for web, and restarting Apache?
On Nov 17, 12:07 pm, "walterbyrd" <[EMAIL PROTECTED]> wrote:
> I think I have read somewhere that using Python to develop
> web-applications requires some restarting of the Apache server, whereas
> PHP does not.

It depends what you do. CGIs operate much like PHP. mod_python has auto-reloading (and by implication so do most frameworks that build on it). A poorly designed web application may require a restart. (Some people also disable auto-reload on production servers for performance reasons.)

> Also, I seem to remember reading something about PHP being able to
> recover from Apache restarting more easily than Python.

I can think of no reason why that would be true. Perhaps these are poorly designed Python applications?

> I am not trying to suggest anything here. I'm just asking.

Ask away. The only bad question is an unasked question. Most of us can act like adults here (although we all forget that from time to time).

Python is much better suited to writing and maintaining large web applications, though. Being both an experienced PHP and Python programmer, I can tell you I don't use PHP any more unless I have no other choice.

-Sandra
How to test if an object IS another object?
If two objects are of equal value you can compare them with ==. What I want to do is find out whether two objects are actually just references to the same object. How can I do this in Python?

Thanks
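For what it's worth, the identity test being asked about is the `is` operator (with id() as the spelled-out equivalent); a tiny illustration:

a = [1, 2, 3]
b = a                  # a second reference to the very same list
c = [1, 2, 3]          # a different list that happens to be equal

print a == c           # True:  equal values
print a is c           # False: different objects
print a is b           # True:  same object
print id(a) == id(b)   # True:  id() agrees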