[EMAIL PROTECTED] wrote:
> Carl J. Van Arsdall wrote:
>> [EMAIL PROTECTED] wrote:
>>> Carl J. Van Arsdall wrote:
>>>
>>> I don't get what threading and Twisted would do for you. The
>>> problem you actually have is that you sometimes need to terminate
>>> these other processes running other programs. Use spawn,
>>> fork/exec*, or maybe one of the popens.
>>
>> I have a strong need for shared memory space in a large distributed
>> environment.
>
> Distributed shared memory is a tough trick; only a few systems
> simulate it.

Yeah, this I understand; maybe I chose some poor words to describe what I wanted. This conversation is getting hairy and confusing, so I'm going to try to paint a better picture of what's going on. Maybe that will help you see exactly what I'm trying to do, because I feel like we're just running in circles. After the detailed explanation, whether threads are the obvious choice or not, it will be much easier to pick apart what I need, and probably also easier for me to see your point... so here goes. (Sorry it's long, but I keep getting dinged for not being thorough enough.)
So: I have a distributed build system. The system is tasked with building a fairly complex set of packages that form a product. It needs to build these packages for 50 architectures using cross compilation, with support for 5 different hosts. Say there are also different versions of this with tweaks for various configurations; in the end I might be trying to build 200+ different things at once, and I have a computing farm of 40 machines to do it for me. That's the high-level scenario without getting too detailed. There are also subsystems that help us manage the machines, but I don't want to get into that; I'm going to try to stay more abstract than the cluster/resource-management stuff.

Alright, so manually running builds at that scale would be crazy and unmanageable. What the people who came before me did to manage this was to fork one thread per build. The threads invoke a series of calls that look like

    os.system("ssh <host> <command>")

or, for more complex operations, they would just spawn a process that ran another python script:

    os.system("ssh <host> <script>")

The purpose behind all this was a few things:

* Each thread constantly needed information about the state of the system (for example, we don't want to end up building the same architecture twice).
* We wanted a centralized point of control for an entire build.
* We needed to be able to use as many machines as possible from a central location.

Python threads worked very well for this. os.system behaves a lot like many other IO operations in python: the interpreter gives up the GIL while it blocks. Each thread could run remote operations and we didn't really have any problems. There wasn't much need to fork; all that would have done is increase the amount of memory used by the system.

This first scheme kind of worked, but there were problems. For example, when someone did os.system("ssh <host> <script>") we had no good way of knowing what the hell happened inside the script. Granted, they used shared files over nfs mounts to get some of it back, but I really hate that: it doesn't work well, it's clunky, and it's difficult to manage. There were other problems too, but I just wanted to give a sample.

So things aren't working, I come on board, and I have a boss who wants things done immediately. What we did was create what we call a "Python Execution Framework". The purpose of the framework is to mitigate a number of the problems we had, and to take the burden of distribution away from the programmers by providing a few layers of abstraction (I'm only going to focus on the distributed part of the framework; the rest is irrelevant to this discussion). The framework executes and threads modules (or lists of modules). Since we had limited time, we designed the framework with a "distribution environment" in mind, but realized that if we shoot for the top right away it will take years to get anything implemented. We knew we eventually wanted a distributed system that could execute framework modules entirely on remote machines, so we carefully designed and prepared the system for that; it involves some abstraction and some simple mechanisms. Right now, though, each ssh call is still executed from a thread (they're done concurrently, just like before). A sketch of the direction we're heading for those remote calls follows below.
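To give a flavor of that, here's a minimal sketch -- not our real code -- of what one of those per-build threads could look like if we swapped os.system for subprocess, so we could actually see what happened on the remote end. The host name, the 'build %s --arch=%s' command line, and the helper names are all made up for illustration:

    import subprocess
    import threading

    # Shared bookkeeping so two threads never build the same target twice.
    built_lock = threading.Lock()
    built = set()                       # (package, arch) pairs already claimed

    def remote_build(host, command):
        # Run one build step over ssh and capture what actually happened,
        # instead of firing os.system() and hoping.  communicate() blocks
        # without holding the GIL, just like os.system did.
        p = subprocess.Popen(['ssh', host, command],
                             stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE,
                             universal_newlines=True)
        out, err = p.communicate()
        return p.returncode, out, err

    def build_target(host, package, arch):
        # State check first: skip targets another thread already claimed.
        with built_lock:
            if (package, arch) in built:
                return
            built.add((package, arch))
        # Hypothetical command line -- substitute the real build invocation.
        rc, out, err = remote_build(host, 'build %s --arch=%s' % (package, arch))
        if rc != 0:
            print('%s/%s failed on %s:\n%s' % (package, arch, host, err))

    # One thread per build, just like the existing scheme.
    t = threading.Thread(target=build_target, args=('farm01', 'kernel', 'ppc'))
    t.start()
    t.join()

The point is just that the return code and stderr come back in-process, instead of through files on an nfs mount.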
The threads still need to know about the state of the system, but we'd also like to be able to issue some kind of control that is more event driven -- this could be sending a thread a terminate message, or sending it a message about the completion of a dependency (we use conditions and events to do this synchronization right now). We hoped that in the case of a catastrophic event, or a user 'kill' signal, the system could take control of all the threads (or at least ask them to go away); that is what started this conversation in the first place. We don't want these threads sitting in a polling loop checking for messages; we want something event driven (I mistakenly used the word "interrupt" in earlier posts, but I think it still illustrates the point). It's not only important that the threads die, but that they die with grace: there's lots of cleanup work that has to be done when things exit, or things end up in an indeterminate state.

So, I feel like I have a few options:

1) Move everything to a process-oriented configuration -- we think this would be bad, both from a resource standpoint and because it would make it harder to move to a fully distributed system later, when I get my army of code monkeys.

2) Suck it up and go straight for the distributed system now -- managers don't like this, but maybe it's easier than I think it's going to be, I dunno.

3) See if we can find some other way of getting the threads to terminate (there's a sketch of what I mean by this in the P.S. below).

4) Kill things and clean up by hand or with helper scripts -- we don't want to do this either; it's one of the major things we're trying to get away from.

Alright, that's still a fairly high-level description. After all that, if threads are still stupid, I think I'll see it much more easily; I hope this starts to clear up the confusion. I don't really need a distributed shared memory environment, but right now I do need shared memory, and it needs to be used fairly efficiently. For a fully distributed environment I was going to see what the various technologies offer for passing data around; I figure they must have some mechanism for doing it, or at least for accessing memory from a central location (we're set up to do this now with threads, we just need to expand the concept to allow nodes to do it remotely). Right now, based on what I have to do, I think threads are the right choice until I can look at a better implementation (I hear Twisted is good at what I ultimately want to do, but I don't know a thing about it).

Alright, if you read all that, thanks, and thanks for your input. Whether or not I've agreed with anything, a few colleagues and I definitely discuss each idea as it's passed to us. For that, thanks to the python list!

-carl

--
Carl J. Van Arsdall
[EMAIL PROTECTED]
Build and Release
MontaVista Software
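P.S. To make option 3 concrete, here's a minimal sketch of the kind of event-driven worker I mean, assuming a shared threading.Condition plus a state dict. The BuildWorker class and its do_build/cleanup stubs are made up for illustration, not our real framework. The thread blocks in Condition.wait() instead of polling, and a single notify_all() from the controller wakes every worker so it can die with grace:

    import threading

    class BuildWorker(threading.Thread):
        # One thread per build.  No polling loop: the thread sleeps inside
        # Condition.wait() and is woken only when the controller changes
        # shared state (a dependency finished, or we're being told to die).
        def __init__(self, cond, state, target_arch):
            threading.Thread.__init__(self)
            self.cond = cond            # shared Condition guarding `state`
            self.state = state          # {'terminate': bool, 'deps_done': bool}
            self.target_arch = target_arch

        def run(self):
            try:
                with self.cond:
                    while not (self.state['terminate'] or self.state['deps_done']):
                        self.cond.wait()    # blocks; no CPU burned polling
                    if self.state['terminate']:
                        return              # finally: still does the cleanup
                self.do_build()
            finally:
                self.cleanup()

        def do_build(self):
            # Stand-in for the real work (the ssh calls sketched earlier).
            print('%s: building' % self.target_arch)

        def cleanup(self):
            # Dying with grace: release resources, reap remote jobs, etc.
            print('%s: cleaned up' % self.target_arch)

    # Controller side: ask every worker to go away, then wait for them.
    cond = threading.Condition()
    state = {'terminate': False, 'deps_done': False}
    workers = [BuildWorker(cond, state, arch) for arch in ('ppc', 'arm', 'mips')]
    for w in workers:
        w.start()
    with cond:
        state['terminate'] = True
        cond.notify_all()               # wakes every thread blocked in wait()
    for w in workers:
        w.join()

A Condition (rather than a bare Event) lets one wait cover both wake-up reasons -- dependency completion and termination -- without a timeout-based polling loop.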