On Wed, 11 Jul 2007 07:00:18 -0700, Andrew Warkentin <[EMAIL PROTECTED]> wrote:
>On Jul 10, 8:19 pm, Steve Holden <[EMAIL PROTECTED]> wrote:
>> Bjoern Schliessmann wrote:
>> > Andrew Warkentin wrote:
>>
>> >> I am going to write a general-purpose modular proxy in Python. It
>> >> will consist of a simple core and several modules for things like
>> >> filtering and caching. I am not sure whether it is better to use
>> >> multithreading, or to use an event-driven networking library like
>> >> Twisted or Medusa/Asyncore. Which would be the better
>> >> architecture to use?
>>
>> > I'd definitely use an event-driven approach with Twisted.
>>
>> > Generally, multithreading is less performant than multiplexing. High
>> > performance servers mostly use a combination of both, though.
>>
>> Conversely, I'd recommend Medusa - not necessarily because it's "better",
>> but because I know it better. There's also a nice general-purpose proxy
>> program (though I'd be surprised if Twisted didn't also have one).
>>
>Would an event-driven proxy be able to handle multiple connections
>with large numbers of possibly CPU-bound filters? I use The
>Proxomitron (and would like to write my own proxy that can use the
>same filter sets, but follows the Unix philosophy), and some of the
>filters appear to be CPU-bound, because they cause The Proxomitron to
>hog the CPU (although that might just be a Proxomitron design flaw or
>something). Wouldn't CPU-bound filters only allow one connection to be
>filtered at a time? On the Medusa site, it said that an event-driven
>architecture only works for I/O-bound programs.
>
Handling all of your network traffic with a single OS thread doesn't necessarily mean that all of your filters need to run in the same thread (or even in the same process, or on the same computer). Typically, however, a filtering rule should only need to operate on a small number of bytes (almost always only a few kilobytes). Is it the case that handling even this amount of data incurs a significant CPU cost? If not, then there's probably nothing to worry about here, and you can do everything in a single thread.

If it is the case, then you might want to keep around a thread pool (or process pool, or cluster) and push the filtering work to it, reserving the IO thread strictly for IO. This is still a win, since you end up with a constant number of processes vying for CPU time (and you can tune this to an ideal value given your available hardware), rather than one per connection. This translates directly into reduced context switch overhead.

Jean-Paul
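[Editorial note: a minimal sketch of the split Jean-Paul describes, not code from the original thread. It assumes Twisted and uses twisted.internet.threads.deferToThread to push a hypothetical CPU-bound expensive_filter() onto the reactor's worker pool, while the reactor thread does only IO. For brevity it just filters and echoes each chunk back on port 8080 rather than relaying to an upstream server.]

    # Sketch only: event loop handles IO, filtering runs off the IO thread.
    from twisted.internet import protocol, reactor
    from twisted.internet.threads import deferToThread


    def expensive_filter(data):
        # Stand-in for a CPU-bound filtering rule operating on a few
        # kilobytes; a real proxy would apply its filter set here.
        return data.replace(b"foo", b"bar")


    class FilteringEcho(protocol.Protocol):
        def dataReceived(self, data):
            # Hand the CPU-bound work to a worker thread so the reactor
            # stays free to service other connections.
            d = deferToThread(expensive_filter, data)
            # When the filtered bytes come back, write them out from the
            # reactor thread as usual.
            d.addCallback(self.transport.write)


    factory = protocol.Factory()
    factory.protocol = FilteringEcho

    reactor.listenTCP(8080, factory)
    reactor.run()

[Because of the GIL, a thread pool only helps if the filters release the GIL (e.g. they are largely in C); for filters that are pure-Python and genuinely CPU-bound, the process-pool or cluster variant Jean-Paul mentions is the one that actually uses multiple CPUs.]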