On Tue, Jun 3, 2014 at 4:36 PM, Marko Rauhamaa <ma...@pacujo.net> wrote: > I have yet to see that in practice. The "typical" thread works as > follows: > > while True: > while request.incomplete(): > request.read() # block > sql_stmt = request.process() > db.act(sql_stmt) # block > db.commit() # block > response = request.ok_response() > while response.incomplete(): > response.write() # block > > The places marked with the "block" comment are states with only one > valid input stimulus. > ... > Yes, a "nest of callbacks" can get messy very quickly. That is why you > need to be very explicit with your states. Your class needs to have a > state field named "state" with clearly named state values.
Simple/naive way to translate this into a callback system is like this: def request_read_callback(request, data): request.read(data) # however that part works if not request.incomplete(): request.process() def write(request, data): request.write_buffer += data request.attempt_write() # sets up callbacks for async writing def request.process(self): # I know this isn't valid syntax db.act(whatever) # may block but shouldn't for long db.commit() # ditto write(self, response) # won't block This works as long as your database is reasonably fast and close (common case for a lot of web servers: DB runs on same computer as web and application and etc servers). It's nice and simple, lets you use a single database connection (although you should probably wrap it in a try/finally to ensure that you roll back on any exception), and won't materially damage throughput as long as you don't run into problems. For a database driven web site, most of the I/O time will be waiting for clients, not waiting for your database. Getting rid of those blocking database calls means having multiple concurrent transactions on the database. Whether you go async or threaded, this is going to happen. Unless your database lets you run multiple simultaneous transactions on a single connection (I don't think the Python DB API allows that, and I can't think of any DB backends that support it, off hand), that means that every single concurrency point needs its own database connection. With threads, you could have a pool of (say) a dozen or so, one per thread, with each one working synchronously; with asyncio, you'd have to have one for every single incoming client request, or else faff around with semaphores and resource pools and such manually. The throughput you gain by making those asynchronous with callbacks is quite probably destroyed by the throughput you lose in having too many simultaneous connections to the database. I can't prove that, obviously, but I do know that PostgreSQL requires up-front RAM allocation based on the max_connections setting, and trying to support 5000 connections started to get kinda stupid. So how do you deal with the possibility that the database will block? "Pure" threading (one thread listens for clients, spin off a thread for each client, end the thread when the client disconnects) copes poorly; async I/O copes poorly. The thread pool copes well (you know exactly how many connections you'll need - one per thread in the pool), but doesn't necessarily solve the problem (you can get all threads waiting on the database and none handling other requests). Frankly, I think the only solution is to beef up the database so it won't block for too long (and, duh, to solve any stupid locking problems, because they WILL kill you :) ). > If threads simplify an asynchronous application, that is generally done > by oversimplifying and reducing functionality. Which means that I disagree with this statement. In my opinion, both simple models (pure threading and asyncio) can express the same functionality; the hybrid thread-pool model may simplify things a bit in the interests of resource usage; but threading does let you think about code the same way for one client as for fifty, without any change of functionality. Compare: # Console I/O: def print_menu(): print("1: Spam") print("2: Ham") print("3: Quit") def spam(): print("Spam, spam, spam, spam,") while input("Continue? ")!="NO!": print("spam, spam, spam...") def mainloop(): print("Welcome!") while True: print_menu() x = int(input("What would you like? ")) if x == 1: spam() elif x == 2: ham() elif x == 3: break else: print("I don't know numbers like %d."%x) print("Goodbye!") I could translate this into a pure-threading system very easily: # Socket I/O: import consoleio class TerminateRequest(Exception): pass tls = threading.local() def print(s): tls.socket.write(s+"\r\n") # Don't forget, most of the internet uses \r\n! def input(prompt): tls.socket.write(s) while '\n' not in tls.readbuffer: tls.readbuffer += tls.socket.read() if not tls.socket.connected(): raise TerminateRequest ret, _, tls.readbuffer = tls.readbuffer.partition("\n") return ret.strip("\r") consoleio.print = print consoleio.input = input def thread(sock): tls.socket = sock tls.readbuffer = "" try: consoleio.mainloop() except TerminateRequest: pass mainsock = socket.socket() # bind to port, whatever mainsock.listen(5) while True: threading.Thread(target=thread,args=(mainsock.accept(),)) The only changes I made to the application main loop are to change (via injection) print and input to use the socket. Everything else - the (bad) logic inside spam(), helper functions, etc, etc - works exactly the same way. So if you're coming from a single-thread single-client starting point, it's definitely way simpler to follow a model like this. (And I have written enough programs along this sort of line to be able to write the above without more than some quick checks to see how thread-local storage is done in Python. It's a really simple model. On the flip side, that does also mean the code above is completely untested, not to mention stubby in places, so it probably won't actually run.) How do you translate that to asynchronous I/O? Normally, with piles and piles of callbacks. I haven't looked into tulip yet, but my understanding is that it may be possible to tulipify the above code in the same way that I socketified it - you'd simply have a "yield from" inside input(), which would let the system go and deal with other requests; but otherwise, you would need to completely rewrite the main code. The exact same functionality would look a lot messier; and you have the exact same problems to deal with either way. Your connection state has to cope with both input and output buffers (since you have to make sure you don't block inside print() either), and while you can in most applications assume that there won't be a huge amount of output buffering (and can therefore let mainloop() pretend that it all got written instantly, while the actual writing happens in the background), sometimes you have to be explicit about that part of state too. So the consoleio module above would need a lot of separate states: every single print() or input() call becomes its own state, in which the connection might stall. Do you really want to do that with callbacks? Because it would *definitely* be less clear. ChrisA -- https://mail.python.org/mailman/listinfo/python-list