In article <87wqexmmuc....@elektro.pacujo.net>, Marko Rauhamaa <ma...@pacujo.net> wrote:
> * When you wake up from select() (or poll(), epoll()), you should treat
>   it as a hint. The I/O call (accept()) could still raise
>   socket.error(EAGAIN).

People often misunderstand what select() does. The common misconception is that a select()ed descriptor has data waiting to be read. What the man page says is, "A file descriptor is considered ready if it is possible to perform the corresponding I/O operation (e.g., read(2)) without blocking." Not blocking includes failing immediately.

And, once you introduce threading, things get even more complicated. Imagine two threads, both waiting in a select() call on the same socket. Data comes in on that socket. Both select() calls return. If both threads then do reads on the socket, you've got a race condition. One of them will read the data. The other will block in the read call, because the data has already been read by the other thread!

So, yes, as Marko says, use select() as a hint, but then also do your reads in non-blocking mode, and be prepared for them to fail, regardless of whether select() said the descriptor was ready.

> Note that modern software has to tolerate suspension (laptop lid,
> virtual machines). Time is a tricky concept when your server wakes up
> from a coma.

Not to mention running in a virtual machine. Time is an equally tricky concept when your hardware clock is really some other piece of software playing smoke and mirrors.

I once worked on a time-sensitive system which was running in a VM. The idiots who had configured the thing were running ntpd in the VM, to keep its clock in sync. Normally, this is a good thing, but they were ALSO using the hypervisor's clock management gizmo (vmtools?) to adjust the VM clock. The two mechanisms were fighting with each other, which did really weird stuff to time. It took me forever to figure out what was going on. How does one even observe that time is moving around randomly?
I eventually ended up writing a trivial NTP client in Python (it's only a few lines of code) and periodically logging the difference between the local system clock and what my NTP reference was telling me. Of course, figuring out what was going on was the easy part. Convincing the IT drones to fix the problem was considerably more difficult.

> * In each state, check that you handle all possible events and
>   timeouts. The state/transition matrix will be quite sizable even for
>   seemingly simple tasks.

And those boxes in the state transition matrix which are blank, because those transitions are impossible? Guess what, they happen, and you'd better have a plan for when they do :-)

-- 
https://mail.python.org/mailman/listinfo/python-list
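
On the blank boxes in the state/transition matrix: one way to have a plan is to keep the matrix as a table and route every (state, event) pair that isn't in it to an explicit fallback. A minimal sketch, assuming a made-up connection with three states; the state names, the event names and the step() function are all hypothetical, not taken from any library:

```python
import logging

log = logging.getLogger("fsm")

# Hypothetical states and events, for illustration only.
# Every filled-in box of the matrix is one entry here.
TRANSITIONS = {
    ("idle", "connect"): "connecting",
    ("connecting", "connected"): "established",
    ("connecting", "timeout"): "idle",
    ("established", "data"): "established",
    ("established", "close"): "idle",
    ("established", "timeout"): "idle",
}

# Where to go when an "impossible" transition shows up anyway.
FALLBACK_STATE = "idle"


def step(state, event):
    """Return the next state, without raising on an unexpected event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        # A blank box in the matrix: log it and recover to a known state.
        log.warning("unexpected event %r in state %r", event, state)
        return FALLBACK_STATE
```

Whether dropping back to "idle" is the right recovery depends entirely on the protocol. The point is only that the blank boxes get a deliberate answer instead of an unhandled exception.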
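
To make the "select() is a hint" advice concrete, here is a rough sketch, assuming Python 3, where EAGAIN/EWOULDBLOCK on a non-blocking socket surfaces as BlockingIOError. The helper name read_if_ready and its default arguments are my own invention for illustration:

```python
import select
import socket


def read_if_ready(sock, timeout=1.0, bufsize=4096):
    """Return the bytes read, b"" if the peer closed, or None if nothing was read.

    select() is only a hint, so the recv() is done in non-blocking mode
    and is allowed to fail even after select() reported the socket ready.
    """
    sock.setblocking(False)
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # timed out; nothing even claimed to be ready
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        # EAGAIN/EWOULDBLOCK: the readiness evaporated (another thread
        # got to the data first, say). Go back to select() and try later.
        return None
```

A caller would normally sit in a loop around this, treating None as "nothing this time" and b"" as end-of-stream. The same pattern applies to accept() on a listening socket.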