On Sun, Jan 6, 2013 at 2:38 AM, Roy Smith <r...@panix.com> wrote:
> In article <mailman.121.1357398573.2939.python-l...@python.org>,
>  Chris Angelico <ros...@gmail.com> wrote:
>
>> it may or may not run faster than the explicit state machine,
>
> You got me by a factor of 3 or 4. Not bad.
You miss my point, though. I went for simple Pythonic code, and never
measured its performance, on the expectation that it would be "good
enough". Written in C, the state machine is probably WAY faster than
splitting and then iterating. My C++ MUD client uses code similar to
that to parse TELNET and ANSI codes from a stream of bytes on a socket
(and one of its "states" is that there's no more data available, so
wait on the socket); the rewrite in a high-level language divides the
string on "\xFF" for TELNET and "\x1B" for ANSI, handles each
separately, and then afterward splits on "\n" to divide into lines.
The code's much less convoluted, it's easier to test individual parts
(because I can simply call the ANSI parser with a block of text), and
on a modern computer you can't see the performance difference (since
you spend most of your time waiting on socket data anyway). But it's
gratifying to know that the obvious and brief way to do things is fast
too :)

>> but IMHO it's a lot clearer to read a split() than something
>> that changes state when a particular character is found.
>
> Maybe.  But being familiar with state machines is still a handy skill.
> DNA sequence analysis has lots of problems like "find a start codon
> which is within about 50 bases of a binding site, and then copy
> everything up until you find a stop codon".  Things like that often
> map well to state machines.  Especially if you're trying to do it in
> parallel in all three reading frames.

Sure. And if you're working with petabytes of data, these
considerations become fairly important. When that happens, you start
rewriting your algorithms in C, or using Cython, or something; at the
very least, you start rewriting clear and simple algorithms into more
complex ones. But all this happens *after* the code has been tested
and proven. All the rewrites can be verified as identical to their
reference implementations; you can test one piece at a time as you
change them.
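For the curious, here's a minimal sketch of the split-then-iterate idea
(the function name and the exact handling are hypothetical, not the
actual client code): divide an incoming chunk on the ANSI escape byte,
then walk each piece to separate the escape sequence from the plain
text, instead of stepping through a character-at-a-time state machine.

```python
# Hypothetical sketch: split a text chunk on ESC ("\x1b") and handle
# each segment, rather than changing state on each character.

def parse_ansi(chunk):
    """Split a chunk on ESC; return (plain_text, escape_codes)."""
    parts = chunk.split("\x1b")
    text = [parts[0]]          # everything before the first escape
    codes = []
    for part in parts[1:]:
        # A CSI sequence like "[1;32m" ends at an alphabetic character.
        for i, ch in enumerate(part):
            if ch.isalpha():
                codes.append(part[:i + 1])
                text.append(part[i + 1:])
                break
        else:
            # Incomplete sequence; a real client would buffer it and
            # wait for more socket data.
            codes.append(part)
    return "".join(text), codes

plain, codes = parse_ansi("\x1b[1;32mHello\x1b[0m world\n")
# plain -> "Hello world\n", codes -> ["[1;32m", "[0m"]
```

Being a plain function of a string, it can be unit-tested with a block
of text, exactly the testability advantage described above.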
It's ever so much easier to work that way. <anecdote>At work, we had
one employee whose code was, shall we say, less than stellar. At one
point, he embarked on a months-long rewrite of one of his modules;
meanwhile, I was unable to adequately test code that called on it.
Once the rewrite was finally complete, I discovered myriad bugs in my
own code, ones that would have been found and fixed instantly if I'd
had even a slow version of the code to work against. Starting with
something you can easily debug helps enormously with that, because
debugging doesn't demand mega-TPS throughput.</anecdote>

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list