On Sat, Jul 4, 2009 at 1:33 PM, John Nagle<na...@animats.com> wrote: > As an example of code that really needs to run fast, but is > speed-limited by Python's limitations, see "tokenizer.py" in > > http://code.google.com/p/html5lib/ > > This is a parser for HTML 5, a piece of code that will be needed > in many places and will process large amounts of data. It's written > entirely in Python. Take a look at how much work has to be performed > per character. > > This is a good test for Python implementation bottlenecks. Run > that tokenizer on HTML, and see where the time goes. > > ("It should be written in C" is not an acceptable answer.) > > Python doesn't have a "switch" or "case" statement, and when > you need a state machine with many states, that makes for painful, > slow code. There's a comment in the code that it would be useful > to run a few billion lines of HTML through an instrumented version > of the parser to decide in which order the IF statements should be > executed. You shouldn't have to do that. >
If you're cases are hashable, just use a dict instead of the if chain. Then you get the constant time access to it. def func_a() : ... def func_b(): ... def func_c(): .... case = {"a":func_a, "b":func_b, "c":func_c} case[value]() > Yes, I've read PEP 3103. The big problem is the difficulty of figuring > out what's a constant and what might change. If all the cases are > constants, > case statements are easy. But in Python, the compiler can't tell. > > Parsers have many named compile-time constants. Python doesn't support > named compile-time constants, and this is one of the places where we > have to pay the bill for that limitation. > > Something to think about when you need three more racks of servers > because the HTML parser is slow. > > John Nagle > -- > http://mail.python.org/mailman/listinfo/python-list > -- http://mail.python.org/mailman/listinfo/python-list