John Nagle <na...@animats.com> wrote:
> As an example of code that really needs to run fast, but is
> speed-limited by Python's limitations, see "tokenizer.py" in
>
> http://code.google.com/p/html5lib/
>
> This is a parser for HTML 5, a piece of code that will be needed
> in many places and will process large amounts of data. It's written
> entirely in Python. Take a look at how much work has to be performed
> per character.
>
> This is a good test for Python implementation bottlenecks. Run
> that tokenizer on HTML, and see where the time goes.
>
> ("It should be written in C" is not an acceptable answer.)

You could compile it with Cython though. lxml took this route...

-- 
Nick Craig-Wood <n...@craig-wood.com> -- http://www.craig-wood.com/nick
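
Before compiling anything, it may help to do what the quoted post suggests and see where the time goes. Below is a rough sketch using the standard library's cProfile and pstats. Note that toy_tokenize is a hypothetical stand-in written for illustration, not html5lib's tokenizer; it only mimics the "some work per character" shape of such a loop. With the real library you would profile its tokenizer on real HTML instead.

```python
import cProfile
import io
import pstats


def toy_tokenize(html):
    """Hypothetical per-character tokenizer (NOT html5lib's code).

    Splits input into ("tag", ...) and ("text", ...) tokens, doing a
    little work for every character, as a character-driven state
    machine would.
    """
    tokens = []
    text = []
    tag = []
    in_tag = False
    for ch in html:
        if in_tag:
            if ch == ">":
                tokens.append(("tag", "".join(tag)))
                tag = []
                in_tag = False
            else:
                tag.append(ch)
        elif ch == "<":
            if text:
                tokens.append(("text", "".join(text)))
                text = []
            in_tag = True
        else:
            text.append(ch)
    if text:
        tokens.append(("text", "".join(text)))
    return tokens


def profile_tokenizer(html, top=5):
    """Run toy_tokenize under cProfile; return (tokens, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    tokens = toy_tokenize(html)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return tokens, out.getvalue()


if __name__ == "__main__":
    sample = "<p>Hello <b>world</b></p>" * 2000
    tokens, report = profile_tokenizer(sample)
    print(len(tokens), "tokens")
    print(report)
```

The report lists the per-character list.append and str.join calls alongside the loop itself, which is the kind of overhead a Cython build would be aiming to reduce. How much it actually helps would have to be measured.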