John Nagle <na...@animats.com> wrote:
>      As an example of code that really needs to run fast, but is
>  speed-limited by Python's limitations, see "tokenizer.py" in
> 
>       http://code.google.com/p/html5lib/
> 
>  This is a parser for HTML 5, a piece of code that will be needed
>  in many places and will process large amounts of data. It's written
>  entirely in Python.  Take a look at how much work has to be performed
>  per character.
> 
>  This is a good test for Python implementation bottlenecks.  Run
>  that tokenizer on HTML, and see where the time goes.
> 
>  ("It should be written in C" is not an acceptable answer.)

You could compile it with Cython, though.  lxml took this route...
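
Before compiling anything, it is probably worth doing what the quoted message suggests and seeing where the time goes. A minimal sketch with the standard-library cProfile follows; `toy_tokenize` and `profile_tokenizer` are hypothetical stand-ins written for illustration, not html5lib's tokenizer or its API, so swap in the real tokenizer call to measure the real thing.

```python
import cProfile
import io
import pstats


def toy_tokenize(text):
    """Hypothetical per-character tokenizer (NOT html5lib's code).

    Splits input into ("tag", ...) and ("text", ...) tuples so there is
    some per-character work for the profiler to measure.
    """
    tokens = []
    buf = []
    in_tag = False
    for ch in text:
        if ch == "<":
            # Flush any pending text before the tag starts.
            if buf:
                tokens.append(("text", "".join(buf)))
                buf = []
            in_tag = True
        elif ch == ">" and in_tag:
            tokens.append(("tag", "".join(buf)))
            buf = []
            in_tag = False
        else:
            buf.append(ch)
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens


def profile_tokenizer(text, top=10):
    """Run toy_tokenize under cProfile; return (tokens, report string)."""
    profiler = cProfile.Profile()
    profiler.enable()
    tokens = toy_tokenize(text)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return tokens, out.getvalue()


if __name__ == "__main__":
    sample = "<p>hello</p>" * 10000
    _, report = profile_tokenizer(sample)
    print(report)
```

Functions that dominate the cumulative column in the report are the likely candidates for Cython, assuming the real tokenizer shows a similarly concentrated profile.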

-- 
Nick Craig-Wood <n...@craig-wood.com> -- http://www.craig-wood.com/nick
-- 
http://mail.python.org/mailman/listinfo/python-list
