Re: Code that ought to run fast, but can't due to Python limitations.

Paul Rubin Sat, 04 Jul 2009 16:51:26 -0700

John Nagle <na...@animats.com> writes:
>     A dictionary lookup (actually, several of them) for every
> input character is rather expensive. Tokenizers usually index into
> a table of character classes, then use the character class index in
> a switch statement.


Maybe you could use a regexp (and then have -two- problems...) to
find the token boundaries, then a dict to identify the actual token.
Tables of character classes seem a bit less attractive in the Unicode
era than in the old days: a table indexed by code point would need
over a million entries rather than 256.
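
Something along these lines, as a rough sketch only. It assumes
Python 3, where \w and \d are Unicode-aware for str patterns. The
names KEYWORDS, TOKEN_RE and tokenize are made up for illustration,
and the token grammar is a toy one, not the one from the code under
discussion:

```python
import re

# Hypothetical keyword table: the dict identifies the actual token
# once the regexp has found where it starts and ends.
KEYWORDS = {"if": "IF", "else": "ELSE", "while": "WHILE"}

# One alternation finds the token boundaries. The first alternative
# that matches wins, so "number" is tried before "word".
TOKEN_RE = re.compile(
    r"(?P<ws>\s+)|(?P<number>\d+)|(?P<word>\w+)|(?P<op>.)",
    re.DOTALL,
)

def tokenize(text):
    """Return a list of (kind, value) pairs for the toy grammar above."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup          # name of the alternative that matched
        value = m.group()
        if kind == "ws":
            continue                # whitespace only separates tokens
        if kind == "word":
            # A single dict lookup per token, not per input character.
            kind = KEYWORDS.get(value, "NAME")
        tokens.append((kind, value))
    return tokens
```

The per-character work happens inside the regexp engine, so the
interpreted loop runs once per token, and no character class table
is needed for non-ASCII input.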
