Re: PEP 393 vs UTF-8 Everywhere

Paul Rubin Sat, 21 Jan 2017 01:18:46 -0800

Chris Angelico <ros...@gmail.com> writes:
> You can't do a look-ahead with a vanilla string iterator. That's
> necessary for a lot of parsers.


For JSON?  For other parsers you usually have a tokenizer that reads
characters with maybe 1 char of lookahead.
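
A minimal sketch of that kind of tokenizer, assuming a toy grammar of
integers and single-character punctuation. The names Peekable and
tokenize are made up for illustration and are not from any library. It
shows that one character of lookahead can be layered on top of a plain
string iterator with a one-item buffer:

```python
class Peekable:
    """Wrap an iterator so the next item can be inspected without consuming it."""

    _END = object()  # sentinel meaning "nothing buffered" or "exhausted"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buf = self._END

    def peek(self, default=None):
        # Pull one item into the buffer if it is empty.
        if self._buf is self._END:
            self._buf = next(self._it, self._END)
        return default if self._buf is self._END else self._buf

    def __iter__(self):
        return self

    def __next__(self):
        # Hand out the buffered item first, if there is one.
        if self._buf is not self._END:
            ch, self._buf = self._buf, self._END
            return ch
        return next(self._it)


def tokenize(text):
    """Yield (kind, value) pairs using at most 1 char of lookahead."""
    chars = Peekable(text)
    for ch in chars:
        if ch.isspace():
            continue
        if ch.isdigit():
            digits = [ch]
            # Look ahead one character to decide whether the number continues.
            while chars.peek("").isdigit():
                digits.append(next(chars))
            yield ("NUM", "".join(digits))
        else:
            yield ("PUNCT", ch)
```

Something like list(tokenize("[12, 345]")) walks the string strictly
left to right and never indexes into it.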

> Yes, which gives a two-level indexing (first find the strand, then the
> character), and that's going to play pretty badly with CPU caches.

If you're jumping around at random all over the string, you probably
really want a bytearray rather than a unicode string.  If you're
scanning sequentially you won't have to look at the outer table very
often.
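
A rough sketch of the two access patterns, under the assumption that the
string is held as a list of strands plus an outer table of starting
offsets. StrandedText is a hypothetical class for illustration only, not
how CPython or any proposal in this thread actually stores strings.
Random indexing pays for a search of the outer table on every access,
while sequential iteration touches the table once per strand:

```python
from bisect import bisect_right


class StrandedText:
    """Toy two-level string: an outer table of offsets over inner strands."""

    def __init__(self, strands):
        self._strands = [s for s in strands if s]  # drop empty strands
        self._starts = []  # outer table: starting index of each strand
        total = 0
        for s in self._strands:
            self._starts.append(total)
            total += len(s)
        self._len = total

    def __len__(self):
        return self._len

    def __getitem__(self, i):
        # Random access: first find the strand, then the character.
        if i < 0:
            i += self._len
        if not 0 <= i < self._len:
            raise IndexError("index out of range")
        k = bisect_right(self._starts, i) - 1
        return self._strands[k][i - self._starts[k]]

    def __iter__(self):
        # Sequential scan: the outer table is consulted once per strand.
        for s in self._strands:
            yield from s
```

With t = StrandedText(["héllo", " ", "wörld"]), t[6] does a binary
search of the outer table before reaching the character, whereas
"".join(t) just runs through each strand in order.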
-- 
https://mail.python.org/mailman/listinfo/python-list
