On Sun, Aug 19, 2012 at 1:10 PM, Paul Rubin <no.email@nospam.invalid> wrote:
> Chris Angelico <ros...@gmail.com> writes:
>> I don't have a Python example of parsing a huge string, but I've done
>> it in other languages, and when I can depend on indexing being a cheap
>> operation, I'll happily do exactly that.
>
> I'd be interested to know what the context was, where you parsed
> a big unicode string in a way that required random access to
> the nth character in the string.

It's something I've done in C/C++ fairly often. Take one big fat buffer, slice it and dice it as you get the information you want out of it. I'll retain and/or calculate indices (when I'm not using pointers, but that's a different kettle of fish). Generally, I'm working with pure ASCII, but port those same algorithms to Python and you'll easily be able to read in a file in some known encoding and manipulate it as Unicode.

It's not so much 'random access to the nth character' as an efficient way of jumping forward. For instance, if I know that the next thing is a literal string of n characters (that I don't care about), I want to skip over that and keep parsing. The Adobe Message Format is particularly noteworthy in this, but it's a stupid format and I don't recommend people spend too much time reading up on it (unless you like that sensation of your brain trying to escape through your ear).

ChrisA
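
For illustration only, here is a minimal Python sketch of the "jump forward by n characters" idea described above. The record layout (a one-character tag, a decimal length, a colon, then the payload) and the name parse_ints are made up for this example; they are not taken from any real format, and certainly not from the one mentioned above. The point is just that the parser keeps an index into one big string and advances it past payloads it does not care about, which is cheap when indexing is cheap.

```python
def parse_ints(buf):
    """Collect integer fields from a hypothetical length-prefixed format.

    Each record is <tag><length>:<payload>, for example "S5:hello"
    (a literal string of 5 characters) or "I2:42" (an integer).
    This layout is an assumption made purely for illustration.
    """
    ints = []
    pos = 0
    while pos < len(buf):
        tag = buf[pos]
        colon = buf.index(":", pos + 1)      # end of the length field
        length = int(buf[pos + 1:colon])     # payload length in characters
        start = colon + 1
        end = start + length
        if end > len(buf):
            raise ValueError("truncated record at offset %d" % pos)
        if tag == "I":
            ints.append(int(buf[start:end]))
        # Any other tag (such as "S") is skipped without reading the payload.
        pos = end                            # jump straight to the next record
    return ints


if __name__ == "__main__":
    print(parse_ints("S5:helloI2:42S3:abcI1:7"))
```

Because the length counts characters rather than bytes, the same loop works unchanged when the skipped payload contains non-ASCII text, provided the file was decoded to a Python string first.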