On Wed, Jun 4, 2014 at 11:18 AM, Roy Smith <r...@panix.com> wrote:
> In article <mailman.10656.1401842403.18130.python-l...@python.org>,
> Chris Angelico <ros...@gmail.com> wrote:
>
>> A current discussion regarding Python's Unicode support centres (or
>> centers, depending on how close you are to the cent[er]{2} of the
>> universe)
>
> <sarcasm style="regex-pedant">Um, you mean cent(er|re), don't you? The
> pattern you wrote also matches centee and centrr.</sarcasm>

Maybe there's someone who spells it that way! Let's not be excluding
people. That'd be rude.

>> around one critical question: Is string indexing common?
>
> Not in our code. I've got 80008 non-blank lines of Python (2.7) source
> handy. I tried a few heuristics to find patterns which might be string
> indexing.
>
> $ find . -name '*.py' | xargs egrep '\[[^]][0-9]+\]'
>
> and then looked them over manually. I see this pattern a bunch of times
> (in a single-use script):
>
> data['shard_key'] = hashlib.md5(str(id)).hexdigest()[:4]

Slicing is a form of indexing too, although in this case (slicing from
the front) it could be implemented on top of UTF-8 without much
problem.

> withhyphen = number if '-' in number else (number[:-2] + '-' +
> number[-2:]) # big assumption here

This *definitely* counts; if strings were represented internally in
UTF-8, this would involve two scans (although a smart implementation
could probably count backward rather than forward).

By the way, any time you slice up to the third from the end, you win
two extra awesome points, just for putting [:-3] into your code and
having it mean something. But I digress.

> Anyway, there's a bunch more, but the bottom line is that in our code,
> indexing into a string (at least explicitly in application source code)
> is a pretty rare thing.

Thanks. Of course, the pattern you searched for is looking only for
literals; it's a bit harder to find cases where the index (or slice
position) comes from a variable or expression, and those situations
are also rather harder to optimize (the MD5 prefix is clearly better
scanned from the front, the number tail is clearly better scanned from
the back - but with a variable?).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
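
To make the scanning cost concrete, here is a minimal sketch of slicing by
code point over a UTF-8 buffer. It is purely illustrative and is not how
CPython or any real implementation stores strings. The helper names
(is_continuation, offset_from_front, offset_from_back, utf8_slice) are
made up for this example, and it assumes Python 3, where indexing a bytes
object yields an int. Non-negative positions are found by scanning forward
from the start; negative positions are found by counting backward from the
end.

```python
def is_continuation(byte):
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def offset_from_front(data, n):
    """Byte offset of code point n (n >= 0), found by scanning forward."""
    offset = 0
    for _ in range(n):
        if offset >= len(data):
            break  # clamp past-the-end positions, as str slicing does
        offset += 1  # step over the lead byte
        while offset < len(data) and is_continuation(data[offset]):
            offset += 1  # step over its continuation bytes
    return offset


def offset_from_back(data, n):
    """Byte offset of the n-th code point from the end (n >= 1), scanning backward."""
    offset = len(data)
    for _ in range(n):
        if offset <= 0:
            break  # clamp positions before the start
        offset -= 1
        while offset > 0 and is_continuation(data[offset]):
            offset -= 1  # back up to the lead byte
    return offset


def utf8_slice(data, start=None, stop=None):
    """Slice UTF-8 encoded bytes by code point, mimicking str[start:stop]."""
    def resolve(pos, default):
        if pos is None:
            return default
        if pos >= 0:
            return offset_from_front(data, pos)
        return offset_from_back(data, -pos)

    lo = resolve(start, 0)
    hi = resolve(stop, len(data))
    return data[lo:hi]


# The "number tail" case from above: two slices, two scans from the back.
number = "5551234".encode("utf-8")
withhyphen = utf8_slice(number, None, -2) + b"-" + utf8_slice(number, -2, None)
print(withhyphen.decode("utf-8"))  # 55512-34
```

With a literal index the direction of the scan is obvious from the sign.
With a variable, the sign is only known at run time, and a large positive
index into a long string still pays for a full forward scan.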