On Sun, 19 Aug 2012 01:11:56 -0700, Paul Rubin wrote: > Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> writes: >> result = text[end:] > > if end not near the end of the original string, then this is O(N) even > with fixed-width representation, because of the char copying.
Technically, yes. But it's a straight copy of a chunk of memory, which means it's fast: your OS and hardware tries to make straight memory copies as fast as possible. Big-Oh analysis frequently glosses over implementation details like that. Of course, that assumption gets shaky when you start talking about extra large blocks, and it falls apart completely when your OS starts paging memory to disk. But if it helps to avoid irrelevant technical details, change it to text[end:end+10] or something. > if it is near the end, by knowing where the string data area ends, I > think it should be possible to scan backwards from the end, recognizing > what bytes can be the beginning of code points and counting off the > appropriate number. This is O(1) if "near the end" means "within a > constant". You know, I think you are misusing Big-Oh analysis here. It really wouldn't be helpful for me to say "Bubble Sort is O(1) if you only sort lists with a single item". Well, yes, that is absolutely true, but that's a special case that doesn't give you any insight into why using Bubble Sort as your general purpose sort routine is a terrible idea. Using variable-sized strings like UTF-8 and UTF-16 for in-memory representations is a terrible idea because you can't assume that people will only every want to index the first or last character. On average, you need to scan half the string, one character at a time. In Big-Oh, we can ignore the factor of 1/2 and just say we scan the string, O(N). That's why languages tend to use fixed character arrays for strings. Haskell is an exception, using linked lists which require traversing the string to jump to an index. The manual even warns: [quote] If you think of a Text value as an array of Char values (which it is not), you run the risk of writing inefficient code. An idiom that is common in some languages is to find the numeric offset of a character or substring, then use that number to split or trim the searched string. With a Text value, this approach would require two O(n) operations: one to perform the search, and one to operate from wherever the search ended. [end quote] http://hackage.haskell.org/packages/archive/text/0.11.2.2/doc/html/Data-Text.html -- Steven -- http://mail.python.org/mailman/listinfo/python-list