On Thu, 30 Aug 2012 16:44:32 -0400, Terry Reedy wrote:

> On 8/30/2012 12:00 PM, Steven D'Aprano wrote:
>> On Thu, 30 Aug 2012 07:02:24 -0400, Roy Smith wrote:
[...]
>>> Is the implementation smart enough to know that x == y is always False
>>> if x and y are using different internal representations?
>
> Yes, after checking lengths, and in same circumstances, x != y is True.
[snip C code]

Thanks, Terry, for looking that up.

> 'a in s' is also False if a chars are wider than s chars.

Now that's a nice optimization!

[...]
>> But x and y are not necessarily always False just because they have
>> different representations. There may be circumstances where two strings
>> have different internal representations even though their content is
>> the same, so it's an unsafe optimization to automatically treat them as
>> unequal.
>
> I am sure that str objects are always in canonical form once visible to
> Python code. Note that unready (non-canonical) objects are rejected by
> the rich comparison function.

That's one thing that I'm unclear about -- under what circumstances will
a string be in compact versus non-compact form? Reading between the
lines, I guess that a lot of the complexity of the implementation only
occurs while a string is being built. E.g. if you have Python code like
this:

    ''.join(str(x) for x in something)  # a generator expression

Python can't tell how much space to allocate for the string -- it knows
neither the overall length of the string nor the width of the
characters. So I presume that there is string builder code for dealing
with that, and that it involves resizing blocks of memory.

But if you do this:

    ''.join([str(x) for x in something])  # a list comprehension

Python could scan the list first, find the widest char, and allocate
exactly the amount of space needed for the string. Even in Python 2,
joining a list comprehension is much faster than joining a generator
expression.


-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list
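
For anyone who wants to poke at this themselves, here is a minimal sketch, assuming CPython 3.3 or later (other implementations may store strings quite differently). The helper names bytes_per_char, join_list and join_gen are made up for illustration. It estimates the per-character storage width from sys.getsizeof, checks that a slice of a wide string comes back in the narrow form, and times the two join spellings. As far as I know, CPython's str.join turns any argument that is not already a list or tuple into a list before it starts copying, which would account for much of the difference, but treat that as my reading rather than gospel.

```python
import sys
import timeit


def bytes_per_char(ch, n=1000):
    """Estimate storage per character by differencing two string sizes.

    The fixed object header cancels out in the subtraction, leaving
    only the per-character cost. Assumes CPython 3.3+ (PEP 393).
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


def join_list(items):
    # The list is fully built before join() sees it.
    return ''.join([str(x) for x in items])


def join_gen(items):
    # join() receives a generator and has to consume it itself.
    return ''.join(str(x) for x in items)


if __name__ == '__main__':
    # ASCII, a BMP character, and an astral character.
    for ch in ('a', '\u20ac', '\U0001F600'):
        print(repr(ch), bytes_per_char(ch), 'byte(s) per char')

    # A slice that drops the only wide character should be stored
    # just like the equivalent pure-ASCII string.
    wide = 'abc\u20ac'
    narrow = wide[:3]
    print('slice is narrow:', sys.getsizeof(narrow) == sys.getsizeof('abc'))

    # Timings vary by machine and Python version; no numbers promised.
    data = range(10000)
    for func in (join_list, join_gen):
        best = min(timeit.repeat(lambda: func(data), number=10, repeat=3))
        print(func.__name__, round(best, 4), 'seconds')
```

The size-differencing trick is used because sys.getsizeof reports the whole object, header included; subtracting two sizes for the same character leaves only the payload.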