Dan Sugalski wrote:

Okay, here's a question for everyone to hash out.

Assuming I have a parrot string which is explicitly marked as a binary string...

What should happen when it's told to upcase/downcase/titlecase itself? (You may assume that we have strings which are explicitly marked as, at least, Unicode, so there is a difference between a STRING* which is text and one which is binary.)

Not necessarily a good example to follow, but take it for what it is worth:

Python has two string data types: str and unicode. The unicode type can be unambiguously viewed as a sequence of characters. The str type, however, is a sequence of bytes. Taking a unicode string and calling the encode() method on it returns a str, reinforcing the notion that str can be viewed as binary.

However, str has an upper() method defined on it. It uppercases the bytes that fall in the us-ascii lowercase range (a-z) and leaves the remaining bytes alone. Example output:

>>> u'\u0061\u00e1'.upper()
u'A\xc1'
>>> '\x61\xe1'.upper()
'A\xe1'
>>> u'\u0061\u00e1'.encode('iso-8859-1').upper()
'A\xe1'
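
For anyone who wants to see the rule spelled out, here is a minimal sketch of it, not the interpreter's actual implementation. It assumes Python 3, where the byte-sequence type is spelled bytes rather than str, and ascii_upper is a hypothetical helper name made up for illustration:

```python
def ascii_upper(data: bytes) -> bytes:
    """Uppercase only the us-ascii letters a-z; copy every other byte as-is."""
    # 0x61..0x7A is 'a'..'z'; subtracting 0x20 maps each onto 'A'..'Z'.
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


sample = b'\x61\xe1'

# The byte 0xe1 is outside a-z, so it passes through unchanged.
print(ascii_upper(sample))

# Python 3's built-in bytes.upper() follows the same ascii-only rule.
print(ascii_upper(sample) == sample.upper())
```

The point of the sketch is that no encoding is ever consulted: the operation is defined purely on byte values, which is one possible answer to the question of what casing a binary string could mean.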

- Sam Ruby
