On 07/19/2013 09:13 PM, Chris Angelico wrote:
On Sat, Jul 20, 2013 at 11:04 AM, Devyn Collier Johnson
<devyncjohn...@gmail.com> wrote:
On 07/19/2013 07:09 PM, Dave Angel wrote:
On 07/19/2013 06:08 PM, Devyn Collier Johnson wrote:
On 07/19/2013 01:59 PM, Steven D'Aprano wrote:
<snip>
As for the case-insensitive if-statements, most code uses Latin letters.
Making an international case-insensitive if-statement would be
interesting; I can tackle that later. For now, I only wanted to take
care of Latin letters. I hope to figure something out for all characters.
Once Steven gave you the answer, what's to figure out? You simply use
casefold() instead of lower(). The only constraint is that it's new in 3.3,
so you can't use it with earlier versions.
http://docs.python.org/3.3/library/stdtypes.html#str.casefold
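As a rough sketch of that advice, here is a case-insensitive if-statement that uses casefold() when the string has it and falls back to lower() on versions before 3.3. The helper names `_fold` and `caseless_equal` are hypothetical, chosen only for illustration; on the fallback path, characters such as 'ß' are not folded.

```python
def _fold(s):
    """Casefold s if this Python has str.casefold (3.3+), else lowercase it."""
    try:
        return s.casefold()
    except AttributeError:
        # Pre-3.3 fallback: handles simple Latin case, but not e.g. 'ß' -> 'ss'.
        return s.lower()


def caseless_equal(a, b):
    """Return True if a and b match when case differences are ignored."""
    return _fold(a) == _fold(b)


if caseless_equal("Straße", "STRASSE"):
    print("match")  # printed on 3.3+, where casefold() maps 'ß' to 'ss'
```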
"""
str.casefold()
Return a casefolded copy of the string. Casefolded strings may be used for
caseless matching.
Casefolding is similar to lowercasing but more aggressive because it is
intended to remove all case distinctions in a string. For example, the
German lowercase letter 'ß' is equivalent to "ss". Since it is already
lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".
The casefolding algorithm is described in section 3.13 of the Unicode
Standard.
New in version 3.3.
"""
Chris Angelico said that casefold is not perfect. In the future, I want to
make the perfect international-case-insensitive if-statement. For now, my
code only supports a limited range of characters. Even with casefold, I will
have some issues, as Chris Angelico mentioned. Also, "ß" is not really the
same as "ss".
Well, casefold is about as good as it's ever going to be, but that's
because "the perfect international-case-insensitive comparison" is a
fundamentally impossible goal. Your last sentence hints at why;
there is no simple way to compare strings containing those characters,
because the correct treatment varies according to context.
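One well-known illustration of that context dependence (an example chosen here, not one raised in the thread) is the Turkish dotless 'ı'. casefold() applies a single, locale-independent mapping, so it always folds 'I' to 'i', even though in Turkish text the lowercase of 'I' is 'ı'. A minimal demonstration, assuming Python 3.3+:

```python
# casefold() ignores locale, so Turkish casing rules are not applied.
print("I".casefold())  # i  (correct for English, wrong for Turkish)
print("ı".upper())     # I  (dotless i uppercases to plain I)
print("ı".casefold())  # ı  (but no folding is defined for dotless i)

# So two strings that are case variants in Turkish do not match:
print("ı".casefold() == "I".casefold())  # False
```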
Your two best options are: Be case sensitive (and then you need only
worry about composition and combining characters and all those
nightmares - the ones you have to worry about either way), or use
casefold(). Of those, I prefer the first, because it's safer; the
second is also a good option.
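A minimal sketch of the composition problem, assuming Python 3.3+: the same visible text can be spelled with a precomposed character or with a base letter plus a combining mark, so even a case-sensitive == can fail until both sides are normalized. The helpers `nfd` and `canonical_caseless_equal` are hypothetical names; the latter follows the Unicode Standard's definition of canonical caseless matching, NFD(casefold(NFD(x))), from the section 3.13 the documentation above cites.

```python
import unicodedata


def nfd(s):
    """Return the canonical decomposition (NFD) of s."""
    return unicodedata.normalize("NFD", s)


composed = "caf\u00e9"     # 'é' as one code point, U+00E9
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(composed == decomposed)            # False, even case-sensitively
print(nfd(composed) == nfd(decomposed))  # True once both are normalized


def canonical_caseless_equal(a, b):
    """Compare a and b using NFD(casefold(NFD(x))) on both sides."""
    return nfd(nfd(a).casefold()) == nfd(nfd(b).casefold())


print(canonical_caseless_equal("CAFE\u0301", "caf\u00e9"))  # True
```

Normalizing before casefolding matters because some characters only fold correctly once decomposed; normalizing again afterwards keeps both sides in the same canonical form.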
ChrisA
Thanks everyone (especially Chris Angelico and Steven D'Aprano) for all
of your helpful suggestions and ideas. I plan to use casefold() in
some of my programs.
Mahalo,
DCJ
--
http://mail.python.org/mailman/listinfo/python-list