Case-insensitive string equality

Steven D'Aprano Thu, 31 Aug 2017 00:19:06 -0700

Three times in the last week the devs where I work accidentally 
introduced bugs into our code because of a mistake with case-insensitive 
string comparisons. They managed to demonstrate three different failures:


# 1
a = something().upper()  # normalise string
# ... much later on
if a == b.lower(): ...  # bug: a is upper-cased, b is lower-cased


# 2
a = something().upper()
# ... much later on
if a == 'maildir': ...  # bug: a is upper-cased, the literal is lower-case


# 3
a = something()  # unnormalised
assert 'foo' in a
# ... much later on
pos = a.find('FOO')  # bug: the assert tested 'foo', this searches for 'FOO'



Not every two-line function needs to be in the standard library, but I've
come to the conclusion that case-insensitive testing and searches should 
be. I've made these mistakes myself at times, as I'm sure most people 
have, and I'm tired of writing my own case-insensitive function over and 
over again.


So I'd like to propose some additions to 3.7 or 3.8. If the feedback here 
is positive, I'll take it to Python-Ideas for the negative feedback :-)


(1) Add a new string method, which performs a case-insensitive equality 
test. Here is a potential implementation, written in pure Python:


def equal(self, other):
    if self is other:
        return True
    if not isinstance(other, str):
        raise TypeError
    if len(self) != len(other):
        return False
    casefold = str.casefold
    for a, b in zip(self, other):
        if casefold(a) != casefold(b):
            return False
    return True

Alternatively: how about a === triple-equals operator to do the same 
thing?
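
For what it's worth, here is a rough sketch of how the pure-Python version 
above behaves next to a comparison that folds each string as a whole. The 
names equal_per_char and equal_whole are made up for this illustration. 
The two agree on simple cases, but can differ when case folding changes 
the length of a string, as with the German 'ß'.

```python
def equal_per_char(s, t):
    # The implementation from (1), written as a free function
    # (hypothetical name, for illustration only).
    if s is t:
        return True
    if not isinstance(t, str):
        raise TypeError
    if len(s) != len(t):
        return False
    casefold = str.casefold
    for a, b in zip(s, t):
        if casefold(a) != casefold(b):
            return False
    return True

def equal_whole(s, t):
    # Fold both strings in full, then compare (hypothetical name).
    return s.casefold() == t.casefold()

print(equal_per_char('MailDir', 'MAILDIR'))   # True
print(equal_whole('MailDir', 'MAILDIR'))      # True
print(equal_per_char('Straße', 'STRASSE'))    # False: lengths differ (6 vs 7)
print(equal_whole('Straße', 'STRASSE'))       # True: 'ß' folds to 'ss'
```

Which of the two behaviours is wanted would be one of the things to settle 
before this went anywhere.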



(2) Add keyword-only arguments to str.find and str.index:

    casefold=False

    which does nothing if false (the default), and switches to a
    case-insensitive search if true.
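
To make the idea concrete, here is a minimal sketch of the intended 
behaviour as a free function. The name find and its body are assumptions 
for illustration only, not a proposed implementation; in particular it 
searches the folded text, so the index it returns matches the original 
string only when folding does not change the length of anything before 
the match.

```python
def find(s, sub, *, casefold=False):
    # Hypothetical stand-in for the proposed str.find(sub, casefold=...).
    if not casefold:
        return s.find(sub)
    # Assumption: fold both strings and search the folded text.
    return s.casefold().find(sub.casefold())

a = 'prefix foo suffix'
print(find(a, 'FOO'))                  # -1
print(find(a, 'FOO', casefold=True))   # 7
```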




Alternatives:

(i) Do nothing. The status quo wins a stalemate.

(ii) Instead of str.find or index, use a regular expression.

This is less discoverable (you need to know regular expressions) and 
harder to get right than to just call a string method. Also, I expect 
that invoking the re engine just for case insensitivity will be a lot 
more expensive than a simple search need be.
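
As a sketch of what the regular expression route looks like in practice 
(find_re is a made-up helper name), note that the search string has to be 
escaped first, which is one of the easy things to forget:

```python
import re

def find_re(s, sub):
    # Hypothetical helper: case-insensitive find via the re module.
    # re.escape makes characters such as '.' or '*' in sub match literally.
    m = re.search(re.escape(sub), s, re.IGNORECASE)
    return m.start() if m else -1

print(find_re('prefix foo suffix', 'FOO'))   # 7
print(find_re('a.b', 'A.B'))                 # 0
print(find_re('axb', 'A.B'))                 # -1, thanks to re.escape
```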

(iii) Not every two line function needs to be in the standard library. 
Just add this to the top of every module:

def equal(s, t):
    return s.casefold() == t.casefold()


That's just the status quo winning again. It's an annoyance. A small annoyance,
but multiplied by the sheer number of times it happens, it becomes a 
large annoyance. I believe the annoyance factor of case-insensitive 
comparisons outweighs the "two line function" objection.

And the two-line "equal" function doesn't solve the problem for find and
index, or for sets, dicts, list.index and the `in` operator.


Unsolved problems:

This proposal doesn't help with sets, dicts, list.index or the `in`
operator either.
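
For those cases the usual workaround today, as far as I know, is to fold 
everything once on the way in and then do ordinary lookups on the folded 
values. A minimal sketch, with made-up data and variable names:

```python
names = ['Alice', 'BOB', 'Carol']

# Fold once up front, then do ordinary lookups on the folded values.
folded_set = {name.casefold() for name in names}
print('bob'.casefold() in folded_set)         # True

# Dict keyed by folded name, keeping the original spelling as the value.
by_folded = {name.casefold(): name for name in names}
print(by_folded.get('ALICE'.casefold()))      # Alice

# list.index on a folded copy of the list.
folded_list = [name.casefold() for name in names]
print(folded_list.index('CAROL'.casefold()))  # 2
```

This works, but it is exactly the kind of normalise-now, compare-later 
pattern that led to the three bugs at the top.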



Thoughts?



-- 
Steven D'Aprano
“You are deluded if you think software engineers who can't write 
operating systems or applications without security holes, can write 
virtualization layers without security holes.” —Theo de Raadt
-- 
https://mail.python.org/mailman/listinfo/python-list
