Re: palindrome iteration

Bearophile Tue, 14 Sep 2010 17:40:05 -0700

Baba:

> def i_palindrome(pal):
>  while len(pal)>1:
>   if pal[0] == pal[-1]:
>    pal=pal[1:-1]
>  return True
>
> print i_palindrome('annab')



In normal programming a significant percentage of the time is spent
debugging. Experience shows that even short functions may be buggy. If
you don't want to waste too much time debugging your code you need to
adopt a bit more rigorous approach to write programs. Learning to play
piano requires a lot of self-discipline, programming too needs some.


Ask yourself what are the exact purposes of your function/class/
program. And then define what are the correct outputs for all input
corner cases you are able to find. This is TDD, Thought-driven
development.

For this program, what's the right output for mixed case strings, for
empty strings, for input strings that contain many words with mixed
white space and punctuation between them, for unicode strings that
contain weird characters? Generally even simple functions may become
very complex if you want to manage even complex input cases.

You are free to ignore some of those inputs, to keep your code simple
enough, but it's usually better for your function to not just ignore
some inputs, but give some kind of error for the inputs you don't want
to consider.


Let's say you accept simple Unicode strings, you don't want to ignore
white space, an empty string is palindrome, text cases need to be
ignored.

I don't like Test-Driven development a lot because your most powerful
tool is your brain&mind and not your unit-tests, but I like tests a
lot. So you need tests for this small function too. Here doctests are
enough. Your first test is better to fail, to be sure your doctest is
working:


def is_palindrome(text):
    """
    >>> is_palindrome("xy")
    False
    """
    return True

if __name__ == "__main__":
    import doctest
    doctest.testmod()
    print "Doctests done."


Code formatting and function and argument names are important to keep
code readable to yourself, improve its future maintenance, and reduce
the total bug count.

You may write this Python function in a high level style like this:

def is_palidrome(text):
    ltext = text.lower()
    return ltext == ltext[::-1]


But that doesn't teach you much programming, so for learning purposes
you may try to create one function in lower-level style (that also
doesn't stress the garbage collector, despite probably being overall
slower in Python). So we use iterative loops and single char tests.
But if you use a language as Scheme then a recursive solution too is a
normal option.

Now you need to invent the algorithm that solves your problem.
Inventing algorithms that solve your problems is an art that requires
training, experience and even ideas from this little book:
http://en.wikipedia.org/wiki/How_to_Solve_It

There are few ways to solve that problem, one possible way it to look
at the first and last char of the string (ignoring their case), if
they are different the given word is not palindrome, otherwise you
look at the second and penultimate char, and you perform similar
comparisons for all the char pairs. If your string length is odd there
is no need to test the last char, but if you want to test it with
itself because it keeps the code simpler it's not a problem.

To avoid bugs like ones in your code you may think about the loop
invariant and loop variant. This blog post shows an example to follow:
http://reprog.wordpress.com/2010/04/25/writing-correct-code-part-1-invariants-binary-search-part-4a/

I use two indexes, i and j, that move forward and backwards in sync
starting from the first and last char of the 'text' string. The
program stops when they are on the same char or they cross.

I have used an elaborate loop invariant, for this simple program it's
overkill, but shows how you may do it.


      i ==>                <== j
     +--+--+--+--+--+--+--+--+--+
text |  |  |  |  |  |  |  |  |  |
     +--+--+--+--+--+--+--+--+--+
      0  1  2  3  4  5  6  7  8


def _is_palindrome_loop_invariant(i, j, length):
    assert i >= 0
    assert i < length
    assert j >= 0
    assert j < length
    assert i <= j
    assert i == length - 1 - j
    return True


def is_palindrome(text):
    """
    >>> is_palindrome("")
    True
    >>> is_palindrome("1")
    True
    >>> is_palindrome(u"x")
    True
    >>> is_palindrome("aa")
    True
    >>> is_palindrome("ab")
    False
    >>> is_palindrome("abc")
    False
    >>> [is_palindrome(s) for s in ["abA", "ABA", "ABA"]]
    [True, True, True]
    >>> is_palindrome("aibohphobia")
    True
    >>> is_palindrome("aibohphobia" * 1000)
    True
    >>> is_palindrome(list("aibohphobia"))
    True
    >>> is_palindrome([1, 2, 3])
    Traceback (most recent call last):
      ...
    AttributeError: 'int' object has no attribute 'lower'
    """
    n = len(text)
    i = 0
    j = n - 1
    while i < j:
        assert _is_palindrome_loop_invariant(i, j, n)
        if text[i].lower() != text[j].lower():
            return False
        else:
            i += 1
            j -= 1
    return True


if __name__ == "__main__":
    import doctest
    doctest.testmod()
    print "Doctests done."


is_palindrome([1, 2, 3]) fails not because the input is a list, but
because integers lack the lower() method. It's useful to add few
failing doctests too, to show what are the limits of the duck typing
of the function. As you see I have tried some corner cases, empty
string, single char, two chars, three chars, and more. I have also
added a 'stress test' with a larger input.

In Python this code is not efficient because of the interpreter
overhead and because I call a lower() method on each char (string of
length 1), so this is not good Python code. But it's easy to translate
to a lower level language, and I think Psyco does a good enough work
on it.

In C language (plus stdbool) the tolower() is more efficient, but
strlen() wastes some time. Often it's better to keep around the length
too of your strings, for example using a "fat pointer", as in D.


// C code
//#define NDEBUG
#include "assert.h"
#include "stdio.h"
#include "string.h"
#include "stdbool.h"
#include "ctype.h"
#include "stdlib.h"

bool is_palindrome_loop_invariant(int i, int j, int length) {
    assert(i >= 0);
    assert(i < length);
    assert(j >= 0);
    assert(j < length);
    assert(i <= j);
    assert(i == length - 1 - j);
    return true;
}


bool is_palindrome(const char *text) {
    const int n = strlen(text);
    int i = 0;
    int j = n - 1;
    while (i < j) {
        assert(is_palindrome_loop_invariant(i, j, n));
        if (tolower(text[i]) != tolower(text[j])) {
            return false;
        } else {
            i++;
            j--;
        }
    }
    return true;
}


void is_palindrome_test() {
    assert(is_palindrome(""));
    assert(is_palindrome("1"));
    assert(is_palindrome("aa"));
    assert(!is_palindrome("ab"));
    assert(!is_palindrome("abc"));
    assert(is_palindrome("aba"));
    assert(is_palindrome("abA"));
    assert(is_palindrome("AbA"));
    assert(is_palindrome("ABA"));
    assert(is_palindrome("aibohphobia"));
    assert(!is_palindrome("aaaaabaaaa"));

    const int n = 1000;
    const char *s = "aibohphobia";
    const int len_s = strlen(s);
    char *text = malloc(n * len_s + 1);
    if (text == NULL)
        exit(EXIT_FAILURE);
    int i;
    char *p = text;
    for (i = 0; i < n; i++) {
        memcpy(p, s, len_s);
        p += len_s;
    }
    text[n * len_s] = '\0';
    // puts(text);
    assert(is_palindrome(text));
}


int main() {
    is_palindrome_test();
    return EXIT_SUCCESS;
}


C experts (or good C lints) are probably able to find some loss of
precision or loss of sign errors in the following lines:

const int n = strlen(text);
const int len_s = strlen(s);
char *text = malloc(n * len_s + 1);
memcpy(p, s, len_s);

Writing perfect C code isn't an easy art. (I have used signed values
because writing correct C code with unsigned values is a pain, despite
avoids loss of precision or sign).


Compiling the C code with GCC 4.5.1, and disabled assertions (NDEBUG
defined):
-O3 -Wall -fomit-frame-pointer -S

The loop of is_palindrome() gets compiled to (x86, 32 bit):

L10:
        addl    $1, %esi
        subl    $1, %ebx
        cmpl    %ebx, %esi
        jge     L9
L4:
        movsbl  (%edi,%esi), %eax
        movl    %eax, (%esp)
        call    _tolower
        movl    %eax, %ebp
        movsbl  (%edi,%ebx), %eax
        movl    %eax, (%esp)
        call    _tolower
        cmpl    %eax, %ebp
        je      L10


Those two calls to _tolower inside the loop don't help performance,
but if you know your strings contain only upper case and lower case
chars, and you are able to assume few other things on your machine,
it's easy to replace them with non-portable inlined code.

The addl and subl are the i++ and j--, GCC has moved them at top as
commonly done optimization.

The first cmpl is the (i < j) test, and the jge is a jump that ends
the loop when it's the right time to do so.

The movsbl go read one char, and put them in eax. The movl are
necessary to set arguments for the call to the tolower function.

The last cmpl compares the results of the tolower, and je (jump equal)
jumps to the start of the loop if they are equal.

Bye,
bearophile
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: palindrome iteration

Reply via email to