You can check len(sanesplit) to see how big your list is. If it is less than 2, the line contained no <td> tags, so move on to the next line.
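Here is a rough sketch of that check. I am assuming sanesplit comes from splitting each line on the literal string "<td>"; the function name cells_from_lines and the sample lines are made up for illustration, so adjust them to match your actual code.

```python
def cells_from_lines(lines):
    """Collect the text that follows each <td> tag, skipping lines without one."""
    cells = []
    for line in lines:
        # Assumption: sanesplit is built by splitting on the literal "<td>".
        sanesplit = line.split("<td>")
        if len(sanesplit) < 2:
            # No <td> on this line, so move on to the next one.
            continue
        # Element 0 is whatever came before the first <td>; the rest are cells.
        cells.extend(sanesplit[1:])
    return cells


# Hypothetical sample input, only to exercise the function.
sample = ["<tr>", "<td>alpha</td><td>beta</td>", "</tr>"]
result = cells_from_lines(sample)
print(result)
```

A string with no separator in it splits into a one-element list, which is why a length below 2 means there was no <td> on the line.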
It is probably possible to do the whole thing with a regular expression. It is probably not wise to do so. Regular expressions are difficult to read and, as you discovered, difficult to write and debug. In many cases, Python code that relies on regular expressions for much of its program logic also runs slower than code that uses plain string operations.

Suppose "words" contains all the words in English. Compare these two lines:

    foobarwords1 = [x for x in words if re.search("foo|bar", x)]
    foobarwords2 = [x for x in words if "foo" in x or "bar" in x]

I haven't tested this with 2.4, but as of a few years ago it was a safe bet that foobarwords2 would be calculated much, much faster. Also, I think you will agree, foobarwords2 is a lot easier to read.
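If you want to measure the difference yourself rather than take my word for it, something like the following should do. The short word list is only a stand-in for a real dictionary, and the function names and repeat count are arbitrary choices of mine; the timings will vary with your Python version and machine.

```python
import re
import timeit

# Stand-in for "all the words in English"; substitute a real word list.
words = ["food", "rebar", "python", "list", "foobar", "spam"] * 1000


def with_regex():
    # Same test as foobarwords1: a regular expression alternation.
    return [x for x in words if re.search("foo|bar", x)]


def with_in():
    # Same test as foobarwords2: plain substring checks.
    return [x for x in words if "foo" in x or "bar" in x]


# Both approaches must select exactly the same words.
assert with_regex() == with_in()

regex_seconds = timeit.timeit(with_regex, number=20)
in_seconds = timeit.timeit(with_in, number=20)
print("re.search: %.4f s" % regex_seconds)
print("in:        %.4f s" % in_seconds)
```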