On Jul 23, 11:53 am, Paul McGuire <pt...@austin.rr.com> wrote:
> On Jul 22, 5:43 pm, Filip <pink...@gmail.com> wrote:
>
> # Needs re.IGNORECASE, and can have tag attributes, such as <BR CLEAR="ALL">
> line_break_re = re.compile('<br\/?>', re.UNICODE)

Just in case somebody actually uses valid XHTML :-) it might be a good
idea to allow for <br /> as well.

> # what about HTML entities defined using hex syntax, such as &#xxxx;
> amp_re = re.compile('\&(?![a-z]+?\;)', re.UNICODE | re.IGNORECASE)

What about the decimal syntax ones? E.g. not only &nbsp; and &#xA0; but
also &#160;

Also, entity names can contain digits, e.g. &sup1; &frac34;

-- 
http://mail.python.org/mailman/listinfo/python-list
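For what it's worth, here is a rough sketch of how the points raised in this thread might be folded into the two patterns. These are hypothetical replacements, not Filip's code, and they assume Python 3, where str patterns are Unicode by default, so re.UNICODE is not needed. The line-break pattern accepts any case, optional attributes and an optional trailing slash. The ampersand pattern leaves alone anything that looks like a named, decimal or hex reference. It only checks the syntax: it does not verify that a name such as "sup1" is actually defined.

```python
import re

# Hypothetical pattern (not from the original post): matches <br>, <BR>,
# <br/>, <br /> and <BR CLEAR="ALL">. \b prevents matching e.g. <bra>.
line_break_re = re.compile(r'<br\b[^>]*>', re.IGNORECASE)

# Hypothetical pattern: a bare "&" that does NOT begin a reference of the form
#   named:   &nbsp;  &sup1;  &frac34;   (letter, then letters/digits)
#   decimal: &#160;
#   hex:     &#xA0;  or  &#XA0;
# Only the syntax is checked; unknown names like &foo1; are also left alone.
amp_re = re.compile(
    r'&(?!(?:[A-Za-z][A-Za-z0-9]*'  # named entity
    r'|#[0-9]+'                     # decimal character reference
    r'|#[xX][0-9A-Fa-f]+);)'        # hex character reference
)


def convert_line_breaks(text):
    """Replace every <br> variant with a newline."""
    return line_break_re.sub('\n', text)


def escape_bare_amps(text):
    """Escape only the ampersands that do not already start a reference."""
    return amp_re.sub('&amp;', text)


if __name__ == '__main__':
    print(repr(convert_line_breaks('a<br>b<BR CLEAR="ALL">c<br />d')))
    print(escape_bare_amps('AT&T &nbsp; &#160; &#xA0; &sup1; &frac34;'))
```

With the original amp_re, "&sup1;" and "&#160;" would both have had their "&" turned into "&amp;". With this version only the "&" in "AT&T" is escaped.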