Remco Gerlich wrote:
> Not sure if this is sufficient for what you need, but how about
>
> import re
> re.sub(u'[\s\xa0]+', ' ', s)
>
> That should replace all occurrences of 1 or more whitespace or \xa0
> characters by a single space.

It does indeed, and so does

    re.sub(u'\s+', ' ', s)

because u'\xa0' *IS* whitespace in the Python unicode world. But it's not whitespace in the HTML sense, and it must be preserved.
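To make the distinction concrete, here is a minimal sketch of one way to collapse whitespace while leaving \xa0 alone. It assumes Python 3, where str patterns are Unicode-aware by default. The helper name collapse_ws_keep_nbsp is made up for illustration and is not part of the re module. The character class is also only an approximation of HTML whitespace: it will still collapse other Unicode spaces such as u'\u2003'.

```python
import re

# "Any whitespace character except the no-break space":
# [^\S\xa0] reads as "not (non-whitespace or \xa0)".
# Assumption: Python 3 str pattern, so \s is Unicode-aware.
_WS_EXCEPT_NBSP = re.compile(r"[^\S\xa0]+")


def collapse_ws_keep_nbsp(s):
    """Replace each run of whitespace other than \\xa0 with one space.

    Hypothetical helper for illustration only.
    """
    return _WS_EXCEPT_NBSP.sub(" ", s)


# Plain \s+ swallows the no-break spaces as well:
naive = re.sub(r"\s+", " ", "a\xa0\xa0b")

# The restricted class leaves them in place:
kept = collapse_ws_keep_nbsp("a \t\n b\xa0\xa0c")
```

If strict HTML semantics matter, an explicit class such as [ \t\n\f\r]+ may be the safer choice, since it names exactly the characters to collapse.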
Cheers,
John