John Machin wrote:

> I'm happy enough with reassembling the second item. The problem is in
> reliably and correctly collapsing the whitespace in each of the above
> five elements. The standard Python idiom of u' '.join(text.split())
> won't work because the text is Unicode and u'\xa0' is whitespace
> and would be converted to a space.

would this (or some variation of it) work?

>>> re.sub("[ \n\r\t]+", " ", u"foo\n frab\xa0farn")
u'foo frab\xa0farn'

</F>
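
One possible variation on the re.sub idea above, as a rough sketch rather than a definitive answer: compile the pattern once, list the ASCII whitespace characters explicitly so that u'\xa0' is never matched, and trim the ends the way the join/split idiom does. The helper name collapse_ascii_whitespace and the choice to also include form feed and vertical tab are assumptions for illustration; they are not from the original posts.

```python
import re

# Explicit ASCII whitespace only. The non-breaking space u'\xa0' is
# deliberately left out of the class, so it is never replaced.
_ASCII_WS = re.compile(u"[ \t\n\r\f\v]+")


def collapse_ascii_whitespace(text):
    """Collapse runs of ASCII whitespace to one space (hypothetical helper).

    Leading and trailing ASCII spaces are trimmed, mimicking what
    u' '.join(text.split()) does, but u'\xa0' is preserved throughout.
    """
    collapsed = _ASCII_WS.sub(u" ", text)
    # Strip only the plain space; a bare strip() would also eat u'\xa0'.
    return collapsed.strip(u" ")


if __name__ == "__main__":
    sample = u"  foo\n frab\xa0farn\t\r\n"
    print(repr(collapse_ascii_whitespace(sample)))
```

Note the strip(u" ") with an explicit argument: calling strip() with no argument on a Unicode string removes u'\xa0' at the ends as well, which would reintroduce the original problem.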