Stefan Behnel wrote:
> John Machin wrote:
>> On Jan 19, 11:00 pm, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>>> John Machin wrote:
>>>> I'm happy enough with reassembling the second item. The problem is in
>>>> reliably and correctly collapsing the whitespace in each of the above
>>>> five elements. The standard Python idiom of u' '.join(text.split())
>>>> won't work because the text is Unicode and u'\xa0' is whitespace
>>>> and would be converted to a space.
>>>
>>> would this (or some variation of it) work?
>>>
>>> >>> re.sub("[ \n\r\t]+", " ", u"foo\n frab\xa0farn")
>>> u'foo frab\xa0farn'
>>>
>>> </F>
>>
>> Yes, partially. Leading and trailing whitespace has to be removed
>> entirely, not replaced by one space.
>
> Sounds like adding a .strip() to me ...

Sounds like adding a .strip(u' ') to me; otherwise any leading/trailing
u'\xa0' gets blown away, and this must not happen.
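
For what it's worth, here is a minimal sketch of the two steps put together: Fredrik's re.sub() followed by .strip(u' '). The helper name collapse_ascii_whitespace is made up for illustration, and the character class is assumed to be exactly the one from the re.sub() call quoted above, so u'\xa0' is never touched.

```python
import re

# Only these ASCII whitespace characters are collapsed. u'\xa0' (no-break
# space) is deliberately left out of the class, as in the quoted re.sub() call.
_ASCII_WS = re.compile(u"[ \n\r\t]+")


def collapse_ascii_whitespace(text):
    # Hypothetical helper name, not from the thread.
    # Step 1: replace each run of space/newline/CR/tab with a single space.
    collapsed = _ASCII_WS.sub(u" ", text)
    # Step 2: trim plain spaces only. A bare .strip() on a Unicode string
    # would also remove leading/trailing u'\xa0'.
    return collapsed.strip(u" ")


if __name__ == "__main__":
    print(repr(collapse_ascii_whitespace(u"\n foo\n frab\xa0farn \t")))
    print(repr(collapse_ascii_whitespace(u"\xa0foo\xa0")))
```

Since step 1 has already turned any leading or trailing run into a single plain space, .strip(u' ') is enough to remove it, and a u'\xa0' at either end survives.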