John Machin wrote:

> I'm happy enough with reassembling the second item. The problem is in
> reliably and correctly collapsing the whitespace in each of the above
> five elements. The standard Python idiom of u' '.join(text.split())
> won't work because the text is Unicode and u'\xa0' is whitespace
> and would be converted to a space.

would this (or some variation of it) work?

>>> import re
>>> re.sub("[ \n\r\t]+", " ", u"foo\n  frab\xa0farn")
u'foo frab\xa0farn'
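
one possible variation, as a rough sketch only: wrap the substitution in a helper that also trims the ends, so it behaves more like the join/split idiom while leaving u'\xa0' alone. the name collapse_ascii_whitespace and the exact character set (space, tab, newline, carriage return, form feed, vertical tab) are my assumptions, not something from the original question; adjust the set to whatever you consider collapsible.

```python
import re

# Assumption: only these ASCII characters count as collapsible whitespace.
# u'\xa0' (NO-BREAK SPACE) is deliberately left out so it survives.
ASCII_WS = u" \t\n\r\f\v"
_ASCII_WS_RUN = re.compile(u"[" + ASCII_WS + u"]+")


def collapse_ascii_whitespace(text):
    """Replace each run of ASCII whitespace with one space and trim the ends.

    Hypothetical helper for illustration; non-ASCII whitespace such as
    u'\\xa0' is passed through unchanged.
    """
    collapsed = _ASCII_WS_RUN.sub(u" ", text)
    # Strip only the same ASCII set, so a leading or trailing u'\xa0' is kept.
    return collapsed.strip(ASCII_WS)


sample = u"foo\n  frab\xa0farn"
kept = collapse_ascii_whitespace(sample)   # no-break space preserved
lost = u" ".join(sample.split())           # no-break space turned into u' '
```

here kept still contains the u'\xa0' between "frab" and "farn", while lost has a plain space there, which is the behaviour the original poster wanted to avoid.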

</F>

-- 
http://mail.python.org/mailman/listinfo/python-list
