Remco Gerlich wrote:
> Not sure if this is sufficient for what you need, but how about
>
> import re
> re.sub(u'[\s\xa0]+', ' ', s)
>
> That should replace all occurrences of 1 or more whitespace or \xa0
> characters with a single space.
>
It does indeed, and so does
re.sub(u'\s+', ' ', s)
beca
Stefan Behnel wrote:
> John Machin wrote:
>
>> On Jan 19, 11:00 pm, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>>
>>> John Machin wrote:
>>>
I'm happy enough with reassembling the second item. The problem is in
reliably and correctly collapsing the whitespace in each of the
John Machin wrote:
> On Jan 19, 11:00 pm, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>> John Machin wrote:
>>> I'm happy enough with reassembling the second item. The problem is in
>>> reliably and correctly collapsing the whitespace in each of the above
>>> five elements. The standard Python idiom
On Jan 19, 11:00 pm, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> John Machin wrote:
> > I'm happy enough with reassembling the second item. The problem is in
> > reliably and correctly collapsing the whitespace in each of the above
>
> > five elements. The standard Python idiom of u' '.join(text.sp
Not sure if this is sufficient for what you need, but how about
import re
re.sub(u'[\s\xa0]+', ' ', s)
That should replace all occurrences of 1 or more whitespace or \xa0
characters with a single space.
Remco
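
For anyone trying the suggestion above, here is a minimal sketch of it as a
small function. It assumes Python 3, where str patterns are Unicode-aware by
default; the function name collapse_whitespace and the sample string are made
up for illustration and are not from the original post.

```python
import re

def collapse_whitespace(s):
    # Hypothetical helper: replace each run of whitespace and/or
    # no-break spaces (\xa0) with a single ordinary space.
    return re.sub(r'[\s\xa0]+', ' ', s)

# Illustrative input mixing spaces, a no-break space and a newline.
sample = 'foonly  frabjous\xa0\n farnarklingliness'
print(repr(collapse_whitespace(sample)))
```

Note that this turns \xa0 into an ordinary space as well, so it only fits if
the no-break spaces do not need to survive.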
On Jan 19, 2008 12:38 PM, John Machin <[EMAIL PROTECTED]> wrote:
> I'm trying to recove
John Machin wrote:
> I'm happy enough with reassembling the second item. The problem is in
> reliably and correctly collapsing the whitespace in each of the above
> five elements. The standard Python idiom of u' '.join(text.split())
> won't work because the text is Unicode and u'\xa0' is whitesp
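
The behaviour of the split/join idiom on Unicode text can be checked with a
short sketch. This assumes Python 3 and assumes the aim is to keep u'\xa0'
intact while collapsing only ASCII whitespace; the name ASCII_WS and the
sample string are illustrative, not from the original post.

```python
import re

# Illustrative text: two spaces, then a no-break space inside the data.
text = 'foonly  frabjous\xa0farnarklingliness'

# split() with no arguments splits on every Unicode whitespace
# character, and '\xa0' counts as one, so the no-break space is lost.
joined = ' '.join(text.split())

# Assumption: only ASCII whitespace should be collapsed, so the
# character class lists those characters explicitly instead of using \s.
ASCII_WS = re.compile(r'[ \t\n\r\f\v]+')
kept = ASCII_WS.sub(' ', text)

print(repr(joined))
print(repr(kept))
```

Listing the ASCII whitespace characters explicitly avoids depending on what
\s matches, which differs between byte and Unicode patterns.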
I'm trying to recover the original data from some HTML written by a
well-known application.
Here are three original data items, in Python repr() format, with
spaces changed to tildes for clarity:
u'Saturday,~19~January~2008'
u'Line1\nLine2\nLine3'
u'foonly~frabjous\xa0farnarklingliness'
Here is