Neil Hodgson wrote:
Fuzzyman:
> Thanks - so I need to decode to unicode and *then* split on line
> endings. Problem is, that means I can't use Python to handle line
> endings where I don't know the encoding in advance.
>
> In another thread I've posted a small function that *guesses* line
> endings in use.
You
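
A minimal sketch of the decode-then-split order discussed above, written in Python 3 syntax (the thread predates it). The helper name `decode_then_split` is made up for illustration, and it assumes the encoding is already known:

```python
def decode_then_split(raw, encoding):
    """Decode bytes to text first, then split on line endings."""
    text = raw.decode(encoding)
    # str.splitlines() treats \r\n, \r and \n (plus a few other
    # Unicode line boundaries) as line endings.
    return text.splitlines()


# Sample data: UTF-16 little-endian, no BOM, CRLF line endings.
raw = "Fuzzy\r\nEnd of lines\r\n".encode("utf-16-le")

# Splitting the raw bytes on b"\n" cuts a two-byte code unit in half,
# so the pieces after the first are misaligned.
pieces = raw.split(b"\n")

# Decoding first gives clean lines.
lines = decode_then_split(raw, "utf-16-le")
```

The second element of `pieces` has an odd number of bytes, so it can no longer be decoded as UTF-16 on its own. That is the reason the split has to happen after decoding.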
Neil Hodgson wrote:
Fuzzyman:
> How should I handle line-endings for UTF16 ? Is it possible that other
> programs (on windows) will have line endings as u'\r\n' ?
Yes, try Notepad and save as Unicode. For the text

Fuzzy
End of lines

>>> contents = open("C:\\fuzzy.txt", "rb").read()
>>> contents
'\xff\xfeF\x
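
The session above is cut off, so here is a self-contained stand-in in Python 3 syntax. It builds the bytes in memory instead of reading `C:\fuzzy.txt`, and it assumes that Notepad's "Unicode" option means UTF-16 little-endian with a BOM and CRLF line endings, and that the file ends with a newline:

```python
import codecs

# Assumed file text; the trailing line ending is a guess.
text = "Fuzzy\r\nEnd of lines\r\n"

# BOM (b'\xff\xfe') followed by the UTF-16 LE payload.
contents = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# The generic "utf-16" codec reads the BOM to pick the byte order
# and leaves it out of the result.
decoded = contents.decode("utf-16")

has_crlf = "\r\n" in decoded
```

So, under those assumptions, the decoded text does contain `'\r\n'` line endings.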
Hello all,

I'm handling some text files where I don't (necessarily) know the
encoding beforehand. Because I use regular expressions to parse the
text I *must* decode UTF16 encoded text (otherwise the regexes split on
byte boundaries).

I can recognise UTF8 and BOM and remove (but not necessarily d
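
One possible way to do the BOM check, sketched in Python 3 syntax. The function name `decode_with_bom` and its return shape are invented for this example and are not the function mentioned elsewhere in the thread. It only knows the UTF-8 and UTF-16 BOMs:

```python
import codecs

# Known BOMs and the codec to use for the bytes that follow them.
# Caveat: the UTF-32 LE BOM starts with the same two bytes as the
# UTF-16 LE BOM, so UTF-32 input is not handled by this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom(raw):
    """Return (text, encoding) if a BOM is found, else (raw, None)."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            # Strip the BOM, then decode the rest with the matching codec.
            return raw[len(bom):].decode(encoding), encoding
    # No BOM: the encoding is unknown, so leave the bytes untouched.
    return raw, None


sample = codecs.BOM_UTF16_LE + "Fuzzy\r\nEnd of lines".encode("utf-16-le")
text, encoding = decode_with_bom(sample)
```

When a BOM is present the result is real text, so a regular expression works on characters and not on bytes. Without a BOM the caller still has to guess or be told the encoding.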