Richard Schulman wrote:
> It turns out that the Unicode input files I was working with (from MS
> Word and MS Notepad) were indeed creating eol sequences of \r\n, not
> \n\n as I had originally thought. The file reading statement that I
> was using, with unpredictable results, was
>
> #in_file =
>
Many thanks for your help, John, in giving me the tools to work
successfully in Python with Unicode from here on out.
It turns out that the Unicode input files I was working with (from MS
Word and MS Notepad) were indeed creating eol sequences of \r\n, not
\n\n as I had originally thought. The fil
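A minimal sketch of the \r\n point, using a hypothetical two-row sample in place of the actual file. Decoding UTF-16 does not translate line endings on its own, so splitting on "\n" leaves a trailing "\r" on every row. str.splitlines() treats "\r\n" as a single boundary. The Python documentation notes that codecs.open always opens the underlying file in binary mode, so lines read through it keep their "\r\n" as well.

```python
# Hypothetical sample: two rows of mixed Chinese/English text, encoded the
# way Notepad saves "Unicode" files (UTF-16 with a BOM, \r\n line endings).
raw = "中文 first row\r\nEnglish 第二行\r\n".encode("utf-16")
text = raw.decode("utf-16")

# Decoding alone does not translate newlines: each row still ends in "\r".
naive = text.split("\n")
print(naive)  # ['中文 first row\r', 'English 第二行\r', '']

# splitlines() treats "\r\n" as one line boundary.
rows = text.splitlines()
print(rows)  # ['中文 first row', 'English 第二行']
```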
Richard Schulman wrote:
[big snip]
>
> The BOM is little-endian, I believe.
Correct.
> >in_file = codecs.open(filepath, mode, encoding="utf16???")
>
> Right you are. Here is the output produced by so doing:
You don't say which encoding you used, but I guess that you used
utf_16_le.
>
>
> u'
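A sketch of the BOM difference John points to, using hypothetical file contents: a little-endian BOM followed by "hi\r\n". The generic utf-16 codec reads the BOM to choose the byte order and then drops it. utf_16_le assumes the byte order instead, so the BOM stays in the decoded text as the character U+FEFF.

```python
import codecs

# Hypothetical file contents: little-endian BOM (FF FE) + "hi\r\n" in UTF-16-LE
raw = codecs.BOM_UTF16_LE + "hi\r\n".encode("utf-16-le")

generic = raw.decode("utf-16")      # BOM selects the byte order and is removed
explicit = raw.decode("utf_16_le")  # byte order is fixed; BOM kept as U+FEFF

print(repr(generic))   # 'hi\r\n'
print(repr(explicit))  # '\ufeffhi\r\n'
```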
On Wed, 06 Sep 2006 03:55:18 GMT, Richard Schulman
<[EMAIL PROTECTED]> wrote:
>...I'm now using the codec with
>improved results, but am still puzzled as to how to handle the row
>termination of \n\n, which is being interpreted as two rows instead of
>one.
Of course, I could do a double read on e
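One plausible explanation of the "\n\n" symptom, assuming the file was opened without a codec, as in the open(..., "rU") call quoted later in this thread. In UTF-16-LE, a CR and an LF are each followed by a zero byte. Byte-level newline translation therefore never sees an adjacent "\r\n" pair, and it turns each lone "\r" into its own newline. The simulation below only approximates that translation step.

```python
# "\r\n" in UTF-16-LE: each character takes two bytes, so CR and LF
# are separated by a zero byte.
eol = "\r\n".encode("utf-16-le")
print(eol)  # b'\r\x00\n\x00'

# Simulate byte-level universal newlines on the undecoded file:
# adjacent b"\r\n" pairs become b"\n" (none exist here), then lone b"\r" do too.
raw = "row one\r\nrow two\r\n".encode("utf-16-le")
translated = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

print(raw.count(b"\n"), translated.count(b"\n"))  # 2 4
```

Each row then ends in two newline bytes, so a line reader sees two rows where there is one. Decoding before any newline handling avoids the problem.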
On 5 Sep 2006 19:50:27 -0700, "John Roth" <[EMAIL PROTECTED]>
wrote:
>> [T]he file I actually want to process is Unicode (utf-16 encoding).
>>...
>> in_file = open("c:\\pythonapps\\in-graf1.my","rU")
>>...
John Roth:
>You're not detecting the file encoding and then
>using it in the open statement
Thanks for your excellent debugging suggestions, John. See below for
my follow-up:
Richard Schulman:
>> The following program fragment works correctly with an ascii input
>> file.
>>
>> But the file I actually want to process is Unicode (utf-16 encoding).
>> The file must be Unicode rather than AS
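John's suggestion (detect the encoding, then use it when opening the file) can be sketched as follows. This assumes Python 3, where the built-in open() takes an encoding argument; the thread itself uses Python 2's codecs.open. sniff_bom and read_rows are hypothetical helpers. Falling back to UTF-8 when no BOM is present is also an assumption, chosen because it covers a plain ASCII file too.

```python
import codecs
import os
import tempfile

# Longer BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head: bytes) -> str:
    """Hypothetical helper: map a leading byte-order mark to a codec name."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return "utf-8"  # assumption: no BOM means UTF-8 (plain ASCII included)


def read_rows(path: str) -> list:
    """Hypothetical helper: detect the encoding, then read decoded rows."""
    with open(path, "rb") as f:
        encoding = sniff_bom(f.read(4))
    # These codecs consume the BOM; text mode with universal newlines
    # turns each "\r\n" into "\n" after decoding.
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Usage: write a Notepad-style UTF-16 file to a temporary path and read it back.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("中文 and English\r\n".encode("utf-16"))
rows = read_rows(tmp.name)
os.remove(tmp.name)
print(rows)  # ['中文 and English']
```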
Richard Schulman wrote:
> The following program fragment works correctly with an ascii input
> file.
>
> But the file I actually want to process is Unicode (utf-16 encoding).
> The file must be Unicode rather than ASCII or Latin-1 because it
> contains mixed Chinese and English characters.
>
> Whe
Richard Schulman wrote:
[snip]
> in_line = in_file.readline()
[snip]
We'd already deduced that that line was incorrectly published.
Please don't start new threads like this; if you want to make a
correction, do a couple-of-lines reply to your original message.
Now please leave this new thread
The appended program fragment works correctly with an ascii input
file. But the file I actually want to process is Unicode (utf-16
encoding). This file must be Unicode rather than ASCII or Latin-1
because it contains mixed Chinese and English characters.
When I run the program I get an attribute_c
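The claim that ASCII or Latin-1 cannot hold this text is easy to check with a hypothetical mixed sample. Both codecs raise UnicodeEncodeError on the Chinese characters, while UTF-16 round-trips the string unchanged.

```python
text = "Python 中文"  # hypothetical mixed English/Chinese sample

failed = []
for codec in ("ascii", "latin-1"):
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        failed.append(codec)  # Chinese characters lie outside both repertoires

print(failed)  # ['ascii', 'latin-1']

# UTF-16 can represent every character, so the string round-trips unchanged.
assert text.encode("utf-16").decode("utf-16") == text
```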
Richard Schulman wrote:
> The following program fragment works correctly with an ascii input
> file.
>
> But the file I actually want to process is Unicode (utf-16 encoding).
> The file must be Unicode rather than ASCII or Latin-1 because it
> contains mixed Chinese and English characters.
>
> When
The following program fragment works correctly with an ascii input
file.
But the file I actually want to process is Unicode (utf-16 encoding).
The file must be Unicode rather than ASCII or Latin-1 because it
contains mixed Chinese and English characters.
When I run the program below I get an attr