On Jul 30, 4:18 am, Carey Tilden wrote:
> In this case, you've been able to determine the
> correct encoding (latin-1) for those errant bytes, so the file itself
> is thus known to be in that encoding.
The most probable "correct" encoding is, as already stated and agreed to
by the OP, cp1252.
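The difference only bites in the 0x80-0x9F range, where latin-1 assigns C1
control characters and cp1252 assigns printable glyphs; the two agree from
0xA0 upward, so the OP's \xe1 and \xfc decode identically either way. A
minimal sketch of the distinction (Python 2; the byte string is made up):

    raw = '\x93quoted\x94'               # hypothetical cp1252 "smart quotes"
    print repr(raw.decode('latin1'))     # u'\x93quoted\x94' -- C1 control chars
    print repr(raw.decode('cp1252'))     # u'\u201cquoted\u201d' -- real quote marks

Both decodes succeed, so the wrong guess can't be caught as an exception;
you have to know where the data came from.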
In message , Joe
Goldthwaite wrote:
> Next I tried to write the unicodestring object to a file thusly;
>
> output.write(unicodestring)
>
> I would have expected the write function to request the byte string from
> the unicodestring object and simply write that byte string to a file.
Encoded ac
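The behaviour being discussed, for reference: a plain Python 2 file object
accepts byte strings only, and handing it a unicode object triggers an
implicit encode with the default (ascii) codec. A sketch (file name is
hypothetical):

    out = open('out.txt', 'wb')
    text = u'caf\xe9'
    # out.write(text)                  # UnicodeEncodeError: implicit ascii encode
    out.write(text.encode('utf-8'))    # pick the byte representation explicitly
    out.close()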
In message <4c51d3b6$0$1638$742ec...@news.sonic.net>, John Nagle wrote:
> UTF-8 is a stream format for Unicode. It's slightly compressed ...
“Variable-length” is not the same as “compressed”.
Particularly if you’re mainly using non-Roman scripts...
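The size claim is easy to check: UTF-8 spends one byte on each ASCII
character, two on most alphabetic scripts such as Greek, and three on CJK,
while UTF-16 spends two bytes on any BMP character. A sketch (Python 2;
sample strings made up):

    for s in (u'hello',                     # ASCII
              u'\u03b3\u03b5\u03b9\u03b1',  # Greek
              u'\u65e5\u672c\u8a9e'):       # CJK
        print len(s), len(s.encode('utf-8')), len(s.encode('utf-16-le'))
    # 5 5 10   ASCII: UTF-8 is half the size of UTF-16
    # 4 8 8    Greek: no difference
    # 3 9 6    CJK: UTF-8 is 50% larger -- hardly "compressed"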
In message , Joe
Goldthwaite wrote:
> Ascii.csv isn't really a latin-1 encoded file. It's an ascii file with a
> few characters above the 128 range that are causing Postgresql Unicode
> errors. Those characters work fine in the Windows world but they're not
> the correct byte representation for
On Thu, 29 Jul 2010 23:49:40, Steven D'Aprano wrote:
> It looks to me like Python uses a 16-bit implementation internally,
It typically uses the platform's wchar_t, which is 16-bit on Windows and
(typically) 32-bit on Unix.
IIRC, it's possible to build Python with 32-bit Unicode on Windows as well.
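A quick way to see which build you have (Python 2 sketch):

    import sys
    print sys.maxunicode        # 65535 on a narrow build, 1114111 on a wide one
    print len(u'\U00010000')    # 2 on narrow (a surrogate pair), 1 on wide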
"Joe Goldthwaite" wrote in message
news:5a04846ed83745a8a99a944793792...@newmbp...
Hi Steven,
I read through the article you referenced. I understand Unicode better now.
I wasn't completely ignorant of the subject. My confusion is more about
how Python is handling Unicode than Unicode itself.
On Thu, 29 Jul 2010 11:14:24 -0700, Ethan Furman wrote:
> Don't think of unicode as a byte stream. It's a bunch of numbers that
> map to a bunch of symbols.
Not only are Unicode strings a bunch of numbers ("code points", in
Unicode terminology), but the numbers are not necessarily all the same
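The excerpt is cut off, but the "bunch of numbers" view is easy to poke at
(Python 2 sketch):

    for ch in (u'A', u'\xe1', u'\u20ac'):
        print ord(ch)                        # 65, 225, 8364 -- code points, not bytes
    print len(u'\u20ac'.encode('utf-8'))     # 3 -- bytes exist only after an encode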
John Nagle wrote:
On 7/28/2010 3:58 PM, Joe Goldthwaite wrote:
This still seems odd to me. I would have thought that the unicode function
would return a properly encoded byte stream that could then simply be
written to disk. Instead it seems like you have to re-encode the byte
stream to some
On 7/28/2010 3:58 PM, Joe Goldthwaite wrote:
This still seems odd to me. I would have thought that the unicode function
would return a properly encoded byte stream that could then simply be
written to disk. Instead it seems like you have to re-encode the byte stream
to some kind of escaped Ascii
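This confusion is the crux of the thread: unicode() performs a decode
(bytes to code points) and returns an abstract string; a byte
representation exists only once you encode. A sketch:

    raw = '\xe1'                         # one byte, latin-1 for a-acute
    u = unicode(raw, 'latin1')           # decode: bytes -> code points
    print repr(u)                        # u'\xe1' -- a code point, no bytes yet
    print repr(u.encode('utf-8'))        # '\xc3\xa1'
    print repr(u.encode('utf-16-le'))    # '\xe1\x00' -- same text, different bytes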
Joe Goldthwaite wrote:
Hi Ulrich,
Ascii.csv isn't really a latin-1 encoded file. It's an ascii file with a
few characters above the 128 range . . .
It took me a while to get this point too (if you already have "gotten
it", I apologize, but the above comment leads me to believe you haven't).
On Thu, Jul 29, 2010 at 10:59 AM, Joe Goldthwaite wrote:
> Hi Ulrich,
>
> Ascii.csv isn't really a latin-1 encoded file. It's an ascii file with a
> few characters above the 128 range that are causing Postgresql Unicode
> errors. Those characters work fine in the Windows world but they're not th
Joe Goldthwaite wrote:
Hi Steven,
I read through the article you referenced. I understand Unicode better now.
I wasn't completely ignorant of the subject. My confusion is more about how
Python is handling Unicode than Unicode itself. I guess I'm fighting my own
misconceptions. I do that a lot
Hi Ulrich,
Ascii.csv isn't really a latin-1 encoded file. It's an ascii file with a
few characters above the 128 range that are causing Postgresql Unicode
errors. Those characters work fine in the Windows world but they're not the
correct byte representation for Unicode. What I'm attempting to d
Hi Steven,
I read through the article you referenced. I understand Unicode better now.
I wasn't completely ignorant of the subject. My confusion is more about how
Python is handling Unicode than Unicode itself. I guess I'm fighting my own
misconceptions. I do that a lot. It's hard for me to un
Joe Goldthwaite wrote:
> import unicodedata
>
> input = file('ascii.csv', 'rb')
> output = file('unicode.csv','wb')
>
> for line in input.xreadlines():
>     unicodestring = unicode(line, 'latin1')
>     output.write(unicodestring.encode('utf-8'))    # This second encode
>
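A variant of the quoted loop that pushes both conversions into the file
objects, so no explicit encode appears in the loop body (a sketch via
codecs.open, keeping the quoted file names and using cp1252 per the
thread's later consensus):

    import codecs

    input = codecs.open('ascii.csv', 'r', 'cp1252')     # decodes each line on read
    output = codecs.open('unicode.csv', 'w', 'utf-8')   # encodes on write

    for line in input:
        output.write(line)

    input.close()
    output.close()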
On Wed, 28 Jul 2010 15:58:01 -0700, Joe Goldthwaite wrote:
> This still seems odd to me. I would have thought that the unicode
> function would return a properly encoded byte stream that could then
> simply be written to disk. Instead it seems like you have to re-encode
> the byte stream to some
> Hello hello ... you are running on Windows; the likelihood that you
> actually have data encoded in latin1 is very very small. Follow MRAB's
> answer but replace "latin1" by "cp1252".
I think you're right. The database I'm working with is a US zip code
database. It gets updated monthly. The p
On Jul 29, 4:32 am, "Joe Goldthwaite" wrote:
> Hi,
>
> I've got an Ascii file with some latin characters. Specifically \xe1 and
> \xfc. I'm trying to import it into a Postgresql database that's running in
> Unicode mode. The Unicode converter chokes on those two characters.
>
> I could just manua
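For reference, the two bytes in question name ordinary accented letters,
identical in latin-1 and cp1252 (Python 2 sketch):

    import unicodedata
    print unicodedata.name(u'\xe1')    # LATIN SMALL LETTER A WITH ACUTE
    print unicodedata.name(u'\xfc')    # LATIN SMALL LETTER U WITH DIAERESIS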
On 07/28/2010 09:29 PM, John Nagle wrote:
> for rawline in input:
>     unicodeline = unicode(line, 'latin1')       # Latin-1 to Unicode (sic: 'line' vs the loop's 'rawline')
>     output.write(unicodeline.encode('utf-8'))   # Unicode out as UTF-8
you got your blocks wrong.
On 7/28/2010 11:32 AM, Joe Goldthwaite wrote:
Hi,
I've got an Ascii file with some latin characters. Specifically \xe1 and
\xfc. I'm trying to import it into a Postgresql database that's running in
Unicode mode. The Unicode converter chokes on those two characters.
I could just manually replac
On 07/28/2010 08:32 PM, Joe Goldthwaite wrote:
> Hi,
>
> I've got an Ascii file with some latin characters. Specifically \xe1 and
> \xfc. I'm trying to import it into a Postgresql database that's running in
> Unicode mode. The Unicode converter chokes on those two characters.
>
> I could just ma
Joe Goldthwaite wrote:
Hi,
I've got an Ascii file with some latin characters. Specifically \xe1 and
\xfc. I'm trying to import it into a Postgresql database that's running in
Unicode mode. The Unicode converter chokes on those two characters.
I could just manually replace those two characters
In <[EMAIL PROTECTED]>, fidtz wrote:
import codecs
testASCII = file("c:\\temp\\test1.txt", 'w')
testASCII.write("\n")
testASCII.close()
testASCII = file("c:\\temp\\test1.txt", 'r')
testASCII.read()
> '\n'
> Bit pattern on disk: 0x0D 0x0A
testASCII.seek(0)
te
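The two quoted output lines show Windows text-mode translation: a "\n"
written in 'w' mode lands on disk as CR LF. Reading the same file back in
binary mode makes it visible (sketch):

    print repr(open("c:\\temp\\test1.txt", 'rb').read())    # '\r\n'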
On 3 May, 13:39, "Jerry Hill" <[EMAIL PROTECTED]> wrote:
> On 2 May 2007 09:19:25 -0700, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>
> > The code:
>
> > import codecs
>
> > udlASCII = file("c:\\temp\\CSVDB.udl",'r')
> > udlUNI = codecs.open("c:\\temp\\CSVDB2.udl",'w',"utf_16")
> > udlUNI.write(u
On 3 May, 13:00, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> On 3 May 2007 04:30:37 -0700, [EMAIL PROTECTED] wrote:
>
>
>
> >On 2 May, 17:29, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> >> On 2 May 2007 09:19:25 -0700, [EMAIL PROTECTED] wrote:
>
> >> >The code:
>
> >> >import codecs
>
> >
On 2 May 2007 09:19:25 -0700, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> The code:
>
> import codecs
>
> udlASCII = file("c:\\temp\\CSVDB.udl",'r')
> udlUNI = codecs.open("c:\\temp\\CSVDB2.udl",'w',"utf_16")
> udlUNI.write(udlASCII.read())
> udlUNI.close()
> udlASCII.close()
>
> This doesn't se
On 3 May 2007 04:30:37 -0700, [EMAIL PROTECTED] wrote:
>On 2 May, 17:29, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
>> On 2 May 2007 09:19:25 -0700, [EMAIL PROTECTED] wrote:
>>
>>
>>
>> >The code:
>>
>> >import codecs
>>
>> >udlASCII = file("c:\\temp\\CSVDB.udl",'r')
>> >udlUNI = codecs.open("c
On 2 May, 17:29, Jean-Paul Calderone <[EMAIL PROTECTED]> wrote:
> On 2 May 2007 09:19:25 -0700, [EMAIL PROTECTED] wrote:
>
>
>
> >The code:
>
> >import codecs
>
> >udlASCII = file("c:\\temp\\CSVDB.udl",'r')
> >udlUNI = codecs.open("c:\\temp\\CSVDB2.udl",'w',"utf_16")
>
> >udlUNI.write(udlASCII.read
On 2 May 2007 09:19:25 -0700, [EMAIL PROTECTED] wrote:
>The code:
>
>import codecs
>
>udlASCII = file("c:\\temp\\CSVDB.udl",'r')
>udlUNI = codecs.open("c:\\temp\\CSVDB2.udl",'w',"utf_16")
>
>udlUNI.write(udlASCII.read())
>
>udlUNI.close()
>udlASCII.close()
>
>This doesn't seem to generate the corre
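One likely culprit in the quoted code, for anyone skimming: the source is
read in text mode as a byte string, and the codecs utf_16 writer then
decodes that str implicitly as ascii before encoding. A sketch that makes
both steps explicit (same paths as quoted; assumes the source really is
pure ASCII):

    import codecs

    src = open("c:\\temp\\CSVDB.udl", 'rb')       # binary mode: no newline translation
    dst = codecs.open("c:\\temp\\CSVDB2.udl", 'wb', 'utf_16')

    data = src.read().decode('ascii')             # the decode write() would do implicitly
    dst.write(data)                               # the utf_16 writer emits a BOM first

    dst.close()
    src.close()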