On 7/28/2010 11:32 AM, Joe Goldthwaite wrote:
Hi,

I've got an Ascii file with some latin characters. Specifically \xe1 and
\xfc.  I'm trying to import it into a Postgresql database that's running in
Unicode mode. The Unicode converter chokes on those two characters.

I could just manually replace those two characters with something valid, but
if any other invalid characters show up in later versions of the file, I'd
like to handle them correctly.


I've been playing with the Unicode stuff and I found out that I can
convert both those characters correctly using the latin1 codec like this:


        import unicodedata

        s = '\xe1\xfc'
        print unicode(s,'latin1')


The above works.  When I try to convert my file, however, I still get an
error:

        import unicodedata

        input = file('ascii.csv', 'r')
        output = file('unicode.csv','w')

        for line in input.xreadlines():
                output.write(unicode(line,'latin1'))

        input.close()
        output.close()

Try this, which will get you a UTF-8 file, the usual encoding for
Unicode text in a file.

    for rawline in input:
        unicodeline = unicode(rawline, 'latin1')      # Latin-1 bytes to Unicode
        output.write(unicodeline.encode('utf-8'))     # Unicode to UTF-8 bytes
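A sketch of the same conversion using codecs.open from the standard library,
which decodes on read and encodes on write so the loop body stays plain; the
filenames and sample bytes here are made up for illustration:

```python
import codecs

# Create a tiny Latin-1 sample input file (hypothetical data: the two
# bytes from the original post, a-acute and u-umlaut, plus a newline).
with open('sample_latin1.csv', 'wb') as f:
    f.write(b'\xe1\xfc\n')

# codecs.open handles the decoding and encoding transparently, so no
# explicit unicode()/encode() calls are needed inside the loop.
with codecs.open('sample_latin1.csv', 'r', encoding='latin1') as inp:
    with codecs.open('sample_utf8.csv', 'w', encoding='utf-8') as out:
        for line in inp:
            out.write(line)
```

The output file then contains the UTF-8 byte sequences \xc3\xa1 and \xc3\xbc
for those two characters.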


                                John Nagle
        
--
http://mail.python.org/mailman/listinfo/python-list