On Thu, Jul 29, 2010 at 10:59 AM, Joe Goldthwaite <j...@goldthwaites.com> wrote:
> Hi Ulrich,
>
> Ascii.csv isn't really a latin-1 encoded file. It's an ascii file with a
> few characters above the 128 range that are causing Postgresql Unicode
> errors. Those characters work fine in the Windows world but they're not the
> correct byte representation for Unicode. What I'm attempting to do is
> translate those upper range characters into the correct Unicode
> representations so that they look the same in the Postgresql database as
> they did in the CSV file.

Having bytes outside of the ASCII range means, by definition, that the file
is not ASCII encoded. ASCII only defines bytes 0-127. Bytes outside of that
range mean either that the file is corrupt or that it's in a different
encoding. In this case, you've been able to determine the correct encoding
(latin-1) for those errant bytes, so the file is known to be in that
encoding.

Carey
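
To make the point above concrete, here is a minimal sketch in Python 3. The sample bytes and the helper name latin1_to_utf8 are made up for illustration and are not taken from the actual Ascii.csv. It shows that bytes above 127 are rejected by the ASCII codec, decode cleanly as latin-1, and can then be re-encoded as UTF-8.

```python
# Hypothetical sample line: mostly ASCII, plus two bytes above 127
# (0xE9 and 0xEF), standing in for the kind of data described above.
raw = b"caf\xe9,na\xefve\n"


def latin1_to_utf8(data: bytes) -> bytes:
    """Decode bytes as latin-1 and re-encode the text as UTF-8.

    Assumes the input really is latin-1; every byte value is valid
    latin-1, so a wrong guess yields wrong characters, not an error.
    """
    return data.decode("latin-1").encode("utf-8")


# ASCII only covers bytes 0-127, so decoding as ASCII fails.
try:
    raw.decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False

text = raw.decode("latin-1")      # bytes -> Unicode text
utf8_bytes = latin1_to_utf8(raw)  # Unicode text -> UTF-8 bytes

print(is_ascii)
print(text, end="")
print(utf8_bytes)
```

For a whole file, the same idea would be to read with open(path, encoding="latin-1") and write with encoding="utf-8". If the data came from a Windows program, it may be worth checking whether cp1252 is the better fit, since the two encodings differ for bytes 0x80-0x9F.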