
Here is a sample of what I'm doing:

Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on 
Type "help", "copyright", "credits" or "license" for more information.
>>> filterMap = {}
>>> for i in range(256):
...     filterMap[chr(i)] = chr(i)
...
>>> filterMap[chr(9)] = chr(136)
>>> filterMap[chr(10)] = chr(133)
>>> filterMap[chr(136)] = chr(9)
>>> filterMap[chr(133)] = chr(10)
>>> line = '''this      has
... tabs        and     line
... breaks'''
>>> filteredLine = ''.join([ filterMap[a] for a in line])
>>> import codecs
>>> f = codecs.open('foo.txt', 'wU', 'utf-8')
>>> print filteredLine
>>> f.write(filteredLine)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\codecs.py", line 501, in write
    return self.writer.write(data)
  File "C:\Python24\lib\codecs.py", line 178, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x88 in position 4: ordinal not in range(128)
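
For what it's worth, the traceback above is what Python 2 does when a byte string (str) is handed to a codecs writer: before it can encode to UTF-8 it first decodes the bytes with the default 'ascii' codec, and 0x88 is outside that range. Below is a minimal sketch of one way around it, assuming the input really is ASCII and that the placeholders are meant to be the code points U+0085 and U+0088. It keeps the text as unicode and does the swap with translate(). The SWAP table and the temporary path are made up for illustration and are not part of the session above.

```python
import codecs
import os
import tempfile

# Hypothetical table: swaps tab/newline with U+0088/U+0085 in both directions,
# mirroring the four non-identity entries of filterMap.
SWAP = {0x09: 0x88, 0x0A: 0x85, 0x88: 0x09, 0x85: 0x0A}

# A unicode literal rather than a byte string.
line = u'this\thas\ntabs\tand\tline\nbreaks'
filtered = line.translate(SWAP)

# Illustrative path; plain 'w' is used because 'U' is a read-mode option.
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
f = codecs.open(path, 'w', 'utf-8')
f.write(filtered)   # unicode in, UTF-8 bytes out, no implicit ascii decode
f.close()

# Later, right before the database import: read back and undo the swap.
f = codecs.open(path, 'r', 'utf-8')
restored = f.read().translate(SWAP)
f.close()
```

Data that is already a byte string can be turned into unicode first with something like data.decode('ascii'). One thing to watch during the processing step: unicode splitlines() treats U+0085 as a line boundary, so splitting on u'\n' explicitly may be the safer choice.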

"Mike Currie" <[EMAIL PROTECTED]> wrote in message 
>I did make a mistake, it should have been 'wU'.
> The starting data is ASCII.
> What I'm doing is data processing on files with new line and tab 
> characters inside quoted fields.  The idea is to convert all the new line 
> and tab characters to 0x85 and 0x88 respectively, then process the files. 
> Finally, right before importing them into a database, convert them back to 
> new lines and tabs, thus preserving the field values.
> Will Python not handle the control characters correctly?
> "Serge Orlov" <[EMAIL PROTECTED]> wrote in message 
>> On 6/27/06, Mike Currie <[EMAIL PROTECTED]> wrote:
>>> I'm trying to write out files that have utf-8 characters 0x85 and 0x08
>>> in them.  Every configuration I try, I get a UnicodeError: ascii codec
>>> can't decode byte 0x85 in position 255: ordinal not in range(128)
>>> I've tried using codecs.open('foo.txt', 'rU', 'utf-8',
>>> errors='strict') and that doesn't work, and I've also tried wrapping
>>> the file in a utf8_writer using codecs.lookup('utf8')
>>> Any clues?
>> Use unicode strings for non-ascii characters. The following program 
>> "works":
>> import codecs
>> c1 = unichr(0x85)
>> f = codecs.open('foo.txt', 'wU', 'utf-8')
>> f.write(c1)
>> f.close()
>> But unichr(0x85) is a control character; are you sure you want it?
>> What is the encoding of your data?

