Okay, here is a sample of what I'm doing:

Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> filterMap = {}
>>> for i in range(0,255):
...     filterMap[chr(i)] = chr(i)
...
>>> filterMap[chr(9)] = chr(136)
>>> filterMap[chr(10)] = chr(133)
>>> filterMap[chr(136)] = chr(9)
>>> filterMap[chr(133)] = chr(10)
>>> line = '''this has
... tabs and line
... breaks'''
>>> filteredLine = ''.join([ filterMap[a] for a in line])
>>> import codecs
>>> f = codecs.open('foo.txt', 'wU', 'utf-8')
>>> print filteredLine
thisêhasêàtabsêandêlineàbreaks
>>> f.write(filteredLine)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\lib\codecs.py", line 501, in write
    return self.writer.write(data)
  File "C:\Python24\lib\codecs.py", line 178, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x88 in position 4: ordinal not in range(128)
>>>

"Mike Currie" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> I did make a mistake, it should have been 'wU'.
>
> The starting data is ASCII.
>
> What I'm doing is data processing on files with new line and tab
> characters inside quoted fields.  The idea is to convert all the new line
> and tab characters to 0x85 and 0x88 respectively, then process the files.
> Finally, right before importing them into a database, convert them back to
> new lines and tabs, thus preserving the field values.
>
> Will python not handle the control characters correctly?
>
>
> "Serge Orlov" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>> On 6/27/06, Mike Currie <[EMAIL PROTECTED]> wrote:
>>> I'm trying to write out files that have utf-8 characters 0x85 and 0x08
>>> in them.  Every configuration I try I get a UnicodeError: ascii codec
>>> can't decode byte 0x85 in position 255: ordinal not in range(128)
>>>
>>> I've tried using the codecs.open('foo.txt', 'rU', 'utf-8',
>>> errors='strict')
>>> and that doesn't work and I've also tried wrapping the file in a
>>> utf8_writer
>>> using codecs.lookup('utf8')
>>>
>>> Any clues?
>>
>> Use unicode strings for non-ascii characters. The following program
>> "works":
>>
>> import codecs
>>
>> c1 = unichr(0x85)
>> f = codecs.open('foo.txt', 'wU', 'utf-8')
>> f.write(c1)
>> f.close()
>>
>> But unichr(0x85) is a control character, are you sure you want it?
>> What is the encoding of your data?
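
For comparison, here is a minimal sketch of the same tab/newline swap done on unicode strings, along the lines Serge suggests. The traceback in the session looks like the usual Python 2 behaviour when a byte string is handed to the utf-8 writer: the string is first decoded implicitly as ASCII, which fails at byte 0x88. Starting from a unicode string avoids that step. The names swap, path, filtered and restored are made up for illustration, the file goes into a temporary directory rather than the current one, and the mode is plain 'w' because 'U' (universal newlines) is a read-mode flag. This is an untested-on-2.4 sketch, not a drop-in replacement for the processing script.

```python
import codecs
import os
import tempfile

# Swap tab/newline with the placeholder code points, in both directions.
# For unicode strings, translate() takes a dict of ordinal -> ordinal.
swap = {0x09: 0x88, 0x0A: 0x85, 0x88: 0x09, 0x85: 0x0A}

# Hypothetical output location; the thread uses 'foo.txt' in the current directory.
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')

# A unicode string, not a byte string.
line = u'this\thas\ttabs\nand line\nbreaks'
filtered = line.translate(swap)

# The writer receives unicode, so no implicit ASCII decode takes place.
f = codecs.open(path, 'w', 'utf-8')
f.write(filtered)
f.close()

# Read the text back and apply the same mapping to restore tabs and newlines.
f = codecs.open(path, 'r', 'utf-8')
restored = f.read().translate(swap)
f.close()
```

Because the mapping is its own inverse, the same dict serves both directions. One thing to watch: U+0085 is itself treated as a line boundary by the unicode splitlines() method, so line-based processing of the intermediate text may still split on it.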