David Pratt wrote:
[about ord(), chr() and stripping control characters]
> Many thanks Steve. This is good information. I think this should work
> fine. I was doing a string.replace in a cleanData() method with the
> following characters but don't know if that would have done it. This
> contains all the control characters that I really know about in normal
> use. ord(c) < 32 sounds like a much better way to go and comprehensive.
> So I guess instead of string.replace, I should do a ... for char
> in ... and check each character, correct? - or is there a
> better way of eliminating these other than reading a string in
> character by character.
>
> '\a','\b','\e','\f','\n','\r','\t','\v','|'
>
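The character-by-character check described in the quoted message can be sketched roughly as follows. This assumes the goal is to drop every character with ord(c) < 32 plus the '|' character; the helper name strip_control_chars and its extra parameter are made up for illustration and do not come from the original code.

```python
def strip_control_chars(s, extra="|"):
    """Return s without control characters (ord(c) < 32) or any character in extra.

    Hypothetical helper for illustration; the name and the 'extra'
    parameter are assumptions, not part of the original cleanData().
    """
    # Keep a character only if it is not a control character
    # and is not one of the extra characters to delete.
    return "".join(c for c in s if ord(c) >= 32 and c not in extra)


cleaned = strip_control_chars("a\tb|c\nd\x07e")
```

This tests each character once, so a chain of string.replace calls is not needed, although it still examines every character in Python code.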
There are a number of different things you might want to try. One is
translate() which, given a string and a translate table, will perform
the translation all in one go. For example:

 >>> delchars = "".join(chr(i) for i in range(32)) + "|"
 >>> print repr(delchars)
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f|'
 >>> nultxfrm = "".join(chr(i) for i in range(256))
 >>>

So delchars is a string containing the characters you want to remove,
and nultxfrm is a 256-character string where nultxfrm[n] == chr(n) -
this performs no translation at all.

So then

   s = s.translate(nultxfrm, delchars)

will remove all the "illegal" characters from s.

Note that I am sort-of cheating here, as this is only going to work for
8-bit characters. Once Unicode enters the picture all bets are off:
unicode.translate() takes a single mapping argument rather than a table
plus a string of characters to delete.

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/
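For the Unicode case, a minimal sketch of the same deletion, assuming Python 3 (the session above is Python 2), where str.translate() takes a single mapping and a code point mapped to None is deleted. The names DELETE_TABLE and clean_data are hypothetical and chosen only to echo the cleanData() method mentioned in the question.

```python
# Python 3 sketch (an assumption; the original session is Python 2).
# Map every code point to delete onto None: the 32 control
# characters below 0x20, plus the '|' character.
DELETE_TABLE = {i: None for i in range(32)}
DELETE_TABLE[ord("|")] = None


def clean_data(s):
    """Return s with control characters and '|' removed (hypothetical helper)."""
    # Code points absent from the mapping are left unchanged,
    # so non-ASCII text passes through untouched.
    return s.translate(DELETE_TABLE)


result = clean_data("caf\u00e9\t|ok\r\n")
```

Because the table is keyed by code point, no 256-entry identity string is needed here.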