> I have found that some people refuse to stick to standards, so whenever I
> parse XML files I remove any characters that fall in the range
> <= 0x1f
> >= 0xf0

Now what help is that supposed to be? Get rid of all accented characters? Sorry, but that surely is the dumbest thing to do here - and it has _nothing_ to do with standards! Character sets with code points >= 128 are pretty common and well standardized, they are just not "ascii".

I suggest you read up on the topic of unicode & encodings a bit - and then fix some code of yours...

Diez
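
P.S. To make the point concrete, here is a minimal sketch (Python 3) of what the byte-range stripping does to Latin-1 data, next to what I would consider the saner route: decode with the document's declared encoding first, then drop only the code points that the XML 1.0 Char production forbids. The sample string and the helper names strip_range, is_xml_char and clean_text are made up for illustration, and the sketch assumes you already know the encoding.

```python
# Hypothetical sample: "señor" encoded as Latin-1, where ñ is the byte 0xf1.
raw = "se\u00f1or".encode("latin-1")


def strip_range(data: bytes) -> bytes:
    """The approach from the quoted post: drop bytes <= 0x1f or >= 0xf0."""
    return bytes(b for b in data if 0x1F < b < 0xF0)


def is_xml_char(ch: str) -> bool:
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def clean_text(data: bytes, encoding: str) -> str:
    """Decode first, then remove only characters XML 1.0 does not allow."""
    text = data.decode(encoding)
    return "".join(ch for ch in text if is_xml_char(ch))


print(strip_range(raw))            # b'seor'  (the ñ is gone)
print(clean_text(raw, "latin-1"))  # señor
```

Note that the byte-range filter also throws away tabs and newlines, which are perfectly legal in XML, and in UTF-8 the bytes 0xf0 to 0xf4 start four-byte sequences, so stripping them corrupts those characters as well.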