> > I have found that some people refuse to stick to standards, so whenever I
> > parse XML files I remove any characters that fall in the range
> > <= 0x1f
> >
> > >= 0xf0
>
> Now of what help shall that be? Get rid of all accented characters?
> Sorry, but that surely is the dumbest thing to do here - and has
> _nothing_ to do with standards!

Earlier versions of the Microsoft XML parser accept invalid characters (e.g. most of those below 0x20; XML 1.0 allows only tab, LF and CR in that range). Sadly, you do find files in the wild that need to have these stripped before feeding them to a conforming parser. One can be too enthusiastic about the process, though: dropping everything >= 0xf0 as well throws away perfectly legal characters.
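
For what it's worth, here is a minimal sketch of a less enthusiastic filter. It assumes the input has already been decoded to a Python 3 str, and it removes only the code points that fall outside the XML 1.0 Char production, so accented and other non-ASCII characters survive. The function name strip_invalid_xml_chars is made up for this example; it is not part of any library.

```python
import re
import xml.etree.ElementTree as ET

# Anything NOT matched by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with characters that XML 1.0 forbids removed.

    Hypothetical helper for illustration. Works on decoded text, not bytes,
    so it never touches the bytes of a multi-byte encoded character.
    """
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    dirty = "<a>caf\u00e9\x01 au lait</a>"  # contains a stray 0x01
    clean = strip_invalid_xml_chars(dirty)
    print(ET.fromstring(clean).text)
```

Decode the file with its real encoding first and filter afterwards. Filtering raw bytes by value is how the accented characters get lost, since in UTF-8 or Latin-1 they are encoded with bytes above 0x7f.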