Thanks, John. That was all very helpful. It looks like one
option for me would be to wrap the text containing all the weird
characters in a <![CDATA[ ... ]]> section. Otherwise, running it through one of the SAX utilities
before parsing might work.
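
To make the two options concrete, here is a rough sketch. It assumes Python and that the "SAX utilities" are the standard `xml.sax.saxutils` module; `wrap_cdata`, the `<note>` element and the sample string are made up for illustration. It only handles markup characters (`<`, `&`, `>`); characters that are illegal in XML altogether, such as most control characters, would need separate treatment.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

def wrap_cdata(text):
    # Hypothetical helper: a literal "]]>" would end the CDATA section early,
    # so split it across two adjacent sections.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Made-up sample text containing markup characters and a "]]>" sequence.
raw = 'if a < b && c > "d" then ]]> done'

# Option 1: wrap the text in a CDATA section.
doc_cdata = "<note>" + wrap_cdata(raw) + "</note>"

# Option 2: escape &, < and > with the SAX utility before building the document.
doc_escaped = "<note>" + escape(raw) + "</note>"

# Both documents parse back to the original text.
assert ET.fromstring(doc_cdata).text == raw
assert ET.fromstring(doc_escaped).text == raw
```

Either way the parser hands back the same text, so the choice is mostly about which is easier to apply to the files as they exist now.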
I wonder whether the SAX utilities would cost me much in performance. I have 6000 XML files to parse at 100 KB each, roughly 600 MB in total.
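
One way to get a feel for the cost would be to time the escaping step on a representative sample and scale up. The sketch below again assumes Python's `xml.sax.saxutils.escape`; the sample text is synthetic and `time_escape` is a made-up helper, so real files may behave differently.

```python
import time
from xml.sax.saxutils import escape

def time_escape(sample, repeats):
    # Time `repeats` calls to escape() on the same string, in seconds.
    start = time.perf_counter()
    for _ in range(repeats):
        escape(sample)
    return time.perf_counter() - start

# Synthetic sample of 100,000 characters, standing in for one 100 KB file.
sample = ("some text with < and & and > in it\n" * 3000)[:100_000]

# Time 100 passes, then scale by 60 to estimate 6000 files.
elapsed = time_escape(sample, repeats=100)
print(f"estimated escaping time for 6000 files: {elapsed * 60:.2f} s")
```

This measures only the escaping pass, not file I/O or the parse itself, so it is a lower bound on the extra work rather than a full benchmark.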