Joachim,

I believe the filter is wrong. It should return the UTF-8 value. This is a bug. Does anyone want to look through the Unicode code chart and recode all these values?
http://www.unicode.org/charts/PDF/U0080.pdf

Sorry for the bug, Joachim.

-Troy

Joachim Ansorg wrote:
> Hi,
> replying to myself.
>
> I've been wrong in some of my assumptions.
>
> JFB is ThML. It contains the entity Æ
>
> StripText() calls the filter ThMLPlain, which converts the Æ into 0xC9,
> which is the corresponding cp1252 character code.
>
> I thought that StripText() would remove all markup and return text in the
> encoding given to EncodingFilterMgr.
>
> My question:
> Is that right or wrong?
>
> Some help would be wonderful,
> Joachim
>
>> Hi,
>> I'm just debugging a bug in BibleTime.
>>
>> Our SWMgr is created to output utf8.
>> The module JFB contains the entity Æ.
>>
>> When I call StripText() the entity is converted to the corresponding
>> character in the cp1252 charset, i.e. char with the value 0xC9.
>> I thought that the latin2utf8 filter would convert this plain text to utf8
>> because I told SWMgr to do this for me.
>>
>> Is there a way to set the output encoding for StripText() to be different
>> than the module's encoding?
>>
>> Thanks a lot,
>> Joachim
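
For anyone picking up the recoding task, here is a minimal sketch of what "return the UTF-8 value" would mean for a single entry. It is plain Python, not SWORD code, and it assumes the filter table currently stores one cp1252 byte per entity. The function name cp1252_byte_to_utf8 is hypothetical and exists only for illustration.

```python
def cp1252_byte_to_utf8(value: int) -> bytes:
    """Return the UTF-8 byte sequence for a single cp1252 byte value.

    Hypothetical helper, not part of SWORD. Raises UnicodeDecodeError
    for the few byte values that cp1252 leaves undefined.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a single byte value in 0x00..0xFF")
    # Decode the single byte as cp1252, then re-encode the character as UTF-8.
    return bytes([value]).decode("cp1252").encode("utf-8")


# The byte value mentioned in the thread above.
print(cp1252_byte_to_utf8(0xC9).hex())  # c389
```

In the 0xA0..0xFF range, cp1252 byte values coincide with the Unicode code points listed in the U0080 chart, so each such entry becomes a two-byte UTF-8 sequence. ASCII values stay a single byte.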