Re: [sword-devel] RTFHTML filter bugs

2014-05-21 Thread Greg Hellings
Greg
On May 19, 2014 5:12 PM, "Jaak Ristioja" wrote:
> Hi!
>
> 1) According to http://www.crosswire.org/wiki/DevTools:conf_Files the
> \u control word should be followed by a 16-bit signed integer. The
> wiki page doesn't mention this, but I as
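As an illustration of the 16-bit signed integer point quoted above, here is a minimal Python sketch of decoding `\uN` control words. It is not the SWORD filter's code: the function name `decode_rtf_unicode` is hypothetical, and the sketch assumes that no fallback character follows the control word, that a single space acts as the delimiter, and that only code points in the Basic Multilingual Plane occur.

```python
import re

# \uN, where N is a decimal integer that may be negative, followed by an
# optional single space acting as the control-word delimiter (assumption).
_U_WORD = re.compile(r"\\u(-?\d+) ?")


def decode_rtf_unicode(text: str) -> str:
    """Replace each \\uN control word with the character it denotes."""

    def repl(match: re.Match) -> str:
        n = int(match.group(1))
        # N is a signed 16-bit value, so code points above 32767 are
        # written as negative numbers; map them back to 0..65535.
        if n < 0:
            n += 65536
        return chr(n)

    return _U_WORD.sub(repl, text)
```

For example, `decode_rtf_unicode(r"\u8364 5")` yields the euro sign followed by `5`, and a negative value such as `\u-27` maps to code point 65509 (U+FFE5).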

Re: [sword-devel] RTFHTML filter bugs

2014-05-21 Thread Greg Hellings
On May 21, 2014 8:00 AM, "Jaak Ristioja" wrote:
> So this means that actually we want non-standard RTF (someone should
> update the wiki). Should we assume UTF-8? Are you sure we don't have any
> modules with ISO-8859-something encoded values?

Re: [sword-devel] RTFHTML filter bugs

2014-05-21 Thread DM Smith
The encoding of the conf is either cp1252 (the default, but called Latin-1) or UTF-8. The encoding of the conf matches that of the module. This may cause the conf to be read twice, once for the default and once for UTF-8, if the module encoding is set to UTF-8. There have been confs that are inc
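The two-pass read described above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not how any SWORD frontend actually does it: the function name `read_conf_text` is hypothetical, and the sketch assumes the module encoding is declared by a line of the form `Encoding=UTF-8`.

```python
def read_conf_text(raw: bytes) -> str:
    """Decode conf bytes as cp1252 first, then again as UTF-8 if declared."""
    # First pass: the default encoding. Bytes that cp1252 leaves undefined
    # are replaced rather than raising an error.
    text = raw.decode("cp1252", errors="replace")

    # Look for the (assumed) key that declares the module encoding.
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "Encoding" and value.strip().upper() == "UTF-8":
            # Second pass: re-read the same bytes as UTF-8.
            return raw.decode("utf-8", errors="replace")

    return text
```

The first pass is safe for finding the key because the key and its value are plain ASCII, which decodes identically under cp1252 and UTF-8.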

Re: [sword-devel] RTFHTML filter bugs

2014-05-21 Thread Jaak Ristioja
To sum up, we would need to agree on and specify an RTF subset which is Unicode-aware (UTF-8 only?), and implement a Unicode-aware transducer for it.
On 21.05.2014 15:59, Jaak Ristioja wrote:
> So this means that actually we want non-standard RTF (som

Re: [sword-devel] RTFHTML filter bugs

2014-05-21 Thread Jaak Ristioja
So this means that actually we want non-standard RTF (someone should update the wiki). Should we assume UTF-8? Are you sure we don't have any modules with ISO-8859-something encoded values? If we choose any ASCII superset encoding we have to consider

Re: [sword-devel] RTFHTML filter bugs

2014-05-21 Thread Chris Burrell
I believe some conf files have direct Unicode (rather than escaped sequences) in them and that is preferred.
On 20 May 2014 23:28, "Jaak Ristioja" wrote:
> I've never done BiDi, but I'm not sure I need to take that into account
> while fixing the RTF parsing. As I currently understand it, this
>