I don't think I understand what you're saying. Should the frontend read
the configuration twice? Or should Sword? Why this complexity?!

Do you mean that:
* Sword reads the entire configuration file as CP1252 encoded
* On failure, Sword re-reads the configuration file as UTF-8 encoded

???

If this is the case, then this is error-prone (even when reading only
parts of the configuration), because CP1252 and UTF-8 overlap. Data
encoded as UTF-8 might be parsed successfully as valid CP1252, even
though it was intended to be UTF-8. In other words, I find it likely
that valid UTF-8 strings would be accepted as valid CP1252 by a
perfectly correct CP1252 encoding checker, so the fallback to UTF-8
would rarely be triggered.

Jaak

On 21.05.2014 17:45, DM Smith wrote:
> The encoding of the conf is either cp1252 (the default, but called
> latin 1) or utf-8. The encoding of the conf matches that of the
> module. This may cause the conf to be read twice, once for the
> default and once for UTF-8, if the module encoding is set to
> UTF-8.
>
> There have been confs that are incorrect with regard to this rule.
>
> In Him, DM
>
> On May 21, 2014, at 8:59 AM, Jaak Ristioja <j...@ristioja.ee> wrote:
>
> So this means that actually we want non-standard RTF (someone
> should update the wiki). Should we assume UTF-8? Are you sure we
> don't have any modules with ISO-8859-something encoded values?
>
> If we choose any ASCII superset encoding we have to consider at
> least these two points:
>
> * Since the RTF control words and delimiters are specified in ASCII
> only, we need to decide how the bytes of the superset act
> as delimiters and parts of "RTF" control words. For example,
> whether the Unicode letter, number, spacing, punctuation, control
> etc. characters constitute parts of RTF control words or act as
> delimiters.
>
> * In case of encodings where characters may consist of multiple
> bytes (e.g. the variable-length UTF-8) we must consider the
> character boundaries. We can't just pass through any non-ASCII byte
> values.
> For example, the following bit sequence wouldn't make
> sense:
>
> 11100010 01011100 10000010 01110001 10101100 01100011
>
> which is a UTF-8 encoded Euro sign, €, interleaved with bytes of
> the ASCII string "\qc". It just doesn't make sense, whereas the
> following sequences would be correct:
>
> 11100010 10000010 10101100 01011100 01110001 01100011 (€\qc)
> 01011100 01110001 01100011 11100010 10000010 10101100 (\qc€)
>
> So, depending on the encoding, it would be correct to detect such
> cases; otherwise we end up with invalid Unicode output.
>
> Blessings, Jaak
>
> On 21.05.2014 15:19, Chris Burrell wrote:
>>>> I believe some conf files have direct unicode (rather than
>>>> escaped sequences) in them and that is preferred.
>>>>
>>>> On 20 May 2014 23:28, "Jaak Ristioja" <j...@ristioja.ee> wrote:
>>>>
>>>> I've never done BiDi, but I'm not sure I need to take that
>>>> into account while fixing the RTF parsing. As I currently
>>>> understand it, this particular piece of code does not
>>>> support any part of the RTF spec dealing with bidirectional
>>>> text handling. Hence all BiDi information contained in the
>>>> configuration file strings (e.g. About=) is contained either
>>>> in the plain ASCII text or the \u<num> Unicode escapes which
>>>> this algorithm should pass through unmodified.
>>>>
>>>> ...except for HTML entities, which should actually be
>>>> escaped. I previously failed to notice this bug in the
>>>> algorithm. Additionally, I forgot that non-ASCII characters in
>>>> the input string should also lead to parsing failure.
>>>>
>>>> Jaak
>>>>
>>>>
>>>> On 20.05.2014 21:01, David Haslam wrote:
>>>>> Take care with Right to Left languages such as Hebrew.
>>>>>
>>>>> i.e. After any patches to the filter, please include some
>>>>> testing for BiDi text in the About= field and others.
>>>>>
>>>>> David
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://sword-dev.350566.n4.nabble.com/RTFHTML-filter-bugs-tp4653969p4653970.html
>>>>> Sent from the SWORD Dev mailing list archive at Nabble.com.

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
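
To make the two encoding concerns in this thread concrete, here is a minimal Python sketch. It is not Sword code: the function names are hypothetical, and the "CP1252 first, UTF-8 on failure" reader is only my reading of the scheme being questioned above. It shows (a) that the UTF-8 bytes of the Euro sign decode without error as CP1252, so the fallback never fires for them, and (b) that a strict UTF-8 decoder rejects the interleaved byte sequence from the quoted example while accepting the two well-formed orderings.

```python
def decode_cp1252_first(raw: bytes) -> str:
    """Hypothetical reader: try CP1252, fall back to UTF-8 only on failure."""
    try:
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        return raw.decode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 (strict decoding)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# (a) The overlap problem: E2 82 AC is valid CP1252, so the UTF-8
# fallback is never reached and the Euro sign is silently misread.
euro_utf8 = "€".encode("utf-8")
misread = decode_cp1252_first(euro_utf8)

# The fallback only fires when a byte is undefined in CP1252
# (0x81, 0x8D, 0x8F, 0x90, 0x9D). "Á" is C3 81 in UTF-8, so it
# happens to be decoded correctly.
lucky = decode_cp1252_first("Á".encode("utf-8"))

# (b) Character boundaries: the byte sequences from the quoted message.
interleaved = bytes([0b11100010, 0b01011100, 0b10000010,
                     0b01110001, 0b10101100, 0b01100011])
euro_then_qc = bytes([0b11100010, 0b10000010, 0b10101100,
                      0b01011100, 0b01110001, 0b01100011])
qc_then_euro = bytes([0b01011100, 0b01110001, 0b01100011,
                      0b11100010, 0b10000010, 0b10101100])

if __name__ == "__main__":
    print(repr(misread))                # three CP1252 characters, not "€"
    print(repr(lucky))
    print(is_valid_utf8(interleaved))   # control-word bytes split the character
    print(is_valid_utf8(euro_then_qc), is_valid_utf8(qc_then_euro))
```

Whether a given string is misread therefore depends on which bytes it happens to contain, which is why a failure-driven fallback from CP1252 to UTF-8 looks unreliable to me.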