On May 21, 2014 8:00 AM, "Jaak Ristioja" <j...@ristioja.ee> wrote:
>
> So this means that actually we want non-standard RTF (someone should
> update the wiki). Should we assume UTF-8? Are you sure we don't have any
> modules with ISO-8859-something encoded values?
>
The wiki states that the Unicode character is preferred, at least for conf files, over the RTF escaped value. Specifically, it must be Unicode encoded as UTF-8 or CP1252.

> If we choose any ASCII superset encoding we have to consider at least
> two points:
>
> * Since the RTF control words and delimiters are specified in ASCII
> only, we need to decide how the bytes of the superset act as
> delimiters and parts of "RTF" control words. For example, whether the
> Unicode letter, number, spacing, punctuation, control etc. characters
> constitute parts of RTF control words or act as delimiters.
>
> * In case of encodings where characters may consist of multiple bytes
> (e.g. the variable-length UTF-8) we must consider the character
> boundaries. We can't just pass through any non-ASCII byte values. For
> example, the following bit sequence wouldn't make sense:
>
> 11100010 01011100 10000010 01110001 10101100 01100011
>

Did you literally split the individual bytes of the euro character around the other bytes? What possibly valid encoding permits that? Is that a valid UTF-8 sequence? If not, then the file is not UTF-8 encoded, and the engine will either raise an error or behave in undefined ways due to invalid input.

--Greg

> which is a UTF-8 encoded Euro sign, €, interleaved with bytes of the
> ASCII string "\qc". It just doesn't make sense, whereas the following
> sequences would be correct:
>
> 11100010 10000010 10101100 01011100 01110001 01100011 (€\qc)
> 01011100 01110001 01100011 11100010 10000010 10101100 (\qc€)
>
> So depending on the encoding it would be correct to detect such cases;
> otherwise we end up with invalid Unicode output.
>
> Blessings,
> Jaak
>
> On 21.05.2014 15:19, Chris Burrell wrote:
> > I believe some conf files have direct Unicode (rather than escaped
> > sequences) in them and that is preferred.
> >
> > On 20 May 2014 23:28, "Jaak Ristioja" <j...@ristioja.ee
> > <mailto:j...@ristioja.ee>> wrote:
> >
> > I've never done BiDi, but I'm not sure I need to take that into account
> > while fixing the RTF parsing. As I currently understand it, this
> > particular piece of code does not support any part of the RTF spec
> > dealing with bidirectional text handling. Hence all BiDi information
> > contained in the configuration file strings (e.g. About=) is contained
> > either in the plain ASCII text or in the \u<num> Unicode escapes, which
> > this algorithm should pass through unmodified.
> >
> > ...except for HTML entities, which should actually be escaped. I
> > previously failed to notice this bug in the algorithm. Additionally, I
> > forgot that non-ASCII characters in the input string should also lead
> > to parsing failure.
> >
> > Jaak
> >
> >
> > On 20.05.2014 21:01, David Haslam wrote:
> > > Take care with Right to Left languages such as Hebrew.
> > >
> > > i.e. after any patches to the filter, please include some testing
> > > for BiDi text in the About= field and others.
> > >
> > > David
> > >
> > > --
> > > View this message in context:
> > > http://sword-dev.350566.n4.nabble.com/RTFHTML-filter-bugs-tp4653969p4653970.html
> > > Sent from the SWORD Dev mailing list archive at Nabble.com.
> > >
> > > _______________________________________________
> > > sword-devel mailing list: sword-devel@crosswire.org
> > > http://www.crosswire.org/mailman/listinfo/sword-devel
> > > Instructions to unsubscribe/change your settings at above page
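Greg's question about the interleaved bytes can be checked mechanically. The Python sketch below is a minimal illustration only; it is not SWORD's RTFHTML filter. The names decode_value, tokenize, INTERLEAVED, EURO_FIRST and EURO_LAST are hypothetical.

It assumes conf values are treated as strict UTF-8; a CP1252 fallback is not shown. It decodes before tokenizing, so a control-word boundary can never split a multibyte character. It also treats any non-ASCII character as a delimiter, which is only one possible answer to Jaak's first point.

```python
import re

# The three byte sequences from the thread, copied bit for bit.
INTERLEAVED = bytes([0b11100010, 0b01011100, 0b10000010,
                     0b01110001, 0b10101100, 0b01100011])
EURO_FIRST = bytes([0b11100010, 0b10000010, 0b10101100,
                    0b01011100, 0b01110001, 0b01100011])  # "€\qc"
EURO_LAST = bytes([0b01011100, 0b01110001, 0b01100011,
                   0b11100010, 0b10000010, 0b10101100])   # "\qc€"


def decode_value(raw: bytes) -> str:
    """Decode a conf value as strict UTF-8 (hypothetical helper).

    Raises UnicodeDecodeError when the bytes are not well-formed UTF-8,
    e.g. when the bytes of a multibyte character are split apart.
    """
    return raw.decode("utf-8", errors="strict")


# Assumption: a control word is a backslash, a run of ASCII letters, an
# optional signed ASCII-digit parameter and one optional space delimiter.
# Explicit [A-Za-z] / [0-9] classes keep non-ASCII letters and digits
# (which Python's \w and \d would accept) from extending a control word.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)(-?[0-9]+)? ?")


def tokenize(text: str) -> list:
    """Split decoded text into ("ctrl", name, param) and ("text", chars) tokens.

    Because decoding happens first, no token boundary can fall inside a
    multibyte character. Control symbols (backslash followed by a non-letter)
    are not handled: such a backslash is simply kept as text in this sketch.
    """
    tokens = []
    plain = []  # pending characters of the current text run
    pos = 0
    while pos < len(text):
        m = CONTROL_WORD.match(text, pos)
        if m:
            if plain:
                tokens.append(("text", "".join(plain)))
                plain = []
            param = int(m.group(2)) if m.group(2) is not None else None
            tokens.append(("ctrl", m.group(1), param))
            pos = m.end()
        else:
            plain.append(text[pos])
            pos += 1
    if plain:
        tokens.append(("text", "".join(plain)))
    return tokens


if __name__ == "__main__":
    for raw in (EURO_FIRST, EURO_LAST, INTERLEAVED):
        try:
            print(tokenize(decode_value(raw)))
        except UnicodeDecodeError as exc:
            print("not valid UTF-8:", exc.reason)
```

Run as a script, it prints token lists for the €\qc and \qc€ sequences. For the interleaved sequence it reports a decoding error instead, which matches Greg's point that such input is simply not UTF-8.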