>> The binary representation of 127 is 0111 1111 and valid ascii char. DEL
>> actually (sh$ man ascii)
>
> Right, and that's why it is encoded: No control characters in a URI.

Great! :)

> The final algorithm for the shiny new unicode aware percent encoding
> function would be:
>
> - percent encode all characters in TABLE
> - percent encode all characters below 32 and above 126
> - encode the char in utf-8
> - percent escape all bytes of the encoded char
>
> The remaining problem is keeping backward compatibility. There are Org
> files out there where "á" is encoded as "%E1" and not "%C3%A1". The
> percent decoding function should be able to recognize these old
> escapes and return the right value.
>
> It looks like this could be done by changing the behavior of
> `org-protocol-unhex-string'. Currently it returns the empty string
> for "%E1" because it does not represent a valid utf-8 encoded unicode
> char. Maybe we could say: If the percent encoded sequence does not
> form a valid char, use the old method (extended ASCII?) to decode the
> sequences.

Well, yes. The function _should_ return something if the end of the
string is reached or something other than a `%' is found.

I'll have to find out where the function has to look up the correct
char: 167 will be a different character in different encodings. This
will not handle cases like `Größe' though.

Are there cases where strings are encoded the way you showed above, and
decoded using `org-unhex-string'?

  Sebastian
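
For illustration, the encoding steps and the fallback decoding discussed above could look roughly like the sketch below. It is written in Python rather than Emacs Lisp, and it is not the Org implementation: the names `RESERVED`, `percent_encode` and `percent_decode` are hypothetical, the contents of `RESERVED` only stand in for TABLE, and treating the "old method" as Latin-1 is an assumption.

```python
import re

# Stand-in for TABLE: the real set of reserved characters is not given here.
RESERVED = set(" %")

def percent_encode(text):
    """Escape reserved, control and non-ASCII chars as UTF-8 bytes."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in RESERVED or code < 32 or code > 126:
            # Encode the char in UTF-8, then escape every resulting byte.
            out.append("".join("%%%02X" % b for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)

# One or more consecutive %XX escapes, decoded together as one byte run.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

def _decode_run(match):
    raw = bytes.fromhex(match.group(0).replace("%", ""))
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: old escapes such as "%E1" were written as Latin-1.
        return raw.decode("latin-1")

def percent_decode(text):
    """Decode %XX runs; anything that is not a valid escape is kept as-is."""
    return _ESCAPE_RUN.sub(_decode_run, text)
```

With these definitions, `percent_decode("%C3%A1")` and `percent_decode("%E1")` both give "á", and a stray `%` that is not followed by two hex digits is returned unchanged instead of producing an empty string. The sketch always prefers UTF-8, so an old Latin-1 sequence that happens to be valid UTF-8 would be decoded the new way.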