On Sat, Aug 06, 2022 at 03:28:52PM +0200, Patrice Dumas wrote:
> Answering to myself, the protection of URL actually does not mean
> protecting all the characters, as the : of the scheme, / as path
> separator should be left as is, and parts already %-escaped should also
> be left as is. After some thinking, maybe the best, in @url, @email and
> @image would be to protect only non reserved and non unreserved
> characters, and not protect % either, like
> $result_string =~ s/([^^A-Za-z0-9\-_.!~*'()\$&+,\/:;=\?@\[\]%])/ sprintf "%%%02x", ord $1 /eg;
> Such that if urls are given they are not % encoded. We also could do
> something different for @image and @url.
>
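For reference, here is a rough Python sketch of what the quoted substitution does. It is not the Texinfo implementation, and the function name `protect_url_chars` is made up for illustration. It keeps every character in the quoted class (including `^`, which the second caret in `[^^` also keeps) and percent-encodes everything else. One deliberate difference: non-ASCII characters are encoded as their UTF-8 bytes, one %xx per byte, as RFC 3986 section 2.5 recommends. The Perl `ord $1` yields a code point instead if the string has been decoded.

```python
import re

# Characters left as-is: the class from the quoted Perl regex, i.e. RFC 3986
# unreserved and most reserved characters, plus '%' so existing escapes survive.
# Like the quoted class, this keeps '^' and does NOT keep '#', so a fragment
# delimiter would be encoded.
_KEEP = re.compile(r"[A-Za-z0-9\-_.!~*'()$&+,/:;=?@\[\]%^]")


def protect_url_chars(url: str) -> str:
    """Percent-encode every character outside the kept set (hypothetical helper)."""
    out = []
    for ch in url:
        if _KEEP.fullmatch(ch):
            out.append(ch)
        else:
            # Lowercase hex, matching the quoted sprintf "%%%02x".
            out.extend("%%%02x" % b for b in ch.encode("utf-8"))
    return "".join(out)


print(protect_url_chars("http://example.com/a b/%20\u00e9"))
```

The scheme colon, the path slashes and the existing `%20` pass through unchanged, while the space and the `é` are encoded.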
Characters should be protected if they are not part of the syntax of the URL but could be interpreted as such. RFC 3986 may be more readable than the WHATWG documentation:

https://www.rfc-editor.org/rfc/rfc3986#page-12

This gives a list of reserved characters, of which there are quite a few. (It's likely that not all of them occur in Texinfo output.)

So if an image filename has a colon in it, that colon should be encoded in the href attribute, but a colon that follows the protocol (http:) should not be encoded, as you say. Perhaps the percent-encoding could be applied to individual components of the URL, such as the filename, rather than taking the whole URL string and percent-encoding it throughout.

The treatment of @url/@uref could be different, as you say. The user provides the entire URL in the source document. Arguably it is up to the user to percent-encode the URL appropriately, and if the argument contains non-ASCII bytes, the user takes the risk that the result may not be valid.
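To illustrate encoding only part of the URL, here is a minimal Python sketch for an @image filename, assuming the filename is a relative path with `/` as the directory separator. The function name `encode_path_segments` is hypothetical. Each segment is encoded with `urllib.parse.quote(..., safe="")`, so everything except RFC 3986 unreserved characters is escaped, including `:` and `%`. Because the `/` separators are added back unencoded, they remain URL syntax. Encoding the colon matters here: RFC 3986 section 4.2 notes that a first path segment containing `:` would be mistaken for a scheme name.

```python
from urllib.parse import quote


def encode_path_segments(path: str) -> str:
    """Percent-encode each '/'-separated segment of a relative file path.

    Within a segment, ':', '#', '?', '%' and non-ASCII characters are part of
    the file name, not URL syntax, so they are all encoded (UTF-8, uppercase hex).
    """
    return "/".join(quote(segment, safe="") for segment in path.split("/"))


# Only the filename part is encoded; the scheme's colon is never touched.
print("https://example.com/" + encode_path_segments("images/fig:1 \u00e9.png"))
```

Unlike a user-supplied @url, a literal `%` in a filename is encoded here (`50%.png` becomes `50%25.png`), since it cannot be an existing escape.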
