Re: url protection

Gavin Smith Sat, 06 Aug 2022 07:20:35 -0700

On Sat, Aug 06, 2022 at 03:28:52PM +0200, Patrice Dumas wrote:
> Answering to myself, the protection of URL actually does not mean
> protecting all the characters, as the : of the scheme, / as path
> separator should be left as is, and parts already %-escaped should also
> be left as is.  After some thinking, maybe the best, in @url, @email and
> @image would be to protect only non reserved and non unreserved
> characters, and not protect % either, like
>   $result_string =~ s/([^^A-Za-z0-9\-_.!~*'()\$&+,\/:;=\?@\[\]%])/ sprintf "%%%02x", ord $1 /eg;
> Such that if urls are given they are not % encoded.  We also could do
> something different for @image and @url.
>


Characters should be protected if they are not being used as part of the
syntax of the URL but could be interpreted as such.

Maybe more readable than the WHATWG documentation:
https://www.rfc-editor.org/rfc/rfc3986#page-12

This gives a list of reserved characters, of which there are quite a few.
(It's likely that not all of them occur in Texinfo output.)
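
For reference, the character sets from sections 2.2 and 2.3 of that RFC can
be written out as below.  This is only an illustrative sketch in Python, not
code from Texinfo; the names GEN_DELIMS, SUB_DELIMS and classify are made up
for the example.

```python
# Reserved and unreserved sets as listed in RFC 3986, sections 2.2 and 2.3.
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
RESERVED = GEN_DELIMS + SUB_DELIMS
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

def classify(ch: str) -> str:
    """Return 'reserved', 'unreserved' or 'other' for a single character."""
    if ch in RESERVED:
        return "reserved"
    if ch in UNRESERVED:
        return "unreserved"
    return "other"
```

Anything classified as 'other' (a space, for instance) always has to be
percent encoded; a reserved character only needs it when it is not acting
as a delimiter.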

So if an image filename has a colon in it, that colon should be encoded
in the href attribute, but a colon that follows the protocol (http:) should
not be encoded, as you say.  Perhaps the percent encoding algorithm could
be performed on a subset of the URL, rather than taking a URL string and
percent encoding throughout.
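
A rough sketch of what encoding only a subset could look like, in Python
rather than the Perl used in texi2any, and under the assumption that the
converter knows which part is a file name and which part is already valid
URL syntax.  The function names and the example.org base are hypothetical.

```python
from urllib.parse import quote

def encode_path_segment(segment: str) -> str:
    # safe="" makes quote() encode every reserved character, including
    # ":" and "/"; only unreserved characters are left untouched.
    return quote(segment, safe="")

def build_href(base: str, filename: str) -> str:
    # "base" is assumed to be valid URL syntax already (it may be empty
    # for a relative reference), so it is passed through unchanged.
    return base + encode_path_segment(filename)
```

With this, build_href("http://example.org/img/", "fig:1 a.png") leaves the
colon after "http" alone but encodes the colon and the space in the file
name.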

The treatment of @url/@uref could be different, as you say.  The user provides
the entire URL in the source document.  Arguably it is up to the user to
percent encode appropriately within the URL, and if the argument contains
non-ASCII bytes, the user takes on the risk of whether they are valid or not.
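
If some protection is still wanted for a complete user-supplied URL, one
option along the lines of your regex is to leave reserved characters and
existing % escapes alone and encode only what remains.  A minimal sketch in
Python, assuming the argument is UTF-8 text; protect_full_url is a
hypothetical name, not an existing function.

```python
from urllib.parse import quote

# RFC 3986 reserved characters plus "%", so that delimiters and escapes
# the user has already written are passed through unchanged.
KEEP = ":/?#[]@!$&'()*+,;=%"

def protect_full_url(url: str) -> str:
    # Unreserved characters are never encoded by quote(); everything not
    # listed in KEEP (spaces, non-ASCII, quotes, ...) is percent encoded
    # from its UTF-8 bytes.
    return quote(url, safe=KEEP)
```

Note that this cannot tell a literal % from the start of an escape, which
is part of why leaving it to the user may be the simpler rule.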
