Re: url protection

Gavin Smith Thu, 04 Aug 2022 12:33:50 -0700

On 8/3/22, Patrice Dumas <[email protected]> wrote:
> On Wed, Aug 03, 2022 at 12:08:15PM -0700, Per Bothner wrote:
>> On 8/3/22 06:26, Patrice Dumas wrote:
>> > The standard does not seem to be clear on the encoding to use for the %
>> > encodings.  URI::Escape has uri_escape() and uri_escape_utf8().  My
>> > feeling is that the best would be to first encode to the output
>> > encoding and then call URI::Escape uri_escape().
>>
>> If I read https://metacpan.org/pod/URI::Escape correctly,
>> uri_escape_utf8 is equivalent to utf8::encode followed by uri_escape.
>>
>> For html/xhtml output (including epub) I think we should keep it simple:
>> always emit utf8.
>
> This is not what we do in general for html/xhtml.  For epub we always
> emit utf8, as the standard mandates it, but for html/xhtml we use,
> by default, the input encoding as the output encoding.
>
>>  The input to url-encoding is a sequence
>> of utf8-bytes. So whether to use uri_escape_utf8 or uri_escape
>> depends on whether conversion to utf8 has already been done.
>
> The conversion has not been done at that point; we still have
> character strings in Perl's internal Unicode representation.  But that
> was not really my question.  My question was whether we should encode
> the string with the output encoding before calling URI::Escape, or
> always use UTF-8, even if the document encoding is not UTF-8.
>
>> --
>>      --Per Bothner
>> [email protected]   http://per.bothner.com/


Are there browsers in non-UTF-8 locales that manage to follow links
percent-encoded in non-UTF-8 encodings? This seems like a very niche case.

If not, then it would only make sense to use UTF-8.
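A rough illustration of the two options, using Python's urllib.parse as a stand-in for Perl's URI::Escape (the file name "café.html" and the Latin-1 document encoding are hypothetical). Encoding to UTF-8 bytes and then escaping gives the same result as escaping the string with UTF-8 directly, matching Per's reading of uri_escape_utf8; escaping with the document encoding gives different bytes, which a consumer that assumes UTF-8 cannot decode back:

```python
from urllib.parse import quote, unquote

# Hypothetical link target containing a non-ASCII character.
target = "café.html"

# Always UTF-8: quote() encodes a str to UTF-8 by default before escaping.
utf8_escaped = quote(target)

# Encode first, then escape the bytes (analogue of utf8::encode + uri_escape).
encode_then_escape = quote(target.encode("utf-8"))

# Output-encoding variant, assuming a Latin-1 document encoding.
latin1_escaped = quote(target, encoding="latin-1")

# A consumer that decodes percent-escapes as UTF-8 (unquote's default)
# gets U+FFFD for the Latin-1 byte 0xE9 instead of the original character.
latin1_read_as_utf8 = unquote(latin1_escaped)

print(utf8_escaped)         # caf%C3%A9.html
print(encode_then_escape)   # caf%C3%A9.html
print(latin1_escaped)       # caf%E9.html
```

Only the UTF-8 form round-trips through a UTF-8 decoder, which is why the non-UTF-8 variant would only help a client that knows to decode with the document's own encoding.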

(Replying from my phone, so apologies for the bad formatting of this email.)
