On 8/3/22, Patrice Dumas <[email protected]> wrote:
> On Wed, Aug 03, 2022 at 12:08:15PM -0700, Per Bothner wrote:
>> On 8/3/22 06:26, Patrice Dumas wrote:
>> > The standard does not seems to clear on the encoding to use for the %
>> > encodings. URI::Escape has uri_escape() and uri_escape_utf8. My
>> > feeling is that the best would be to use first encode to the output
>> > encoding and then call URI::Escape uri_escape().
>>
>> If I read https://metacpan.org/pod/URI::Escape correctly,
>> uri_escape_utf8 is equivalent to utf8::encode followed by uri_escape.
>>
>> For html/xhtml output (including epub) I think we should keep it simple:
>> always emit utf8.
>
> This is not what we do in general for html/xhtml. For epub we always
> emit utf8, as it is mandated by the standard, but for html/xhtml, we
> use, in the default case, the input encoding for the output encoding.
>
>> The input to url-encoding is a sequence
>> of utf8-bytes. So whether to use uri_escape_utf8 or uri_escape
>> depends on whether conversion to utf8 has already been done.
>
> The conversion should not have already been done at that point, we are
> still character strings in internal perl unicode encoding. But that was
> not really my question, my question was more on whether we should use the
> output encoding to encode string before doing the URI::Escape call, or
> always use UTF-8, even if the document encoding is not UTF-8.
>
>> --
>> --Per Bothner
>> [email protected] http://per.bothner.com/
>>
>

Are there browsers in non-UTF-8 locales that manage to follow links
percent-encoded in non-UTF-8 encodings? This seems like a very niche
case. If not, then it would only make sense to use UTF-8 encoding.

(Replying from my phone, so apologies for the bad formatting of this email.)
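
To make the question concrete, here is a minimal sketch of how the same character string percent-encodes differently depending on the byte encoding chosen first. It is written in Python with the standard urllib.parse module purely as an illustration, not with Perl's URI::Escape as texi2any would use; the helper name percent_encode and the sample string are hypothetical.

```python
from urllib.parse import quote, unquote

def percent_encode(text: str, encoding: str) -> str:
    """Encode text to bytes in the given encoding, then percent-escape them."""
    # Hypothetical helper for illustration only; it is not part of texi2any.
    return quote(text, safe="", encoding=encoding)

target = "café"  # sample link target containing one non-ASCII character

# Document encoding ISO-8859-1: "é" is the single byte 0xE9.
print(percent_encode(target, "iso-8859-1"))  # caf%E9

# UTF-8: "é" is the two bytes 0xC3 0xA9.
print(percent_encode(target, "utf-8"))  # caf%C3%A9

# A consumer that assumes UTF-8 cannot decode the Latin-1 form cleanly;
# the lone 0xE9 byte becomes U+FFFD, the replacement character.
print(unquote("caf%E9", encoding="utf-8", errors="replace") == "caf\ufffd")  # True
```

The last line is the niche case in question: a link escaped from non-UTF-8 bytes only resolves if whatever follows it makes the same encoding assumption.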
